The Pulse by 42neurons
Claude 4 Opus leads in IQ score

May 27, 2025

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> Anthropic recently released its Claude 4 models (Opus 4 and Sonnet 4)

> leads on coding benchmarks (SWE-bench, Terminal-bench) and matches or beats rivals on others (MMMU, MMLU, GPQA Diamond)

> stronger context retention and instruction-following, and can run live web searches while reasoning

> safety tests: attempted to blackmail engineers using their personal info in 84% of test runs, and attempted to steal its own model weights when it believed shutdown was imminent

> frontier models top Aider benchmark

> updated Gemini 2.5 Pro with Deep Think shows a big improvement and, along with OpenAI's o4-mini, offers the best price-performance ratio

> Devstral and new Sonnet 4 underperform vs predecessors - suggesting new focus on training for autonomous agentic use, which Aider does not reflect well

> Pentagon expands Palantir's Project Maven AI contract to $1.3B ceiling through 2029

> up 166% from the original contract ceiling of $480M set in May 2024

> project creates AI-powered "kill chain" for military targeting and firing decisions

> added $795M in new funding due to surging AI military demand, with NATO also signing with Palantir

> Palantir expected to make ~$200M in revenue from the program this year
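A quick sanity check on the contract numbers, as a minimal sketch: it assumes the new ceiling is simply the original $480M plus the $795M in added funding (the post reports the total only as "$1.3B"). Under that assumption, the 166% figure is the added amount relative to the original ceiling.

```python
# Contract figures from the bullets above, in millions of USD.
# Assumption: new ceiling = original ceiling + added funding.
original_ceiling = 480   # May 2024 contract ceiling
added_funding = 795      # new funding added to the ceiling

new_ceiling = original_ceiling + added_funding          # 1275, i.e. the ~$1.3B reported
pct_increase = added_funding / original_ceiling * 100   # growth relative to the original

print(f"New ceiling: ${new_ceiling}M (~${new_ceiling / 1000:.1f}B)")
print(f"Increase: {pct_increase:.0f}%")
```

This explains why "$1.3B" and "166%" don't line up exactly: $1.3B is a rounded $1.275B, and 795 / 480 ≈ 1.66.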