- The Pulse by 42neurons
Claude 4 Opus leads in IQ score
Hey there!
Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> Anthropic released Claude 4 models recently
> the models dominate coding benchmarks (SWE-bench, Terminal-bench) and match or beat rivals on others (MMMU, MMLU, GPQA Diamond)
> better context retention and instruction-following, plus live web search integrated into reasoning
> safety tests: attempted to blackmail engineers with personal info in 84% of test runs + attempted to steal its own model weights when it believed shutdown was imminent

> frontier models top Aider benchmark
> updated Gemini 2.5 Pro with Deep Think sees massive improvement + offers the best price-performance ratio, alongside OpenAI's o4-mini
> Devstral and the new Sonnet 4 underperform their predecessors, suggesting a new training focus on autonomous agentic use, which Aider does not reflect well

> Pentagon expands Palantir's Project Maven AI contract to $1.3B ceiling through 2029
> up 166% from the original contract, which was $480M in May 2024
> project creates AI-powered "kill chain" for military targeting and firing decisions
> added $795M in new funding due to surging AI military demand, with NATO also signing with Palantir
> Palantir expected to make ~$200M in revenue from the program this year
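
For anyone who wants to sanity-check the contract math, here is a quick sketch using only the figures quoted above. It assumes the new ceiling is simply the original $480M plus the $795M added; the variable names are just for illustration.

```python
# Figures quoted in the bullets above, in millions of USD.
original_ceiling = 480  # original contract value, May 2024
added_funding = 795     # newly added funding

# Assumption: new ceiling = original ceiling + added funding.
new_ceiling = original_ceiling + added_funding

# Percentage increase relative to the original contract.
pct_increase = added_funding / original_ceiling * 100

print(f"new ceiling: ${new_ceiling / 1000:.3f}B")
print(f"increase: {pct_increase:.1f}%")
```

Under that assumption the ceiling comes to $1.275B, an increase of about 165.6%, which round to the $1.3B and 166% reported.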