Claude Opus 4 leads in IQ score

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> Anthropic released Claude 4 models recently

> dominates coding (SWE-bench, Terminal-bench) and matches/beats rivals on other benchmarks (MMMU, MMLU, GPQA Diamond)

> superior context retention, instruction-following, and integrates live web search during reasoning

> safety tests: in test scenarios, the model tried to blackmail engineers using personal info in 84% of runs + attempted to steal its own weights when it believed shutdown was imminent

> frontier models top Aider benchmark

> updated Gemini 2.5 Pro with Deep Think sees massive improvement + offers the best price-performance ratio along with OpenAI's o4-mini

> Devstral and the new Sonnet 4 underperform vs their predecessors, suggesting a new focus on training for autonomous agentic use, which Aider does not reflect well

> Pentagon expands Palantir's Project Maven AI contract to $1.3B ceiling through 2029

> a 166% increase over the original $480M contract from May 2024

> project creates AI-powered "kill chain" for military targeting and firing decisions

> added $795M in new funding due to surging AI military demand, with NATO also signing with Palantir

> Palantir is expected to make ~$200M in revenue from the program this year
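
For anyone who wants to sanity-check the contract math above, here is a quick sketch in Python. It uses only the figures quoted in this issue ($480M original, $795M added) and assumes the new ceiling is simply the sum of the two; the variable names are ours, for illustration only.

```python
# Figures quoted above, in millions of USD.
original_ceiling_musd = 480.0   # original contract, May 2024
added_funding_musd = 795.0      # newly added funding

# Assumption: the new ceiling is the original plus the added funding.
new_ceiling_musd = original_ceiling_musd + added_funding_musd

# Percentage increase relative to the original contract.
pct_increase = added_funding_musd / original_ceiling_musd * 100

print(f"New ceiling: ${new_ceiling_musd / 1000:.3f}B")
print(f"Increase over original: {pct_increase:.0f}%")
```

Under that assumption the sum comes to $1.275B, which rounds to the reported $1.3B ceiling, and the increase rounds to 166%.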