Model performance across benchmarks
Hey there!
Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> in Claude models, parallel test-time compute boosts SWE-bench by +6.9 pts, AIME by +14.5 pts, and GPQA by +3.7 pts on average (see the sketch after this list)
> the new DeepSeek R1 0528 shows a large improvement over the older version across benchmarks and is now on par with SOTA models
> Epoch reported lower SWE-bench scores across the board:
Claude 4 Opus: 62%
Claude 4 Sonnet: 61%
DeepSeek R1 0528: 33%
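For readers new to parallel test-time compute: the idea is to sample several independent answers to the same problem and then select one, for example by majority vote. The snippet below is a minimal sketch under that assumption only; `generate_answer` is a hypothetical placeholder for a model call, and Anthropic's actual selection step may use a learned scorer rather than simple voting.

```python
# Minimal sketch of parallel test-time compute: sample several candidate
# answers independently, then pick the most common one (majority vote).
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import random

def generate_answer(question: str) -> str:
    # Hypothetical model call; a noisy placeholder for illustration only.
    return random.choice(["42", "42", "41"])

def parallel_best_answer(question: str, n_samples: int = 8) -> str:
    # Fan out n_samples independent attempts in parallel.
    with ThreadPoolExecutor(max_workers=n_samples) as pool:
        candidates = list(pool.map(generate_answer, [question] * n_samples))
    # Select by majority vote over the sampled answers.
    return Counter(candidates).most_common(1)[0][0]

if __name__ == "__main__":
    print(parallel_best_answer("What is 6 * 7?"))
```

Spending more samples per question trades extra inference cost for accuracy, which is how the reported point gains on SWE-bench, AIME, and GPQA are obtained without retraining the model.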

> total funding among the top deals:
Foundational AI: $29B
AI Infrastructure: $11.1B
Autonomous Vehicles: $6.6B
Aerospace & Defense: $2.5B
> high model training costs are consuming a significant share of this funding
> among consumer tech firms, only 25% use AI as a key vertical, but those that do command a 4x valuation premium (late-stage) and 2x (early-stage)

> in 2024:
61% of VC mega-deals to AI companies
48% of $204B total VC to AI-leveraged companies (horizontal, vertical, chips, etc.)
> for the first time, AI mega-deals exceeded non-AI mega-deals
> the share of VC deals going to AI companies has recently risen from 1 in 4 → 1 in 2
> non-AI investment flat; AI leading VC recovery
> growth similar to the mobile tech boom; money is concentrated in a few companies and expected to spread to the whole market