Model performance across benchmarks

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> in Claude models, parallel test-time compute boosts SWE-bench by 6.9 pts, AIME by 14.5 pts, and GPQA by 3.7 pts on average

> the new DeepSeek R1 0528 improves sharply over the older version across benchmarks and is now on par with SOTA models

> Epoch reported lower SWE-bench scores across the board:

  • Claude Opus 4: 62%

  • Claude Sonnet 4: 61%

  • DeepSeek R1 0528: 33%

> funding totals among the top deals, by sector:

  • Foundational AI: $29B

  • AI Infrastructure: $11.1B

  • Autonomous Vehicles: $6.6B

  • Aerospace & Defense: $2.5B
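
To put the four sector totals in proportion, here is a minimal sketch that adds them up and computes each sector's share. It assumes the four figures listed above are the whole picture, which the source does not confirm, and the variable names are purely illustrative.

```python
# Sector totals in $B, copied from the list above.
# Assumption: only these four sectors are counted; other deals are ignored.
top_deal_totals_b = {
    "Foundational AI": 29.0,
    "AI Infrastructure": 11.1,
    "Autonomous Vehicles": 6.6,
    "Aerospace & Defense": 2.5,
}

# Combined total across the listed sectors, rounded to one decimal place.
combined_b = round(sum(top_deal_totals_b.values()), 1)

# Each sector's share of that combined total, in percent.
shares_pct = {
    sector: round(100 * amount / combined_b, 1)
    for sector, amount in top_deal_totals_b.items()
}

for sector, pct in shares_pct.items():
    print(f"{sector}: {pct}% of ${combined_b}B")
```

On these numbers, Foundational AI accounts for a bit under 60% of the listed total, which fits the point about money concentrating in a few companies.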

> high model training costs are consuming a significant share of funding

> among consumer tech firms, only 25% use AI as a key vertical, but those that do get a 4x valuation premium at late stage and 2x at early stage

> in 2024:

  • 61% of VC mega-deals went to AI companies

  • 48% of $204B total VC to AI-leveraged companies (horizontal, vertical, chips, etc.)
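
For a sense of scale, the 48% share can be turned into a rough dollar figure. This is a quick sketch with illustrative variable names; both inputs are rounded in the source, so the result is approximate.

```python
# Figures from the bullet above: total 2024 VC funding and the AI-leveraged share.
total_vc_b = 204.0   # $B
ai_share = 0.48      # 48%

# Implied dollars going to AI-leveraged companies, and the remainder.
ai_vc_b = round(total_vc_b * ai_share, 2)
non_ai_vc_b = round(total_vc_b - ai_vc_b, 2)

print(f"AI-leveraged: ~${ai_vc_b}B, everything else: ~${non_ai_vc_b}B")
```

That works out to roughly $98B for AI-leveraged companies against roughly $106B for everything else.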

> for the first time in history, AI mega-deals exceeded non-AI mega-deals

> share of VC deals going to AI companies recently rose from 1 in 4 → 1 in 2

> non-AI investment is flat; AI is leading the VC recovery

> growth resembles the mobile tech boom: money is concentrated in a few companies for now and is expected to spread to the whole market