AI’s scores on Humanity’s Last Exam over time

June 24, 2025

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> Moonshot AI's Kimi Researcher tops Humanity's Last Exam, beating all other deep research tools & models

> 26.9% pass@1 score (vs 8.6% initial) through end-to-end RL training

> frontier last pushed by OpenAI Deep Research in Feb; Gemini is also claimed to score 26.9%, but that is unconfirmed

> uses parallel search, web browser, and coding tools for autonomous multi-step problem solving

> averages 23 reasoning steps and explores 200+ URLs per task

> Meta on AI spending spree:

- expanded capex spending range
- attempted to take over Runway AI, Safe Superintelligence ($32B AI startup) and Perplexity AI
- invested $14.3B in Scale AI, acquiring a 49% stake
- hired Scale AI founder Alexandr Wang + a team of its top employees
- hiring Nat Friedman (former GitHub CEO) and Daniel Gross (CEO of Safe Superintelligence)
- tried poaching OpenAI employees with $100M signing bonuses + larger annual compensation packages

> Epoch's GATE model: front-loaded AI investment & huge spend precedes economic payoff

> current $500B commitments are far below the optimal $25T for 2025: investments are too conservative, not a bubble

> full automation within 15-20 years, acceleration starting 2028-2030

> huge upfront compute investments justified by $50T global labor compensation capturable through full automation

> AI compute investments could hit >10% of global GDP (~50x current levels)

> economic growth speeds up dramatically: 20%+ at just 30% task automation

> peak automation delivers 30-100%+ annual growth vs historical 3%
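To put the figures above in perspective, here is a minimal Python sketch that works out two things: how far current commitments sit from the quoted optimal level, and how quickly an economy doubles at each quoted growth rate. The doubling-time framing and the assumption of a constant growth rate are ours, for illustration only; they are not part of the GATE model, and the names used (`doubling_time_years`, `growth_rates`) are hypothetical.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for output to double at a constant annual growth rate (our simplifying assumption)."""
    return math.log(2) / math.log(1 + annual_growth)


# Figures quoted above, in USD: current commitments vs. the quoted optimal level for 2025.
current_commitments = 500e9
optimal_investment = 25e12
investment_gap = optimal_investment / current_commitments

# Growth rates quoted above; the labels are ours.
growth_rates = {
    "historical": 0.03,
    "30% automation": 0.20,
    "peak (low end)": 0.30,
    "peak (high end)": 1.00,
}
doubling_times = {label: doubling_time_years(g) for label, g in growth_rates.items()}

if __name__ == "__main__":
    print(f"optimal / current investment: {investment_gap:.0f}x")
    for label, years in doubling_times.items():
        print(f"{label:>16}: economy doubles in about {years:.1f} years")
```

Under these assumptions, the quoted optimal level is 50 times current commitments. An economy growing at 3% doubles in roughly 23 years, at 20% in under 4 years, and at 100% in exactly one year.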