The Pulse by 42neurons
ARC-AGI scores of OpenAI's models, with a spotlight on o3's performance

January 15, 2025

Hey there!

Welcome to the first edition of The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> OAI’s o3 crushes ARC-AGI benchmark (but still fails at some simple tasks)

> cost them ~$350K to run these evals

> astonishing results on Codeforces and FrontierMath

> surpasses all older models across benchmarks & supposedly smarter than the avg intelligent human

> 27 years ago: Deep Blue beat Kasparov at chess

> 17 years ago: Computers surpassed top human chess ratings

> AI has since beaten pros in many games: Go, Jeopardy!, Backgammon, Poker, Dota 2, and more…

> large-scale models: trained on over 10²³ FLOP (confirmed, or unconfirmed but likely above that threshold)

> DeepMind kicked it off in 2017 with: AlphaGo Master & AlphaGo Zero

> who's making them:

- Google & Meta lead, followed by OAI and Anthropic
- mostly USA (53%), then China (23%)
- industry (88%) is well ahead of academia and collaborations