Claude Opus 4.5's recent performance on benchmarks

November 25, 2025

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> deemed best model in the world for coding, agents & computer use

> top scores surpassing Gemini 3 on coding + key benchmarks; #2 on Artificial Analysis intelligence index

> highest SWE-bench Verified score yet

> safest & most aligned + most resistant to prompt injection among all major models

> new “effort” setting on API: matches Sonnet 4.5 at medium effort using 76% fewer tokens, & exceeds it by 4.3 pts at high effort while using 48% fewer tokens

> priced at $5/M input & $25/M output (down from $15/$75)
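To make the price cut concrete, here is a rough sketch of what one request would cost at the new and old rates. The prices come from the bullet above; the workload size (200k input tokens, 20k output tokens) and the function name are made up for illustration.

```python
# Prices in USD per million tokens, taken from the bullet above.
NEW_PRICE = {"input": 5.0, "output": 25.0}
OLD_PRICE = {"input": 15.0, "output": 75.0}


def request_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Cost in USD of one request at the given per-million-token prices."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload (an assumption, not a figure from this issue).
new = request_cost(200_000, 20_000, NEW_PRICE)
old = request_cost(200_000, 20_000, OLD_PRICE)
print(f"new: ${new:.2f}, old: ${old:.2f}, saving: {1 - new / old:.0%}")
# prints: new: $1.50, old: $4.50, saving: 67%
```

Because both input and output prices dropped by the same factor of three, the saving is about two thirds for any mix of input and output tokens, before counting any additional savings from using fewer tokens.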

> GPT-5.1-Codex-Max: frontier agentic coding model built to handle real engineering tasks end-to-end

> first model that can work across multiple context windows via compaction → supports million-token coding & refactor sessions

> much lower costs: uses ~30% fewer thinking tokens than GPT-5.1-Codex at ‘medium’ reasoning effort, with no loss in accuracy

> new xhigh mode thinks much longer for better answers

> runs 24-hour autonomous coding loops, fixing tests + iterating until task completion

> trained for Windows + improved CLI collaboration; safer sandboxed execution by default

> used by 95% of OpenAI engineers, who ship ~70% more pull requests since adopting Codex

> Olmo 3: fully open 7B–32B family releasing everything (data, scripts, checkpoints, logs + complete training info)

> matches top open models in performance (except Kimi K2 Thinking) while offering unmatched transparency

> strongest fully & truly open reasoning model w/ 65K-token context window (~50K words)

> 7B base trained on 6T tokens at ~7.7k tokens/s using ~220k H100-hours

> training costs: ~$500K (7B) & ~$2.2M (32B)

> rivals Qwen while using 6× fewer tokens

> OlmoTrace reveals exact training documents behind any output — first real transparency tool
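The training figures quoted above for the 7B base model can be sanity-checked with some quick arithmetic. This is only a back-of-the-envelope sketch: it assumes the ~7.7k tokens/s figure is throughput per GPU (the bullet does not say), and the implied hourly rate is derived from the quoted numbers, not a reported price.

```python
# Figures quoted in the bullets above for the 7B base model.
TOKENS = 6e12                # 6T training tokens
THROUGHPUT = 7.7e3           # tokens/s; ASSUMED to be per GPU
REPORTED_GPU_HOURS = 220e3   # ~220k H100-hours
REPORTED_COST_USD = 500e3    # ~$500K

# GPU-hours needed if every GPU sustains the quoted throughput.
gpu_hours = TOKENS / THROUGHPUT / 3600

# Hourly rate implied by the quoted cost and GPU-hours.
implied_rate = REPORTED_COST_USD / REPORTED_GPU_HOURS

print(f"estimated GPU-hours: {gpu_hours:,.0f}")
print(f"implied $/H100-hour: {implied_rate:.2f}")
```

Under that assumption the estimate comes out at roughly 216k GPU-hours, close to the quoted ~220k, and the quoted cost works out to a bit under $2.30 per H100-hour.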

> versions: