- The Pulse by 42neurons
Gemini 3's performance on benchmarks
Hey there!
Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> massive performance gains across benchmarks; now almost unanimously considered the best model
> ranks #1 on many aggregate benchmarks (LMArena, Artificial Analysis, ARC-AGI-2, etc.)
> announced usage scale:
AI Overviews used by 2B people per month
Gemini app at ~650M monthly active users
most Google Cloud customers (≈70%) now using Google’s AI tools
13M+ devs have built with Gemini
> highest scores on SimpleQA, MMMU & Video-MMMU, which measure factual accuracy, multimodal reasoning, and video understanding, all core to real-world reliability
> Deep Think mode gives big jumps on the hardest reasoning tasks: 93.8% GPQA, 45.1% ARC-AGI-2, 41% HLE
> live today across Gemini API, Vertex AI, Workspace & Android
> announced Antigravity = Google’s agent platform (planning, tools, memory, full task automation)

> Cursor: founded in 2022, already scaled to 300 employees
> 3x valuation & 2x ARR in 6 months
> funded by Accel, Thrive Capital, a16z, DST, Nvidia, Google, Coatue, etc.
> fastest ARR scale vs. competitors + most-funded AI-native coding startup ($3.3B total)
> over $1B ARR, compared with OpenAI's & Anthropic's 2025 targets of $20B & $9B, respectively

> fastest company to $1B ARR across AI coding, general dev tools & even huge tech firms
> as of May, the majority of revenue came from subscriptions, but enterprise revenue grew 100x YTD in 2025 (Business Wire)
> 84% of devs use AI coding tools (51% daily) this year, per Stack Overflow, with Cursor ranking as the 3rd most popular tool (Business Insider)
> Cursor reached $1B ARR within 2 yrs of launch, still not as fast as ChatGPT (<1 yr)