Price vs performance of non-reasoning LLMs on LiveBench

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

> LiveBench: 17 tasks testing AI across reasoning, coding, math, & comprehension

> red line: models on the Pareto front, where no other model is both cheaper & better

> Google models dominate: best performance for the price compared to expensive OpenAI models

> benchmark currently dominated by reasoning models, with o3 scoring highest
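
For readers curious how a red line like that can be drawn, here is a minimal sketch of picking out a Pareto front from price/score pairs. The model names, prices, and scores below are made up for illustration and are not LiveBench data. The sketch also uses the common, slightly stricter reading of the rule: a model drops off the front if another model is at least as cheap and scores at least as high, and is strictly better on one of the two.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # hypothetical cost, e.g. USD per million tokens
    score: float  # benchmark score, higher is better


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.price <= b.price and a.score >= b.score
    strictly_better = a.price < b.price or a.score > b.score
    return no_worse and strictly_better


def pareto_front(models: list[Model]) -> list[Model]:
    """Keep only models that no other model dominates, sorted by price."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m.price)


# Made-up sample data, for illustration only.
models = [
    Model("model-a", 0.1, 40.0),
    Model("model-b", 0.5, 55.0),
    Model("model-c", 1.0, 50.0),   # costs more than model-b and scores lower
    Model("model-d", 5.0, 60.0),
    Model("model-e", 10.0, 58.0),  # costs more than model-d and scores lower
]

front_names = [m.name for m in pareto_front(models)]
print(front_names)
```

With this sample data, model-c and model-e fall off the front because a cheaper model beats each of them; the remaining three are the points the line would connect.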

> almost 1 in 4 US tech jobs in Jan were AI-related

> in Jan, AI-related jobs: 1.3% of all job posts; tech jobs: 5.4% of all openings

> from ChatGPT's launch (Q4 2022) to Q4 2024:

  • AI-related job posts up 68%

  • tech job posts down 27%

> many industries want tech staff who can build or use AI: finance, consulting, retail, pharma, etc.

> 36% of all IT job posts in Jan were also AI-related
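
A quick sanity check on the "almost 1 in 4" figure, as a rough sketch: it assumes AI-related posts are counted as a subset of tech posts, which the source figures do not state explicitly.

```python
# January shares of all job posts, taken from the figures above (in percent).
ai_share_of_all_posts = 1.3
tech_share_of_all_posts = 5.4

# Assumption: every AI-related post is also counted as a tech post.
ai_share_of_tech = ai_share_of_all_posts / tech_share_of_all_posts

print(f"AI-related share of tech posts: {ai_share_of_tech:.1%}")
```

Under that assumption the ratio comes out at roughly 24%, which lines up with "almost 1 in 4".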

> AI chatbot downloads up 119% YoY in Q4 2024; AI Art Generators up 21% YoY

> ChatGPT dominated the market: ~40% of global GenAI app consumer spend & 23% of downloads in 2024

> 16 GenAI apps earned $10M+ in IAP revenue in 2024; 25 apps reached >10M downloads

> other popular categories: photo editing/enhancement, beauty editors, video editing & generation, etc.