OpenAI's deep research: performance on expert-level tasks

February 11, 2025

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

Deep research is ChatGPT’s new agentic capability for carrying out impressively comprehensive online research. According to OpenAI:

> its pass rates correlate more with a task's economic value than with the time the task takes humans

> tasks hard for the AI ≠ tasks time-consuming for humans

> only DeepSeek R1 shows performance comparable to OpenAI’s models

> o3-mini surpasses all models except the full o1, followed by DeepSeek R1

> reasoning models are consistently superior across benchmarks

> a study with 20 professional comedians (who already use AI) in a workshop

> using ChatGPT and Bard in August 2023

> general sentiments:

- AI was overall not very helpful

- useful and faster than humans for producing a first draft and structure

- content was bland, generic and unfunny

- jokes were of poor quality