OpenAI's deep research: performance on expert-level tasks

Hey there!

Welcome back to The Pulse, where we dive into interesting AI stories and trends backed by data, all presented through simple visuals.

Deep research is ChatGPT’s new agentic capability for carrying out impressively comprehensive online research. According to OpenAI:

> its pass rates correlate more with a task's economic value than with the time humans take to complete it

> tasks hard for the AI ≠ tasks time-consuming for humans

> only DeepSeek R1 performs comparably to OpenAI’s models

> o3-mini surpasses every model except the full o1, with DeepSeek R1 next

> reasoning models consistently superior across benchmarks

> study with 20 pro comedians (who already use AI) in a workshop

> using ChatGPT and Bard in August 2023

> general sentiments:

- AI was overall not very helpful

- useful & faster than humans for first drafts and structure

- content was bland, generic & unfunny

- jokes were of poor quality