
OpenAI o3

OpenAI o3 is the April 2025 reasoning model that raised the bar for the "thinking-model" category. Unlike o1, o3 uses tools inside its chain of thought (running Python, browsing, and analysing images as part of reasoning) and posts large gains on ARC-AGI, SWE-bench, and graduate-level science evaluations.

Model specs

Vendor: OpenAI
Family: o-series
Released: 2025-04
Context window: 200,000 tokens
Modalities: text, vision, code
Input price: $10/M tok
Output price: $40/M tok
Pricing as of: 2026-04-20

Strengths

Tool use inside the chain of thought, which lifts performance on multi-step, real-world tasks
Strong vision reasoning, not just captioning
Large gains on ARC-AGI vs o1, suggesting better generalisation to novel problems
Good calibration: knows when to think longer

Limitations

Still slower and pricier than GPT-5 mini for equivalent production work
High reasoning-token consumption inflates real cost
No audio support — pair with Realtime for voice
Sometimes over-reasons on simple prompts, wasting tokens

Use cases

Complex coding agents, such as Cursor or Devin-style autonomous work
Graduate-level scientific research assistance
Data analysis agents that need to run code mid-reasoning
Multi-step web research with image understanding

Benchmarks

Benchmark	Score	As of
SWE-bench Verified	≈69%	2025-04
ARC-AGI (public)	≈75%	2025-04
GPQA Diamond	≈87%	2025-04

Frequently asked questions

What is OpenAI o3?

OpenAI o3 is the April 2025 reasoning model that succeeded o1. It combines long internal reasoning with tool use — running Python, browsing the web, analysing images — during chain of thought, producing much stronger scores on coding, science, and abstract reasoning benchmarks.

How is o3 different from o1?

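The two biggest differences are tool use (o3 can run code and browse during reasoning; o1 could not) and vision (o3 can work with images as part of its reasoning, for example by cropping or zooming in on them; o1-preview was text-only, and the full o1 release accepted images only as input). o3 also posts significantly higher scores across nearly every benchmark.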

How much does o3 cost?

As of April 2026, o3 is priced at roughly USD 10 per million input tokens and USD 40 per million output tokens. Reasoning tokens are billed as output tokens even though they do not appear in the visible response, so real costs are often higher than the headline price suggests.
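To make the effect of reasoning tokens concrete, here is a minimal cost sketch using the per-million-token prices quoted above. The token counts are hypothetical, chosen only for illustration, and the function name `estimate_cost` is not part of any OpenAI library.

```python
# Prices quoted in this article (USD per million tokens).
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 40.0


def estimate_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Estimate the USD cost of one request, billing reasoning tokens as output."""
    billed_output = visible_output_tokens + reasoning_tokens
    total = input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens, 500 visible output tokens.
headline = estimate_cost(2_000, 500, reasoning_tokens=0)
# The same request if the model also spends 4,000 hidden reasoning tokens.
actual = estimate_cost(2_000, 500, reasoning_tokens=4_000)

print(f"headline estimate: ${headline:.2f}")  # $0.04
print(f"with reasoning:    ${actual:.2f}")    # $0.20
```

In this made-up example the request costs five times the naive estimate, which is why budgeting from visible output length alone tends to undershoot.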

Should I use o3 or GPT-5 in 2026?

GPT-5 is generally recommended for new builds because its unified router handles easy and hard prompts from one model. Choose o3 when you want predictable reasoning behaviour, with every request going to the same reasoning model rather than through GPT-5's router, or when o3 is available on a platform where GPT-5 is not.

Sources

OpenAI — Introducing o3 and o4-mini — accessed 2026-04-20
OpenAI — Pricing — accessed 2026-04-20