
An 80% price cut overnight — no model change, no downgrade, just pure savings. On June 10, 2025, OpenAI quietly dropped one of the most significant API pricing updates of the year: the o3 reasoning model went from $10 to $2 per million input tokens, while a brand-new o3-pro tier launched at a fraction of what o1-pro used to cost.
If you’ve been budgeting around o3’s original pricing, it’s time to recalculate everything. Here’s what changed, what it means for your production workloads, and how the new OpenAI o3 price cut stacks up against competing models.
The OpenAI o3 Price Cut: From $10 to $2 Per Million Tokens
OpenAI’s announcement was refreshingly straightforward: “We optimized our inference stack that serves o3. Same exact model — just cheaper.” No fine print, no capability trade-offs.
Here’s the before-and-after breakdown:
- o3 Input (old): $10.00 per 1M tokens → (new): $2.00 per 1M tokens
- o3 Output (old): $40.00 per 1M tokens → (new): $8.00 per 1M tokens
- Cached Input: $0.50 per 1M tokens (new discount tier)
- Effective reduction: 80% across the board
For a typical reasoning-heavy workload processing 10 million input tokens and 2 million output tokens per month, the cost drops from $180 to $36. That’s $144 in monthly savings per workflow — and for teams running dozens of reasoning pipelines, the cumulative impact is massive.
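If you want to sanity-check that math against your own volumes, the arithmetic is trivial to script. The 10M/2M split below is just the illustrative workload from this example:

```python
# Monthly cost for a reasoning-heavy workload: 10M input + 2M output tokens.
INPUT_M, OUTPUT_M = 10, 2  # millions of tokens per month

old_cost = INPUT_M * 10.00 + OUTPUT_M * 40.00  # o3 pricing before June 10, 2025
new_cost = INPUT_M * 2.00 + OUTPUT_M * 8.00    # o3 pricing after the cut

print(f"old ${old_cost:.2f}  new ${new_cost:.2f}  monthly savings ${old_cost - new_cost:.2f}")
# old $180.00  new $36.00  monthly savings $144.00
```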

o3-pro: Enterprise-Grade Reasoning at 87% Less Than o1-pro
Alongside the o3 price cut, OpenAI launched o3-pro — a new tier that uses significantly more compute than standard o3 to “think harder” on complex problems. It’s designed for scenarios where accuracy and reliability matter more than speed.
The pricing:
- o3-pro Input: $20 per 1M tokens
- o3-pro Output: $80 per 1M tokens
- vs o1-pro ($150/$600 per 1M tokens): roughly 87% cheaper than the model it replaces
OpenAI recommends running o3-pro in background mode for long-running asynchronous tasks to prevent timeouts. It’s available immediately in the API and in ChatGPT’s model picker for Pro and Team subscribers.
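For teams trying background mode for the first time, here is a minimal sketch using the official openai Python SDK's Responses API. The prompt and polling interval are placeholders, and the exact parameter names are worth confirming against the current SDK docs before relying on them:

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Start a long-running o3-pro request in background mode so it isn't
# cut off by an HTTP timeout.
job = client.responses.create(
    model="o3-pro",
    input="Audit this system design for failure modes: ...",  # placeholder prompt
    background=True,
)

# Poll the asynchronous job until it finishes.
while job.status in ("queued", "in_progress"):
    time.sleep(10)
    job = client.responses.retrieve(job.id)

print(job.status)
print(job.output_text)  # convenience accessor for the final text
```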
Developer Impact: Who Benefits Most From the OpenAI o3 Price Cut?
This isn’t just a pricing announcement — it reshapes which workloads are economically viable with reasoning models. Three categories of developers stand to benefit immediately:
1. Multi-Agent Orchestration Builders
If you’re running agentic workflows where multiple AI calls chain together — code generation, analysis, validation loops — the per-call cost was previously prohibitive with o3. At $2/$8, you can now afford to let reasoning models handle intermediate steps that used to require cheaper but less capable models like GPT-4o mini ($0.15/$0.60).
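As a rough sketch of what that looks like in practice (the prompts and the draft-review-revise split are invented for illustration; only the model names and prices come from this article), here is a chain where the reasoning model now owns the validation step:

```python
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    """One chat completion call; returns the text of the first choice."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Step 1: a cheap model produces the first draft.
draft = ask("gpt-4o-mini", "Write a Python function that parses ISO-8601 timestamps.")

# Step 2: at $2/$8, o3 can now afford to handle the intermediate review step.
review = ask("o3", f"Review this code for correctness and edge cases:\n\n{draft}")

# Step 3: the critique goes back to the cheap model for revision.
print(ask("gpt-4o-mini", f"Revise the code based on this review.\n\nReview:\n{review}\n\nCode:\n{draft}"))
```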
2. RAG Pipeline Operators
The cached input discount at $0.50 per million tokens is a game-changer for Retrieval-Augmented Generation. If you’re feeding the same context documents repeatedly — legal contracts, codebases, knowledge bases — the effective cost per query drops dramatically. Combined with the Batch API’s additional 50% savings, high-volume RAG deployments become genuinely affordable.
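Whether you actually hit the cached rate depends on prompt structure: OpenAI's prompt caching matches on identical prompt prefixes, so the shared document has to come before anything that varies per query. A minimal sketch, with the contract file and questions as placeholders:

```python
from openai import OpenAI

client = OpenAI()

# The large, stable context goes first so every request shares an identical
# prefix and is eligible for the cached-input rate.
CONTRACT_TEXT = open("contract.txt").read()  # placeholder ~50k-token document

SYSTEM_PROMPT = (
    "Answer questions strictly from the contract below.\n\n"
    f"--- CONTRACT ---\n{CONTRACT_TEXT}\n--- END CONTRACT ---"
)

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="o3",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # identical prefix on every call
            {"role": "user", "content": question},         # only this part changes
        ],
    )
    return resp.choices[0].message.content

print(answer("What is the termination notice period?"))
```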
3. Startups Choosing Between Reasoning and Budget
Previously, startups had to choose: use o3 for quality and blow the budget, or stick with GPT-4o mini for cost efficiency and accept lower reasoning capability. At $2/$8, o3 sits in a sweet spot — roughly 13x more expensive than GPT-4o mini, but now within reach for critical reasoning tasks while delegating routine work to cheaper models.
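A minimal routing sketch of that split (the task labels and the routing policy are invented for illustration; only the per-million-token prices come from this article):

```python
# Prices in $ per 1M tokens, from the June 2025 rates quoted above.
PRICES = {
    "o3":          {"input": 2.00, "output": 8.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

# Hypothetical policy: reserve o3 for steps where reasoning quality pays for itself.
CRITICAL_TASKS = {"architecture_review", "contract_analysis", "multi_step_planning"}

def pick_model(task_type: str) -> str:
    return "o3" if task_type in CRITICAL_TASKS else "gpt-4o-mini"

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return input_tokens / 1e6 * p["input"] + output_tokens / 1e6 * p["output"]

# A 20k-in / 4k-out task costs about $0.072 on o3 vs $0.0054 on gpt-4o-mini (~13x).
print(pick_model("contract_analysis"), round(estimate_cost("o3", 20_000, 4_000), 4))
print(pick_model("summarize_ticket"), round(estimate_cost("gpt-4o-mini", 20_000, 4_000), 4))
```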

How o3’s New Pricing Compares to the Competition
As of June 2025, here’s how o3’s new pricing stacks up against other major API providers’ reasoning-capable models:
- OpenAI o3 (new): $2/$8 per 1M tokens — 80% cheaper than before
- Anthropic Claude Sonnet 4: $3/$15 per 1M tokens — o3 now undercuts it on both input and output
- Google Gemini 2.5 Pro: $1.25/$10 per 1M tokens — Gemini stays cheaper on input but pricier on output
- OpenAI GPT-4o: $2.50/$10 per 1M tokens — o3 is now cheaper than GPT-4o for reasoning tasks
- DeepSeek R1: $0.55/$2.19 per 1M tokens — still the budget king for reasoning
The most striking comparison: o3 is now cheaper than GPT-4o on a per-token basis, despite being a more capable reasoning model. This pricing inversion suggests OpenAI is aggressively positioning o3 as the default choice for any task requiring multi-step thinking.
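To turn those list prices into an apples-to-apples bill, here is a small calculator using only the rates quoted above. The 5M-input / 1M-output workload is arbitrary, and real invoices will vary with context-length tiers, caching, and batch discounts:

```python
# $ per 1M tokens (input, output), as listed above for June 2025.
PRICING = {
    "OpenAI o3 (new)": (2.00, 8.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.5 Pro":  (1.25, 10.00),
    "OpenAI GPT-4o":   (2.50, 10.00),
    "DeepSeek R1":     (0.55, 2.19),
}

INPUT_M, OUTPUT_M = 5, 1  # example workload: 5M input + 1M output tokens

costs = {name: inp * INPUT_M + out * OUTPUT_M for name, (inp, out) in PRICING.items()}
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} ${cost:6.2f}")
# DeepSeek R1      $  4.94
# Gemini 2.5 Pro   $ 16.25
# OpenAI o3 (new)  $ 18.00
# OpenAI GPT-4o    $ 22.50
# Claude Sonnet 4  $ 30.00
```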
The Cached Input Strategy: Hidden Savings Most Developers Miss
Beyond the headline 80% cut, the $0.50 cached input rate deserves special attention. If your application sends identical prefixes — system prompts, few-shot examples, or shared context documents — across multiple requests, the effective input cost drops to a quarter of the already-reduced rate.
For a legal document analysis pipeline running 100 queries against the same 50,000-token contract (5 million input tokens in total), input costs fall from $50 at the old rate to $10 at the new uncached rate, and to roughly $2.58 once the repeated contract prefix is served from the prompt cache, with only the first request billed at the full $2 rate. That is about a 74% additional reduction on top of the 80% price cut for repeated context.
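The same arithmetic in code, so you can plug in your own document size and query count. This assumes, as above, that everything after the first request is served at the cached rate:

```python
CONTRACT_TOKENS = 50_000
QUERIES = 100
total_m = CONTRACT_TOKENS * QUERIES / 1e6              # 5.0 million input tokens

old_cost     = total_m * 10.00                         # pre-cut o3 input rate
new_uncached = total_m * 2.00                          # new rate, no caching
new_cached   = (
    CONTRACT_TOKENS / 1e6 * 2.00                       # first request at the full new rate
    + CONTRACT_TOKENS * (QUERIES - 1) / 1e6 * 0.50     # 99 follow-ups at the cached rate
)

print(f"old ${old_cost:.2f}  new ${new_uncached:.2f}  new + cache ${new_cached:.2f}")
# old $50.00  new $10.00  new + cache ~$2.58
```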
OpenAI also confirmed that the Batch API remains available for o3, with an additional 50% discount on inputs and outputs for asynchronous jobs. Enterprises processing large token volumes overnight can stack that discount on top of the new base rates for maximum savings.
What This Means for the AI Pricing Landscape
OpenAI’s move isn’t happening in a vacuum. Anthropic recently pushed Claude Sonnet 4 pricing lower, and Google continues to aggressively price Gemini 2.5 Pro. The broader trend is clear: reasoning-capable models are moving from premium tier to commodity pricing faster than anyone predicted.
For developers, this means the decision framework is shifting. The question is no longer “Can I afford reasoning models?” but “Which reasoning model gives me the best quality per dollar for my specific use case?” With o3 at $2/$8, the answer increasingly includes OpenAI’s reasoning tier for tasks that were previously cost-prohibitive.
The fact that OpenAI achieved this through inference optimization — not model compression or capability reduction — signals that further price drops are likely as they continue to optimize their serving infrastructure. If you’re building AI products in 2025, plan for a world where reasoning costs continue to fall.
Building AI products and need help optimizing API costs, selecting models, or architecting multi-agent systems? Let’s talk strategy.



