
March 26, 2026
On March 14, OpenAI shipped Codex subagents to general availability, and the implications for how we write software are significant. Codex subagents let a single developer run up to 8 AI agents simultaneously: writing code, running tests, and reviewing changes in parallel. The era of one-at-a-time AI coding assistance is officially over. After a preview period behind feature flags in Q1 2026, the system is now available to all Codex users, and multiple Y Combinator-backed startups are already running overnight autonomous development loops with it.
What OpenAI Codex Subagents Actually Are
OpenAI Codex subagents introduce a multi-agent architecture where a manager agent orchestrates several specialized agents working concurrently on the same codebase. Instead of a single AI assistant processing tasks sequentially, the system decomposes complex work into discrete units and distributes them across purpose-built agents. The manager breaks down the task, assigns each piece to the appropriate subagent, waits for all results, and consolidates everything into a single coherent response.
According to the official OpenAI developer documentation, three default agents ship out of the box.
- Explorer — A read-only codebase exploration agent that maps the code behind a failing UI flow, identifying the relevant frontend and backend code paths, entry points, and state transitions before any code gets modified. Think of it as the scout that draws up the battle plan.
- Worker — An implementation-focused agent for small, targeted fixes once the issue is fully understood. This is the agent that actually writes and modifies code based on Explorer's findings.
- Default — A general-purpose fallback agent that handles tasks not assigned to a specialized role, providing flexibility for miscellaneous work.
The critical innovation is true parallelism. While Explorer analyzes your codebase structure and maps dependencies, Worker can simultaneously modify files in already-understood areas, and a custom test agent can write unit tests in a separate thread. Codex handles all orchestration automatically — spawning new subagents, routing follow-up instructions, waiting for results, and closing agent threads when complete. The default concurrent thread limit is 6, expandable up to 8 parallel subagents. Task nesting depth defaults to 1 level but can be reconfigured for more complex workflows.
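The manager's loop described above, decompose, dispatch in parallel, gather, consolidate, can be sketched conceptually in plain Python. This is an illustration of the pattern only, not Codex's actual implementation; the `run_subagent` function and the task plan are hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SUBAGENTS = 6  # Codex's default concurrent thread limit (expandable to 8)

def run_subagent(role: str, task: str) -> str:
    # Stand-in for dispatching one unit of work to a specialized subagent.
    return f"[{role}] completed: {task}"

def manager(task_plan: list[tuple[str, str]]) -> str:
    # Fan tasks out to subagents in parallel, then consolidate the results
    # into a single coherent response, mirroring the manager agent's
    # decompose/dispatch/gather/consolidate loop.
    with ThreadPoolExecutor(max_workers=MAX_SUBAGENTS) as pool:
        results = list(pool.map(lambda t: run_subagent(*t), task_plan))
    return "\n".join(results)

plan = [
    ("explorer", "map the failing checkout flow"),
    ("worker", "patch the validation bug in cart.py"),
    ("tester", "write unit tests for the patched module"),
]
print(manager(plan))
```

The point of the sketch is the shape of the workflow: the manager blocks until every subagent thread returns, then merges the outputs, which is why the caller only ever sees one consolidated answer.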

Custom TOML Agents: Building Your Own Specialized Coding Team
The real power of OpenAI Codex subagents lies in custom agent definitions. Drop a TOML file into ~/.codex/agents/ for personal agents or .codex/agents/ for project-scoped agents shared across your entire team. If a custom agent name matches a built-in agent like explorer, your custom definition takes precedence — meaning you can override default behavior to match your team’s specific needs and conventions.
Each TOML file defines the agent’s name, description, and developer_instructions as required fields. Optional configuration includes model selection (including GPT-5.3-codex-spark for specialized coding tasks), model_reasoning_effort for controlling inference depth, sandbox_mode for security isolation, MCP server connections for external tool access, and skills.config for specialized capabilities. This granularity means you can create agents tuned for completely different tasks — one running a heavyweight model for architectural analysis, another using a fast lightweight model for formatting checks.
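Putting those fields together, a custom agent definition might look like the following sketch. The required and optional field names follow the description above, but the `formatting_checker` agent itself and the specific field values (model name, effort level, sandbox value) are illustrative assumptions, not copied from the official schema:

```toml
# .codex/agents/formatting_checker.toml -- project-scoped, shared with the team

# Required fields
name = "formatting_checker"
description = "Fast lint and formatting pass over changed files"
developer_instructions = """
Check only files touched in the current diff.
Report style violations; do not modify code.
"""

# Optional tuning: a lightweight model and low reasoning effort,
# since formatting checks need speed rather than deep analysis.
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "low"
sandbox_mode = "read-only"
```

Because this file lives in `.codex/agents/` rather than `~/.codex/agents/`, it ships with the repository and every teammate gets the same agent behavior.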
The official documentation showcases a PR review pattern that demonstrates the practical value of custom agents. Three specialized agents — pr_explorer for read-only codebase mapping and evidence gathering, reviewer for analyzing correctness, security vulnerabilities, and test risk assessment, and docs_researcher for verifying framework and API documentation through a dedicated MCP server — run simultaneously on every pull request. What previously required a human reviewer spending 30-45 minutes now completes in under 2 minutes with comprehensive coverage across code quality, security, and documentation accuracy.
There is also an experimental CSV batch processing feature that deserves attention. The spawn_agents_on_csv tool creates one worker agent per CSV row, enabling automated processing of migration tasks, bulk refactoring, or repetitive codebase modifications at scale. Each worker has a configurable job_max_runtime_seconds timeout to prevent runaway processes. For teams managing large monorepos with hundreds of microservices, this feature alone could save dozens of engineering hours per sprint.
Real-World Performance: What the Numbers Actually Show
Early post-GA data from production teams presents a nuanced picture of subagent capabilities. Well-defined tasks — those with clear specifications, bounded scope, and established patterns — hit a 75-85% acceptance rate. Medium-complexity work requiring some judgment calls lands at 50-65%. Genuinely novel tasks involving new architectures or unfamiliar patterns sit at 15-30%. These numbers tell a clear story: subagents excel at structured, repeatable work and still require human oversight for creative or ambiguous problems.
The cost structure is compelling. Per-task costs range from $3 to $8 for well-scoped work, factoring in human review time. Compare that to Devin's $500/month flat rate or Cursor's $20-40/month subscription, and Codex's usage-based pricing offers significantly more flexibility: you pay for what you use, so light weeks cost less and heavy sprint weeks scale accordingly.
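Those figures make the break-even point easy to estimate. Using the rates quoted above (a rough sketch; real spend depends on task mix and review overhead):

```python
def monthly_cost(tasks: int, per_task: float) -> float:
    # Usage-based spend for one month at a given per-task rate.
    return tasks * per_task

FLAT_RATE = 500.0  # Devin-style flat monthly subscription

# Break-even task counts at the high and low ends of the $3-8 range:
break_even_low = FLAT_RATE / 8.0    # ~62 tasks/month at $8/task
break_even_high = FLAT_RATE / 3.0   # ~167 tasks/month at $3/task

print(f"break-even vs flat rate: {break_even_low:.0f}-{break_even_high:.0f} tasks/month")
print(monthly_cost(40, 5.0))  # a light month: 40 tasks at $5/task -> 200.0
```

In other words, a team would need to clear somewhere between roughly 60 and 170 well-scoped tasks a month before a flat-rate plan starts to look cheaper.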
Test coverage improvements have been particularly striking in early adoption reports. Teams running subagents against previously untested modules report 60-80% coverage improvements from a single overnight run. Multiple YC-backed startups have built workflows where subagents process a queue of testing and implementation tasks overnight, producing pull requests that human engineers review each morning. The overnight development loop is becoming a genuine pattern in 2026 startup engineering culture.
On Terminal-Bench 2.0, GPT-5.3-Codex scores 77.3%, establishing clear dominance in terminal-native tasks including DevOps automation, shell scripting, and CLI tool development. For infrastructure teams where the majority of daily work happens in the terminal, this benchmark advantage translates directly into measurable productivity gains. The combination of high terminal performance with parallel subagent execution makes Codex especially potent for platform engineering workflows.
Codex Subagents vs Claude Code: Different Philosophies for Different Developers
Comparing OpenAI Codex subagents with Claude Code reveals fundamentally different approaches to AI-assisted development rather than a simple “which is better” answer. OpenAI Codex operates as a standalone platform emphasizing cloud-sandboxed, asynchronous task delegation. The intended workflow is delegation: you define tasks, assign them to agents, go do other work, and collect results when they are ready. Claude Code takes the opposite approach, emphasizing a developer-in-the-loop local workflow where the AI works alongside you in the terminal, with real-time visibility into what it is doing and the ability to redirect at any moment.
VS Code marketplace data reveals a fascinating competitive dynamic. Despite being published roughly three months later, Claude Code for VS Code leads with 5.2 million installs versus Codex’s 4.9 million, and holds a higher user rating at 4.0 versus 3.4 on a 5-point scale. Users consistently praise Claude Code’s handling of complex, multi-step tasks requiring deep contextual understanding of interconnected codebases, while Codex earns recognition for raw code generation speed, parallel execution capability, and cost efficiency at scale.
The cost differential between the two platforms is significant and worth calculating for your specific usage patterns. GPT-5 runs at roughly half the price of Anthropic’s Sonnet and approximately one-tenth the cost of Opus per token. For high-volume code generation workflows — think generating boilerplate across dozens of services, writing comprehensive test suites, or processing batch migrations — Codex’s pricing advantage compounds quickly. However, for complex refactoring where getting it right the first time saves expensive rework, architecture design requiring deep understanding of system constraints, and tasks where contextual reasoning accuracy matters more than raw generation speed, field reports from engineering teams consistently favor Claude Code.

Practical Selection Guide: Matching the Tool to Your Workflow
The decision ultimately comes down to how you and your team work daily. If your workflow centers on DevOps, infrastructure automation, scripting, and CLI tool development, Codex’s 77.3% Terminal-Bench score and parallel subagent execution make it the clear winner. The ability to spawn 8 agents working simultaneously on CI/CD pipeline construction, large-scale migration projects, or test generation across an entire service catalog is genuinely transformative for platform teams. CSV batch processing adds another dimension — processing hundreds of files through worker agents overnight is something no other tool currently matches.
If your daily work involves navigating complex full-stack applications, implementing nuanced business logic, performing deep refactoring across tightly coupled codebases, or making architectural decisions that require understanding subtle interactions between components, Claude Code’s contextual reasoning gives it a meaningful edge. The developer-in-the-loop model also matters for teams with strict compliance requirements where every AI-generated change needs real-time human oversight.
A growing number of experienced developers are adopting a hybrid strategy that leverages both tools’ strengths. The pattern looks like this: use Codex subagents for boilerplate generation, test coverage expansion, and batch processing tasks where parallelism and speed matter most, then switch to Claude Code for core business logic, complex debugging sessions, and architectural refactoring where depth of understanding trumps generation speed. This is not hedging — it is optimization.
The AI coding agent market in 2026 is transitioning from “one tool to rule them all” to “the right tool for the right job.” OpenAI Codex subagents reaching general availability is a declaration that multi-agent parallel coding has graduated from experimental feature to production-grade infrastructure. The pattern is now industry standard across major AI labs: automate the repetitive, structured work with specialized agents, and let human developers focus their cognitive energy on design, architecture, and decision-making. That is the direction OpenAI Codex subagents are pointing toward — and based on adoption velocity, the acceleration is only beginning.
Need help implementing AI coding agents or building development workflow automation? Let’s talk tech consulting.