Part I: Why, and What's Different

Chapter 2: Two Tools, Compared — Philosophy and Interface of Claude Code and Codex

Written: 2026-04-28 Last updated: 2026-04-28

2.1 Same Problem, Two Answers

Start with a concrete task: refactor a 5,000-line Express.js service. Run it through both tools. What happens?

Two widely cited numbers from the developer community [contributor, 2026]:

Cost: An Express.js refactor case ran $155 on Claude Code vs $15 on Codex — Codex is roughly 10x cheaper (cost derived from token consumption).
Blind quality review: in a blind output-only comparison, reviewers preferred Claude Code 67% vs Codex 25% (8% tied; 500+ developer Reddit survey).

Claude Code costs more and produces better output. Codex costs less and produces slightly weaker output. But these numbers capture only one of seven dimensions — cost/quality tradeoff. The other six dimensions tell a more complex story.

The delivery channel distinction: Anthropic designed Claude Code as an external harness layer on top of the model API [Anthropic, 2026]. OpenAI shipped GPT-5.5 into Codex before the API opened [Willison, 2026]. Same word ("harness"), different architectures — Codex internalizes harness logic into the product surface, Claude Code externalizes it for the user to assemble.

Figure 2.2: Same word, different architectures — Claude Code's harness is a user-assembled layer above the API; Codex's harness is internalized into the product surface. GPT-5.5 shipping into Codex first was the sharpest expression of this gap. illustration by author Gemini assisted

2.2 Seven Dimensions

Dimension 1: Memory Model

Claude Code: CLAUDE.md hierarchy with auto-walk. Reads ~/.claude/CLAUDE.md (global) → project root CLAUDE.md → subdirectory .claude/CLAUDE.md automatically. Agent-driven memory persistence via .claude/memory/ folder [Anthropic, 2026].

Codex: AGENTS.md 3-tier discovery with override priority. Walks from current directory up toward root for AGENTS.md files; global config in ~/.codex/; subagent definitions in .codex/agents/.toml [OpenAI, 2026]. Override priority: more specific path wins.

Tradeoff: Claude's auto-walk is convenient but hard to audit. Codex's explicit 3-tier is more predictable but requires you to find files manually.

Dimension 2: Tool Model

Claude Code: subagents as functions. Define agents in .claude/agents/.md (markdown), invoke via invoke_subagent [Anthropic, 2026]. Hooks in .claude/hooks/ inject logic before/after tool execution.

Codex: agents as TOML files. Write name, description, developer_instructions in .codex/agents/.toml — Codex auto-discovers them [OpenAI, 2026]. Skills use SKILL.md frontmatter [OpenAI, 2026].

Tradeoff: Claude's markdown agents are human-friendly but loosely schemed. Codex's TOML is structured and portable — AGENTS.md is adopted by 60,000+ open-source projects as a cross-vendor standard [Foundation, 2026].

Dimension 3: Autonomy Model

Claude Code: harness orchestrates autonomy externally. Agent Teams [Anthropic, 2026] lets multiple agents collaborate via shared task lists; SendMessage enables peer-to-peer agent communication. Autonomy depth is determined by the user's harness design.

Codex: autonomy internalized via reasoning effort. One line in config.toml — model_reasoning_model_reasoning_effort = "xhigh" — adjusts the model's reasoning depth [OpenAI, 2026]. In GPT-5.5, xhigh effort uses ~240x more reasoning tokens than none [Willison, 2026].

Tradeoff: Claude's external orchestration gives fine-grained control but adds complexity. Codex's effort slider is simple but opaque — you can't observe the reasoning process directly.

Dimension 4: Edit Model

Claude Code: interactive REPL. Each terminal session maintains state; changes land in the filesystem immediately. /undo reverts the last change.

Codex: branch-per-task. Each task runs on an independent branch, completing as a PR [Proser, 2026]. Multiple tasks can run in parallel on separate branches.

Tradeoff: Claude's REPL is fast for exploration. Codex's branch-per-task is safer and trivially parallelizable — queue multiple tasks, merge results.

Dimension 5: Permissions Model

Claude Code: auto-mode + allowlists. Tools can prompt for user confirmation or auto-approve. Fine-grained per-tool control via allowedTools array in .claude/settings.json [Anthropic, 2026].

Codex: sandbox_mode trio [OpenAI, 2026]:

read-only: read-only access (most conservative)
workspace-write: writes allowed in current workspace (recommended default)
danger-full-access: unrestricted system access (use with caution)

approval_policy sets human-in-the-loop gates (untrusted / on-request / never).

Tradeoff: Claude's per-tool granularity vs Codex's sandbox-level selection. Codex's simplicity is friendlier for operational safety.

Dimension 6: Context Management

Claude Code: compaction + memory folders. When context grows long, Claude summarizes past content (compaction) to recycle the context window — Opus 4.6 achieved 84% on BrowseComp with compaction [Martin and Anthropic, 2026]. The memory folder pattern lets agents write and read context files.

Codex: progressive context loading. Skill YAML frontmatter in AGENTS.md loads brief overviews; the agent reads full content when needed [OpenAI, 2026]. Branch isolation means each task also starts with clean context.

Tradeoff: Claude's compaction excels for long sessions. Codex's progressive loading keeps initial context lean.

Dimension 7: Collaboration Model

Claude Code: Agent Teams + SendMessage [Anthropic, 2026]. TeamCreate creates a team; TaskCreate with addBlockedBy/addBlocks declares a dependency graph. Agents communicate directly via SendMessage.

Codex: TOML subagents + skills [OpenAI, 2026]. Define each agent as a TOML file in .codex/agents/; an orchestrator invokes them.

Tradeoff: Claude's Agent Teams are dynamic with rich runtime collaboration. Codex's TOML agents are static and declarative — better for tool composition than complex runtime messaging.

Figure 2.1: The seven dimensions — memory, tools, autonomy, edits, permissions, context, collaboration. Each row shows where Claude Code and Codex diverge in interface philosophy. illustration by author Gemini assisted

2.3 Why Every Dimension Is Asymmetric

The seven dimensions above aren't a feature checklist — they follow from a single architectural decision about where the harness lives.

Anthropic built Claude Code as a harness that lives in user-authored config: CLAUDE.md, .claude/agents/, .claude/hooks/, .claude/skills/. The model API is the foundation; the harness is the user's layer on top of it [Anthropic, 2026].

OpenAI shipped GPT-5.5 into Codex before the API opened [Willison, 2026]. The model and the harness arrived together, as a single product. The API was secondary. When the frontier model ships into the harness before the API, the harness has stopped being a wrapper — it has become the primary delivery channel.

That single decision cascades into every dimension above: memory is external vs. internal; collaboration is runtime-dynamic vs. TOML-declared; autonomy is user-orchestrated vs. effort-dialed; permissions are per-tool granular vs. sandbox-level. These aren't arbitrary differences — they each trace back to the same question: who owns the harness?

2.4 The Numbers

Metric	Claude Code	Codex	Source
Express.js refactor cost	$155	$15	[contributor, 2026]
Blind review preference	67%	25% (tie 8%)	[contributor, 2026]
Terminal-Bench score	65.4	77.3	[MorphLLM, 2026]
SWE-bench Verified (Sonnet 4.6)	79.6%	—	[Anthropic, 2026]
AGENTS.md adopting projects	N/A	60,000+	[Foundation, 2026]

Figure 2.3: Two faces of community data — Codex wins 10x on cost in the Express.js refactor case, but blind reviewers in a 500+ developer Reddit survey preferred Claude Code 67% to 25% with 8% tied. Benchmark winner is not the same as preferred output. illustration by author Gemini assisted

The takeaway: Codex wins on Terminal-Bench; Claude Code wins on blind review. Benchmark winner ≠ preferred output [MorphLLM, 2026]. DataCamp's comparison reaches the same conclusion [DataCamp, 2026]: "Claude for fast interactive coding; Codex for autonomous long-running tasks."

2.5 A Third Perspective

Blake Cros's analysis of the Chinese AI coding market [Crosley, 2026] frames the comparison as "interface philosophy": "Claude gives developers more control at the cost of more configuration. Codex takes control but provides a simpler starting point." The same tradeoff is legible across different markets.

2.6 Summary: Two Interfaces, Two Philosophies

Seven dimensions, one summary:

Claude Code: user assembles the harness externally. More control, more configuration.
Codex: harness is internalized into the product. Simpler start, more predictable autonomy.

Neither is "better." Chapter 3 covers your first Codex session. Starting in Chapter 4, you'll learn harness engineering patterns that work in both tools. These seven dimensions are the coordinate system for that learning.

References

MorphLLM, "Codex vs Claude Code Benchmark," 2026. [MorphLLM, 2026]
Anthropic, "Claude Code: Best practices for agentic coding," 2026. [Anthropic, 2026]
Anthropic, "Claude memory documentation," 2026. [Anthropic, 2026]
Anthropic, "Subagents," 2026. [Anthropic, 2026]
Anthropic, "Agent Teams," 2026. [Anthropic, 2026]
Anthropic, "Harnessing Claude's Intelligence," 2026. [Martin and Anthropic, 2026]
OpenAI, "AGENTS.md specification," 2026. [OpenAI, 2026]
OpenAI, "Codex config reference," 2026. [OpenAI, 2026]
OpenAI, "Codex subagents," 2026. [OpenAI, 2026]
OpenAI, "Codex skills," 2026. [OpenAI, 2026]
Willison, Simon, "GPT-5.5," simonwillison.net, 2026-04-23. [Willison, 2026]
AGENTS.md Open Standard, "60K+ projects adoption," 2026. [Foundation, 2026]
Anthropic, "Introducing Claude Sonnet 4.6," 2026-02-17. [Anthropic, 2026]
DataCamp, "Codex vs Claude Code," 2026. [DataCamp, 2026]
Blake Cros, "China AI coding market analysis," 2026. [Crosley, 2026]
Zack Proser, "Codex daily-use review," 2026. [Proser, 2026]
dev.to, "Claude Code vs Codex 2026 — What 500+ Reddit Developers Really Think," 2026. [contributor, 2026]