Claude Code vs. Codex: The Architecture Reveals the Bets Behind the Harnesses

Part 4 of 5 | Series: What We Learned from the Claude Code Leak

As I continue on my personal journey to learn more about Agent harnesses are architected, the Claude Code leak gave us something the AI industry rarely provides: a genuine apples-to-apples architectural comparison between the two leading AI platforms.

OpenAI’s Codex source has been public for a while. Claude Code’s source just became public by accident. For the first time, you can read both and compare the actual engineering decisions, not the marketing on LinkedIn and Tweets.

The differences are not minor. They reflect different perspectives and bets about how these systems work.

I keep on saying that we spend too much time talking about models in the abstract and not enough time talking about the execution architecture those models sit inside. Once you are trying to build around agents seriously, these architectural choices turn into organizational choices very quickly.

The Terminal UI: React vs. Rust#

I will start with the most visible difference. Claude Code’s terminal interface is built with Ink, which is React for the terminal. The .tsx file extension on the CLI entrypoint is not a typo. Message bubbles, tool call displays, permission prompts, the markdown renderer: all React components. Same mental model as a web app, with the terminal as the rendering target. This was really cool, everyone remembers how cool it was the first time we started to use Claude Code on the terminal.

Codex went the opposite direction. Their TUI is built in Rust with ratatui and crossterm. No JavaScript in the rendering path.

The Rust implementation has lower memory overhead and faster rendering. The React implementation means every engineer who builds Anthropic’s web products can contribute to the Claude Code CLI without context-switching. Same component model, same tooling, same mental model, I personally like the Claude Code approach way more.

This is not really just a technical decision. It is an organizational one. Anthropic bet on team velocity and shared knowledge. OpenAI bet on runtime performance. Both are defensible and there is no wrong answer here, but they show a peak at the companies’ internal engineering cultures.

Context Management: Defensive vs. Efficient#

This is where the architectural bets diverge most sharply.

Claude Code re-sends the full system prompt on every turn. To make this affordable, it relies on a prompt cache split at __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__: the static half is cached globally, and only the per-session context changes. The cost is amortized through caching, not avoided through minimization.

Codex takes the opposite approach. It tracks a reference_context_item that snapshots settings between turns, then diffs against it to send only what changed. Smaller prompts per turn. Less caching complexity. More token-efficient on a per-call basis.

The same tradeoff shows up in compaction strategy. Claude Code has four distinct compaction strategies, a layered defense against every failure mode. They cover proactive compaction before the limit is hit, reactive compaction when the API rejects an over-long prompt, snip compaction for SDK and headless sessions, and context collapse for compressing old tool results without triggering a full compaction. Codex has three of its own: pre-turn compaction before the API call, mid-turn compaction while processing, and a compact_remote variant that delegates summarization to a remote service.

Claude Code’s approach is more defensive. It handles more failure modes with more fallbacks. Codex’s approach is more efficient. It pays less per turn under normal conditions.

Which is right? Probably both, for different contexts. Claude Code optimizes for reliability in interactive use. Codex optimizes for token efficiency in automated pipelines. The question is which context you are building for.

Anthropic bet:                 OpenAI bet:

larger harness                 leaner systems core
more defensive fallbacks       more protocol efficiency
cache the big prompt           diff what changed
shared web-style tooling       Rust-native runtime discipline
interactive reliability        automation efficiency

Session Persistence: Append-Only vs. Queryable#

When a session ends, both tools save something. But what they save is designed very differently.

Claude Code persists JSON transcripts with an asymmetric write discipline: user messages are written synchronously (so the session can always be resumed from the point the user’s input was accepted), while assistant messages are written asynchronously (they are already durable in the API response and can be replayed if needed). There is a comment in the source explaining this: someone lost a session because the process died between the user hitting enter and the API responding. The asymmetry is the fix.

Codex stores sessions as JSONL files with a session_log.rs module and a resume_picker.rs for selecting sessions to resume. More importantly, it also has a SQLite-backed state system in codex-rs/state/ for thread metadata, agent jobs, and memories. The session data is queryable. You can run structured queries against your agent history.

Claude Code’s transcripts are files you can search. Codex’s state is a database you can query. This is a meaningful difference for anyone building on top of these systems. The Codex architecture is clearly aimed at a world where you want to ask questions like “what were all the file edits this agent made in the last week” or “show me every task where the agent requested a permission escalation.”

Process Management: How Each Tool Handles Subprocesses#

Both Codex and Claude Code spawn subprocesses to run tools like shell commands, so neither is purely in-process. The interesting difference is in how each one manages those children.

On Linux, Codex sets a parent death signal via prctl(2) using the PR_SET_PDEATHSIG option on spawned children. If the main Codex process dies, the kernel sends a signal to the child automatically, which helps prevent orphaned processes from lingering.

Claude Code spawns subprocesses too. The Bash tool, for example, launches an actual /bin/bash process and communicates with it over stdin and stdout (see this GitHub issue for a real example of how its subprocess spawn path works). It also spawns processes for things like MCP servers running over stdio and background jobs. Cleanup of these children is handled at the application layer in Node, rather than through a kernel-level parent death signal.

The practical takeaway is that both tools have to deal with process lifecycle, just at different layers. Codex leans on a Linux specific kernel feature for guaranteed teardown. Claude Code relies on its Node runtime to track and clean up children, which is portable across operating systems but depends on the application exiting cleanly. Each approach has different failure modes worth understanding before drawing conclusions about which is more rigorous.

Competition In The System Design#

Here is the thing both codebases make clear: the model is only one part of the system, and often not the part that explains the behavior users actually feel.

Anthropic A/B tests prompt wording across model launches the way ad networks test copy. @[MODEL LAUNCH] annotations appear throughout the Claude Code source, tracking how specific prompt changes affect behavior. They are not waiting for a better model to fix behavior issues. They are running controlled experiments on the prompt layer, measuring outcomes, and shipping changes.

This is a product development process applied to AI behavior. And it means the quality of the agent is not just a function of which model runs underneath. It is a function of how well you understand the interaction between the prompt, the model, and the harness.

The practical advantage is in the years of production data, the compaction strategies that were each built in response to a real failure, the prompt tuning experiments that shipped and the ones that did not. That knowledge does not transfer when someone forks your code.

That is also why I think engineering orgs need to be careful copying surface patterns from other teams. The visible interface is usually the least interesting part. The hidden operating assumptions are the real product.

What the Race Looks Like From Here#

The two codebases together paint a picture of where agentic infrastructure is heading.

Persistent sessions will become the default. Both tools are moving toward agents that survive across sessions, remember context, and resume work without a cold start.

Multi-agent coordination is an assumed primitive, not a stretch goal. The coordinator/worker architecture in Claude Code’s unreleased features is not exotic. It is the natural next step once single-agent systems hit their limits.

The terminal UI will get absorbed. Voice mode, browser control via Playwright, background daemons with push notifications: both tools are moving beyond the terminal toward ambient agents that operate across surfaces.

The question is not which tool wins. The question is which architectural bets hold up as agents get more capable, more persistent, and more autonomous. Both Anthropic and OpenAI have placed very different bets. Now those bets are visible enough that the rest of us can study them, copy them, or deliberately choose against them.

Sources: Haseeb Qureshi, Inside the Claude Code source · Engineer’s Codex, Diving into Claude Code’s Source Code Leak · The AI Corner, Claude Code Source Code Leaked: What’s Inside · How Claude Code Uses React in the Terminal