Claude Code vs. Codex: The Architecture Reveals the Bets Behind the Harnesses

Part 4 of 5 | Series: What We Learned from the Claude Code Leak


As I continue my personal journey to learn more about how agent harnesses are architected, the Claude Code leak gave us something the AI industry rarely provides: a genuine apples-to-apples architectural comparison between the two leading AI platforms.

OpenAI’s Codex source has been public for a while. Claude Code’s source just became public by accident. For the first time, you can read both and compare the actual engineering decisions, not the marketing in LinkedIn posts and tweets.

[Read more]

Continuous Productive Autonomy Needs Control Layers

Part 3 of 5 | Series: What We Learned from the Claude Code Leak


There is a version of the Claude Code leak story that is entirely about features: background agents, persistent memory, multi-agent orchestration. It is a compelling story. But it misses something.

When you give an agent real power, the interesting engineering problem is not “what can it do.” It is “what stops it from doing the wrong thing.” The Claude Code source is valuable here because it shows how the control layer works.

[Read more]

What If Code Review Was Designed for Abundance Instead of Scarcity?

Illustration of many cute robot reviewers collaborating in parallel around a code review workflow.

For a long time, code review has followed a pretty familiar pattern in our industry. Open a pull request, tag a few people, and wait.

A big part of that process was always about managing scarcity. Who has context? Who has time? Who is available right now? If you tag too many people, you create noise and slow things down. If you tag too few, or the wrong people, good feedback can still slip through.

[Read more]

Memory and Context Management: The Hardest Problem in Building with Agents

Part 2 of 5 | Series: What We Learned from the Claude Code Leak


If you have been working with agents, you know that moment when you feel the session starting to drift. You are several hours into your session. The context window fills up. The agent starts forgetting things it knew twenty minutes ago. You patch it with summaries, reintroducing requirements and guidelines. It is painful and frustrating.

[Read more]

The Harness Is the Moat: Inside the Architecture Powering Autonomous Agents

Part 1 of 5 | Series: What We Learned from the Claude Code Leak


Everyone talks about the model. Which one is smarter. Which benchmark it topped. Which lab is ahead this week. Is Mythos going to change everything?

The Claude Code source leak that happened a few weeks ago tells a very interesting story. The actual API call to Claude, the part that talks to the model, is about 200 lines of code. Everything else, more than 500,000 lines, is the harness around it.

[Read more]

Why I Think Dreaming Is a Real Breakthrough for Agent Memory

If you have built agents that run for more than a few turns, you know where things start to break. The session gets longer. The context gets heavier. Compaction kicks in. Summaries get written. Important details get flattened. The agent may still sound coherent, but execution gets worse.

This is one of the most important problems in agent engineering right now: consistent execution on long, complex tasks. The community has been trying to solve it with projects like mem0. More recently, even actress Milla Jovovich shared a project called MemPalace.

[Read more]

An Important Difference Between Anthropic and OpenAI Has Nothing to Do With Benchmarks

Two agents can be similarly capable and still feel completely different to work with.

I have spent a lot of time recently using both Anthropic and OpenAI models across writing, coding, research, and agent workflows. All of these frontier models are exceptionally good. This is not an argument that one is broadly superior to the other, or that this is the single most important difference between them.

It is just a very interesting angle, and one that I think many people perceive when they use these models heavily but rarely put into words.

[Read more]

Not All Agent Work Is Equal. Here’s How to Tell the Difference

One of the most useful frameworks I’ve seen lately came from one of our most AI-forward engineers, Bian Jiang, at our weekly Engineering All Hands at Attentive.

He shared his approach to using AI agents and how much care one should put into the outcome of each agent interaction. I had been thinking about it but hadn’t articulated it as clearly as he did. Not all agent interactions are equal, and the difference between good and bad outcomes often comes down to one thing: your self-awareness about what you do and don’t know about the task you are giving the agent.

[Read more]

AI Communication Agent: How Slash Commands Make Claude and OpenClaw Reliable for Writing and Drafting

I use an AI agent to help me communicate. Not to write for me, but to help me be more efficient. English is my second language, so it also helps me make fewer mistakes. Still, the output should always sound like Antonio. It reviews my drafts, helps me write outreach, and prepares me before important conversations.

I built this on top of OpenClaw and Claude Code, a setup that I can access via Telegram, that runs persistent sessions, and that lets you configure agents with custom behavior files. If you prefer to stay in the terminal, the same patterns work with Claude Code, a CLAUDE.md file, and custom skills. More on that at the end.

[Read more]

How I Use AI Agents to Build Software and Improve Productivity

A practical example of human-agent collaboration.


I run several AI agents as part of my daily workflow. Not as chat assistants you type questions into. As execution partners that do the work while I handle the judgment and prioritization.

Here’s how it works in practice.


The Setup

I have agents with different scopes: one focused on complex execution, code, and infrastructure, and another focused on scheduling, coordination, and lighter repeatable tasks more related to personal productivity. I also have a third agent focused only on communications, but that deserves a dedicated post.

[Read more]