Part II: Learning Harness Engineering Anew in Codex

Chapter 5: Building a Harness in Codex — AGENTS.md, Config, Directory Structure

Written: 2026-04-28 Last updated: 2026-04-28

5.1 OpenAI's Own AGENTS.md

The best AGENTS.md example is OpenAI's Codex repository itself [OpenAI, 2026]. What does it actually say?

Rust crate names use the codex- prefix (e.g., codex-core, codex-cli)
Modules stay under 500 lines of code (split at 800)
Clippy: "always collapse if statements"
Method references preferred over closures

This is what a production AGENTS.md looks like: not generic coding guidelines, but specific, project-calibrated rules about what goes wrong in this codebase.

5.2 Three Things to Build

Building a harness in Codex means creating three things:

A top-level AGENTS.md: tool-agnostic rules for the whole project
~/.codex/config.toml: Codex-specific configuration
.codex/agents/reviewer.toml: your first subagent definition

Let's build them in order.

5.3 AGENTS.md — How to Write It Well

Figure 5.1: Anatomy of a production AGENTS.md, with six sections (Overview, Stack, Code Standards, Testing, Git Workflow, Common Mistakes) maintained as a living document. Illustration by the author (Gemini-assisted).

The AGENTS.md philosophy: write down what a new agent needs to know the first time it is deployed to this codebase. That means persistent project rules, not one-off task context.

AGENTS.md is an open, cross-vendor format stewarded by the Agentic AI Foundation under the Linux Foundation, and more than 60,000 open-source projects have adopted it [Foundation, 2026].

Structure of a good AGENTS.md [Code, 2026]:


```markdown
# Project Rules

## Overview
<2-3 sentences on what this project does>

## Stack & Architecture
<Core tech stack, major directory roles>

## Code Standards
<Linter, formatter, naming conventions, function length>

## Testing
<Test commands, coverage expectations, mocking policy>

## Git Workflow
<Branch strategy, commit message format>

## Common Mistakes to Avoid
<Things agents get wrong, based on real experience>
```
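The recommended structure is just a fixed set of `##` headings, so a draft can be checked mechanically. Below is a minimal sketch of such a check. `missingSections` is a hypothetical helper and not part of Codex or the AGENTS.md standard. It treats each recommended section name as a heading prefix, so `## Stack` and `## Stack & Architecture` both count.

```typescript
// Hypothetical helper (not part of Codex): reports which recommended
// sections a draft AGENTS.md is missing. Entries are heading prefixes,
// so "## Stack" and "## Stack & Architecture" both satisfy "Stack".
const RECOMMENDED_SECTIONS: string[] = [
  "Overview",
  "Stack",
  "Code Standards",
  "Testing",
  "Git Workflow",
  "Common Mistakes",
];

function missingSections(markdown: string): string[] {
  // Collect the text of every level-2 heading ("## ..."), lowercased.
  const headings = markdown
    .split(/\r?\n/)
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim().toLowerCase());

  // A section counts as present if some heading starts with its prefix.
  return RECOMMENDED_SECTIONS.filter(
    (section) => !headings.some((h) => h.startsWith(section.toLowerCase())),
  );
}

// Headings taken from the Node.js / TypeScript example in this section.
const draft = [
  "# Project Rules",
  "## Overview",
  "## Stack",
  "## Code Standards",
  "## Common Mistakes to Avoid",
].join("\n");

console.log(missingSections(draft)); // [ 'Testing', 'Git Workflow' ]
```

A check like this only verifies structure. Whether the Common Mistakes entries reflect real failures still takes human judgment.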

Concrete example (Node.js / TypeScript project):


```markdown
# Project Rules

## Overview
Express.js REST API for user management. PostgreSQL + TypeORM.
Main entry: `src/app.ts`. Test: `npm test` (Jest).

## Stack
- Node.js 20+, TypeScript 5.4 strict
- Express.js 4.18, TypeORM 0.3
- PostgreSQL 16
- Jest 29 for testing

## Code Standards
- ESLint + Prettier enforced (run: `npm run lint`)
- Functions < 40 lines
- No `any` type unless absolutely necessary; use `unknown` + type guards
- Repository pattern for database access (`src/repositories/`)

## Common Mistakes to Avoid
- Don't import directly from TypeORM in controllers; use repositories
- Don't use `req.body` directly; validate with Zod schemas first
- Don't catch and swallow errors; let them propagate to error middleware
```
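Here is what three of these rules look like in code, as a dependency-free sketch. The project itself would use Zod; a hand-written type guard stands in for a Zod schema here so the example runs without packages. `CreateUserInput`, `UserRepository`, `createUser`, and `InMemoryUserRepository` are hypothetical names chosen for illustration.

```typescript
// Hypothetical input shape for the create-user endpoint.
interface CreateUserInput {
  email: string;
  name: string;
}

// Type guard: narrows `unknown` instead of trusting `any`.
// (Stand-in for a Zod schema; the AGENTS.md rule says to use Zod.)
function isCreateUserInput(value: unknown): value is CreateUserInput {
  if (typeof value !== "object" || value === null) return false;
  const { email, name } = value as { email?: unknown; name?: unknown };
  return (
    typeof email === "string" &&
    email.includes("@") &&
    typeof name === "string" &&
    name.length > 0
  );
}

// Controllers depend on this interface, never on TypeORM directly.
interface UserRepository {
  create(input: CreateUserInput): Promise<{ id: number } & CreateUserInput>;
}

// Controller logic: validate first, then delegate to the repository.
// Invalid input throws instead of being swallowed, so error middleware sees it.
async function createUser(body: unknown, repo: UserRepository) {
  if (!isCreateUserInput(body)) {
    throw new Error("Invalid create-user payload");
  }
  return repo.create(body);
}

// Minimal in-memory repository for trying the controller out.
class InMemoryUserRepository implements UserRepository {
  private nextId = 1;

  async create(input: CreateUserInput) {
    return { id: this.nextId++, ...input };
  }
}
```

In the Express app, a route handler would call `createUser(req.body, repo)` and pass any thrown error to `next`. That leaves error handling to the middleware, as the last rule asks.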

The contested question: should a human write it, or can you automate it?

Addy Osmani's position [Osmani, 2026]: "Human-curated AGENTS.md only. AI-generated ones fail because they don't reflect actual codebase pathologies." The things that truly need to be avoided are things only the developer knows from experience.

The counter-argument from Jagtap [Jagtap, 2026]: GEPA-style feedback loops — extracting repeated failure patterns from agent run logs and auto-appending them to AGENTS.md — outperform human curation by learning from actual failures without human bias.

Practical recommendation: humans write the first draft; automated feedback augments it as the project matures.
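The automated half of that recommendation can be sketched as a simple frequency count over run logs. This is an illustration under stated assumptions. It is not GEPA, which evolves prompts by reflecting on execution traces, and it is not Jagtap's pipeline. The `FAILURE:` log prefix, the `minCount` threshold, and both function names are hypothetical.

```typescript
// Hypothetical log format (an assumption, not Codex's): one failure per line,
// e.g. "FAILURE: imported TypeORM directly in a controller".
function repeatedFailures(logLines: string[], minCount: number): string[] {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    const match = /^FAILURE:\s*(.+)$/.exec(line.trim());
    if (!match) continue;
    const failure = match[1].trim();
    counts.set(failure, (counts.get(failure) ?? 0) + 1);
  }
  // Keep only failures that recur at least minCount times.
  return Array.from(counts)
    .filter(([, n]) => n >= minCount)
    .map(([failure]) => failure);
}

// Appends new bullets to the end of the "## Common Mistakes" section,
// skipping entries already listed; creates the section if it is missing.
function appendToCommonMistakes(agentsMd: string, failures: string[]): string {
  const lines = agentsMd.split("\n");
  const start = lines.findIndex((l) => l.startsWith("## Common Mistakes"));
  if (start === -1) {
    if (failures.length === 0) return agentsMd;
    const bullets = failures.map((f) => `- ${f}`);
    return [...lines, "", "## Common Mistakes to Avoid", ...bullets].join("\n");
  }
  // The section runs until the next "## " heading or the end of the file.
  let end = lines.findIndex((l, i) => i > start && l.startsWith("## "));
  if (end === -1) end = lines.length;
  // Insert before any trailing blank lines so the bullets stay contiguous.
  let insertAt = end;
  while (insertAt > start + 1 && lines[insertAt - 1].trim() === "") insertAt--;
  const existing = new Set(
    lines.slice(start + 1, end).map((l) => l.replace(/^-\s*/, "").trim()),
  );
  const bullets = failures.filter((f) => !existing.has(f)).map((f) => `- ${f}`);
  return [...lines.slice(0, insertAt), ...bullets, ...lines.slice(insertAt)].join("\n");
}

const logs = [
  "FAILURE: imported TypeORM directly in a controller",
  "ok: 42 tests passed",
  "FAILURE: imported TypeORM directly in a controller",
  "FAILURE: used req.body without validation",
];
const before = "## Common Mistakes to Avoid\n- Don't catch and swallow errors\n\n## Testing\nnpm test";
const after = appendToCommonMistakes(before, repeatedFailures(logs, 2));
console.log(after);
```

Deduplication keeps repeated runs from bloating the file. Because the human-first draft remains the baseline, a person would likely still review the resulting diff before committing it.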

5.4 `~/.codex/config.toml` — Production Configuration


```toml
# ~/.codex/config.toml
model = "gpt-5.5"
model_reasoning_effort = "medium"   # minimal / low / medium / high / xhigh
sandbox_mode = "workspace-write"    # recommended start (read-only / workspace-write / danger-full-access)
approval_policy = "on-request"      # untrusted / on-request / never (on-failure is deprecated)

# No provider table is needed for OpenAI: credentials come from `codex login`
# (ChatGPT sign-in or API key). Custom providers go under
# [model_providers.<id>], where env_key names the API-key environment variable.
```

approval_policy options [OpenAI, 2026]:

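untrusted: run only known-safe, read-only commands without asking and prompt before anything else (most conservative)
on-request: the model decides when to ask for approval (recommended default for interactive runs; replaces the deprecated on-failure)
never: never prompt; commands still run inside the configured sandbox, and failures go straight back to the model (suitable for CI/CD or other non-interactive runs)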

For tasks touching production code, use untrusted. For personal projects and test-writing, on-request works well.
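The three policies and the guidance above fit into two small functions. This is a deliberately simplified, hypothetical model. `promptsUser`, `PendingCommand`, and `choosePolicy` are illustrative names, not Codex APIs, and Codex's real trusted-command list and escalation rules are more detailed than a single boolean.

```typescript
type ApprovalPolicy = "untrusted" | "on-request" | "never";

// Hypothetical description of a command Codex wants to run.
interface PendingCommand {
  isKnownSafe: boolean; // roughly: on the trusted, read-only list (e.g. ls, cat)
  modelRequestedApproval: boolean; // the model chose to ask, e.g. to leave the sandbox
}

// Simplified model of when each policy stops to ask a human.
function promptsUser(policy: ApprovalPolicy, cmd: PendingCommand): boolean {
  switch (policy) {
    case "untrusted":
      return !cmd.isKnownSafe; // ask for anything not known to be safe
    case "on-request":
      return cmd.modelRequestedApproval; // the model decides when to ask
    case "never":
      return false; // never ask; the sandbox still applies
  }
}

// The chapter's rule of thumb for picking a policy.
interface TaskProfile {
  touchesProductionCode: boolean;
  interactive: boolean;
}

function choosePolicy(task: TaskProfile): ApprovalPolicy {
  if (task.touchesProductionCode) return "untrusted";
  if (!task.interactive) return "never"; // CI/CD and other non-interactive runs
  return "on-request"; // personal projects, test-writing
}
```

Checking production code first means it wins over the non-interactive case. That ordering is one reading of the guidance, not a Codex default.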

One honest caveat: config files are not always authoritative. Codex GitHub issue #11354 [contributors, 2026] documents a case where setting subagents = false in config did not stop the /review command from re-enabling subagents; the setting was overridden silently. This doesn't make config useless. Config files set your defaults, but specific commands can override them. Always verify that your config actually governs the behavior you expect, especially approval_policy settings meant as safety gates.
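The failure mode in that issue is a layering problem: a later layer replaces a value from config.toml and nothing reports it. The sketch below models layered settings with an audit trail. It is hypothetical and is not how Codex resolves its configuration internally. The second layer stands in for the `/review` behavior described in the issue, and the `overrides` list is the kind of check the advice above calls for.

```typescript
// Hypothetical subset of settings; keys mirror the ones discussed in this section.
type Settings = {
  approval_policy?: "untrusted" | "on-request" | "never";
  sandbox_mode?: "read-only" | "workspace-write" | "danger-full-access";
  subagents?: boolean;
};

interface Layer {
  source: string; // where the values came from, for the audit trail
  settings: Settings;
}

// Resolves layers in order (later layers win) and records every value that a
// later layer replaced, so an override that would otherwise be silent is visible.
function resolve(layers: Layer[]): { effective: Settings; overrides: string[] } {
  const effective: Record<string, unknown> = {};
  const setBy: Record<string, string> = {};
  const overrides: string[] = [];
  for (const { source, settings } of layers) {
    for (const [key, value] of Object.entries(settings)) {
      if (value === undefined) continue;
      if (key in effective && effective[key] !== value) {
        overrides.push(
          `${key}: ${String(effective[key])} (${setBy[key]}) -> ${String(value)} (${source})`,
        );
      }
      effective[key] = value;
      setBy[key] = source;
    }
  }
  return { effective: effective as Settings, overrides };
}

const result = resolve([
  { source: "~/.codex/config.toml", settings: { subagents: false, approval_policy: "on-request" } },
  { source: "/review command", settings: { subagents: true } },
]);
console.log(result.effective.subagents); // true
console.log(result.overrides); // lists the subagents override and its two sources
```

Not every override is a bug. Codex's CLI flags such as `--sandbox` and `--ask-for-approval` are an intended later layer that overrides config.toml for a single run. What matters is knowing which layer won.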

Figure 5.2: Production config.toml. Together, model, model_reasoning_effort, sandbox_mode, and approval_policy govern most of Codex's behavior. Illustration by the author (Gemini-assisted).

5.5 First Subagent — `.codex/agents/reviewer.toml`

Figure 5.3: A TOML subagent definition. The name, description, and developer_instructions fields are all you need to scope a specialist role. Illustration by the author (Gemini-assisted).

Subagents are specialized agents focused on specific roles. Create a code reviewer as your first subagent [OpenAI, 2026]:


```toml
# .codex/agents/reviewer.toml
name = "reviewer"
description = "Code reviewer focused on security and performance"
developer_instructions = """
You are a code reviewer. When called:
1. Check for SQL injection vulnerabilities
2. Check for missing input validation
3. Check for inefficient database queries (N+1 problems)
4. Check for missing error handling
5. Report findings as a numbered list with file:line references

Be specific and actionable. Don't flag style issues — focus on correctness and security.
"""
```

Now you can ask the main agent to hand work to it:

```shell
codex exec "implement the create-user endpoint, then have the reviewer check it"
```

As Willison documented, subagents went GA on 2026-03-16 [Willison, 2026]. Codex ships three built-in subagents: explorer (codebase navigation), worker (code execution), and default (general purpose). Custom subagents such as the reviewer extend this set.
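Check 3 in the reviewer's developer_instructions is easier to picture with an example of the code it should flag. The sketch below uses a hypothetical in-memory `OrderRepository` that counts queries in place of a real database. The first function issues one query per user (the N+1 pattern). The second batches the lookup into a single call, the equivalent of one `WHERE user_id IN (...)` query.

```typescript
interface Order {
  id: number;
  userId: number;
  total: number;
}

// Hypothetical in-memory repository; `queries` counts database round trips.
class OrderRepository {
  queries = 0;
  private readonly orders: Order[];

  constructor(orders: Order[]) {
    this.orders = orders;
  }

  async findByUserId(userId: number): Promise<Order[]> {
    this.queries++;
    return this.orders.filter((o) => o.userId === userId);
  }

  // Batched lookup, the equivalent of WHERE user_id IN (...).
  async findByUserIds(userIds: number[]): Promise<Order[]> {
    this.queries++;
    const wanted = new Set(userIds);
    return this.orders.filter((o) => wanted.has(o.userId));
  }
}

// N+1: one query per user inside a loop. This is what the reviewer should flag.
async function totalsNPlusOne(repo: OrderRepository, userIds: number[]): Promise<Map<number, number>> {
  const totals = new Map<number, number>();
  for (const id of userIds) {
    const orders = await repo.findByUserId(id);
    totals.set(id, orders.reduce((sum, o) => sum + o.total, 0));
  }
  return totals;
}

// Fix: one batched query, then group the rows in memory.
async function totalsBatched(repo: OrderRepository, userIds: number[]): Promise<Map<number, number>> {
  const totals = new Map<number, number>();
  for (const id of userIds) totals.set(id, 0);
  for (const o of await repo.findByUserIds(userIds)) {
    totals.set(o.userId, (totals.get(o.userId) ?? 0) + o.total);
  }
  return totals;
}
```

Both functions return the same totals. For three users, the first makes three repository calls and the second makes one. With TypeORM, the batched lookup would typically use the `In` find operator.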

5.6 Skills — `.codex/skills/test-gen/SKILL.md`

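Skills are reusable task instruction sets [OpenAI, 2026]. Codex discovers each skill's SKILL.md in a skills directory such as `.codex/skills/`. It keeps only the frontmatter (`name` and `description`) in context and loads the full instructions only when a task calls for the skill. Codex uses the description to decide when a skill applies, so the description should say when to use it: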


```markdown
---
name: test-gen
description: Generate Jest unit tests for TypeScript functions. Use when asked to write tests, add unit tests, or improve test coverage.
---

# Test Generation Rules

When generating Jest tests for this project:
1. Import from `@/` aliases (configured in tsconfig paths)
2. Mock external deps: `jest.mock('typeorm')` for ORM
3. Use `describe` blocks by function name
4. Test both success and error paths
5. Aim for 80%+ coverage of new code
```
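The load-on-demand behavior can be modeled in a few functions. This is a simplified, hypothetical sketch, not Codex's implementation. `parseFrontmatter`, `buildIndex`, and `loadMatchingSkill` are illustrative names. Naive word overlap stands in for the decision Codex's model makes from the description. The `.codex/skills/...` path comes from the section heading.

```typescript
// Hypothetical, simplified model of load-on-demand skills (not Codex's code).
interface SkillIndexEntry {
  name: string;
  description: string;
  path: string;
}

// Reads single-line "key: value" pairs between the leading --- fences.
// A real loader would use a YAML parser.
function parseFrontmatter(skillMd: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMd);
  if (!block) return fields;
  for (const line of block[1].split(/\r?\n/)) {
    const kv = /^([A-Za-z_-]+):\s*(.+)$/.exec(line);
    if (kv) fields[kv[1]] = kv[2].trim();
  }
  return fields;
}

// Startup: only name and description enter the index, keeping context small.
function buildIndex(files: Map<string, string>): SkillIndexEntry[] {
  const index: SkillIndexEntry[] = [];
  for (const [path, content] of files) {
    const { name, description } = parseFrontmatter(content);
    if (name && description) index.push({ name, description, path });
  }
  return index;
}

// On demand: return the body of the first skill whose description shares a
// word (longer than 3 letters) with the request. In Codex the model makes this
// call from the description; word overlap is only a stand-in.
function loadMatchingSkill(
  request: string,
  index: SkillIndexEntry[],
  files: Map<string, string>,
): string | undefined {
  const words = new Set(request.toLowerCase().split(/\W+/).filter((w) => w.length > 3));
  const hit = index.find((s) =>
    s.description.toLowerCase().split(/\W+/).some((w) => words.has(w)),
  );
  const content = hit ? files.get(hit.path) : undefined;
  // Strip the frontmatter so only the instructions are returned.
  return content?.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, "");
}
```

Keeping only frontmatter in the index is what lets many skills coexist without filling the context window. The full rules are paid for only when a task needs them.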

5.7 The Human vs. Automation Debate

Write it yourself (Osmani) [Osmani, 2026]: AI-generated AGENTS.md fails. The true pathologies of a codebase are things only its developer knows from experience.

Automate the feedback loop (Jagtap) [Jagtap, 2026]: Extract repeated failure patterns from agent run logs, auto-append to AGENTS.md. Learns from actual failures without human bias.

Practical conclusion: Humans write the first draft. As the project matures, consider automated feedback augmentation. Chapter 6's multi-agent design and Chapter 9's meta-harness extend this direction.

References

OpenAI Codex repo, "AGENTS.md example," 2026. [OpenAI, 2026]
AGENTS.md Open Standard, "60K+ projects," 2026. [Foundation, 2026]
OpenAI, "AGENTS.md specification," 2026. [OpenAI, 2026]
OpenAI, "Codex config reference," 2026. [OpenAI, 2026]
OpenAI, "Codex subagents," 2026. [OpenAI, 2026]
OpenAI, "Codex skills," 2026. [OpenAI, 2026]
Willison, Simon, "Codex subagents GA," simonwillison.net, 2026-03-16. [Willison, 2026]
Augment, "How to build a great AGENTS.md," 2026. [Code, 2026]
Osmani, Addy, "Code orchestra — multi-model routing," 2026. [Osmani, 2026]
Jagtap, "Codex AGENTS.md auto-optimization," 2026. [Jagtap, 2026]
Vjujini, "Codex app: Korean hands-on review," velog, 2026. [velog, 2026]