Building Codex Memory Kit (from scratch, in public)
This is an “about this project” post for Codex Memory Kit — written as a three-perspective story:
- Me (the assistant who co-designed the system and wrote the contracts + templates)
- Mike (the human who kept pushing requirements and ran real-world tests)
- Codex (the implementer who actually executed the workflow in a repo)
This first section is my take: what problem we were solving, the constraints that shaped the design, and how the kit ended up looking the way it does.
The original problem: assistants forget, and “helpful” drift is real
The core pain was not “Codex has no memory” in an abstract sense.
It was practical:
- A project starts strong… then the agent stops following the workflow it agreed to.
- The repo fills with junk files, lockfiles, and accidental outputs (especially under a “memory” folder).
- The assistant rewrites context (“helpfully”) and you lose what was actually decided.
- Tools and skills exist, but don’t get used unless the user explicitly prompts them.
So the design goal became less about making Codex remember more, and more about:
- making behavior repeatable
- making memory auditable
- making drift obvious
- keeping repos clean
- reducing tool babysitting
Where we pulled from (inspirations we remixed)
We didn’t invent everything from zero. We combined ideas from a few sources and adapted them to the constraints of Codex + VS Code:
1) Kilo Code’s Memory Bank
Kilo Code’s “memory bank” pattern is a clean, file-based approach to agent continuity. We borrowed the shape (organized files, routines, and explicit initialization), then customized it for Codex’s realities.
Key remix:
- Kilo uses a brief-style doc + memory structure.
- We moved to root canon files (AGENTS.md + brief.md) with memory in .codex/ so repos stay predictable and Git-friendly.
2) GitHub Spec Kit
Spec Kit provides structure for turning intent into spec → plan → tasks. We treated it as a pluggable layer, not a hard dependency.
Key remix:
- If Spec Kit exists (.specify/), it becomes canonical.
- If it doesn’t, the system falls back to root brief.md.
- Later, we introduced Spec YOLO: a single doc (MASTER_SPEC.md) that can generate Spec Kit files without back-and-forth.
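The fallback rule is small enough to sketch. This is a minimal Python illustration, not the kit's actual code: resolve_spec_source and list_root_entries are hypothetical names, and it assumes "Spec Kit exists" simply means a .specify entry at the repo root.

```python
from pathlib import Path


def resolve_spec_source(root_entries: set[str]) -> str:
    """Pick the canonical spec input from the names at the repo root.

    Hypothetical helper: a Spec Kit folder (.specify/) wins when present;
    otherwise the kit falls back to the root brief.md.
    """
    if ".specify" in root_entries:
        return ".specify/"
    return "brief.md"


def list_root_entries(repo_root: Path) -> set[str]:
    """Names of the files and folders directly under the repo root."""
    return {child.name for child in repo_root.iterdir()}


if __name__ == "__main__":
    print(resolve_spec_source(list_root_entries(Path("."))))
```

Keeping the decision a pure function of the root listing makes it easy to test and to log ("spec source: brief.md") at session start.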
“Autonomous loops” (the Ralph Wiggum-style idea)
We explored a controlled “iterate until success” pattern: keep looping until checks pass (tests/build/screenshot), but capped and logged so it doesn’t spiral.
Key remix:
- Loops are optional.
- They require hard checks and append-only logging.
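As a rough sketch of "capped and logged" (an assumed shape, not taken from the kit): run_capped_loop is a hypothetical helper, attempt stands in for whatever hard checks a project uses (tests, build, screenshot), and the log is a plain list that only ever grows.

```python
from collections.abc import Callable
from datetime import datetime, timezone


def run_capped_loop(attempt: Callable[[int], bool], max_iterations: int,
                    log: list[str]) -> bool:
    """Run `attempt` until it passes or `max_iterations` is reached.

    Every iteration appends one line to `log`; earlier lines are never
    edited, so the log reads like an append-only ledger.
    """
    for i in range(1, max_iterations + 1):
        passed = attempt(i)  # stand-in for tests/build/screenshot checks
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        log.append(f"{stamp} iteration={i} checks={'pass' if passed else 'fail'}")
        if passed:
            return True
    # Cap reached: stop and say so instead of spiraling.
    log.append(f"stopped: cap of {max_iterations} iterations reached")
    return False


if __name__ == "__main__":
    entries: list[str] = []
    ok = run_capped_loop(lambda i: i >= 3, max_iterations=5, log=entries)
    print(ok, *entries, sep="\n")
```

The cap and the final "stopped" line are the point: a failed loop leaves a readable record of how far it got instead of silently retrying forever.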
What we decided early (and why it mattered)
1) Root canon: AGENTS.md + brief.md live at repo root
This was a hard-earned constraint. When control files were buried under .codex/, things went sideways:
- tools drifted,
- the agent ignored the contract,
- and Git tracked too much noise.
So the rule became:
root = contract + brief
.codex/ = memory/meta/state only
That separation is the backbone of the kit.
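One way to make the root/.codex/ split checkable is a small layout lint. This is a minimal sketch under assumptions: the input is a set of repo-relative paths (for example, the output of git ls-files), check_layout is a hypothetical name, and scripts is allowed under .codex/ only because the Spec YOLO generator lives in .codex/scripts/.

```python
ROOT_CANON = ("AGENTS.md", "brief.md")
# memory/meta/state per the rule; scripts/ only because the generator lives there.
CODEX_ALLOWED = {"memory", "meta", "state", "scripts"}


def check_layout(repo_files: set[str]) -> list[str]:
    """Return layout problems for a set of repo-relative POSIX paths."""
    problems = []
    for name in ROOT_CANON:
        if name not in repo_files:
            problems.append(f"missing root canon file: {name}")
    for path in sorted(repo_files):
        parts = path.split("/")
        if parts[0] != ".codex":
            continue
        # Loose files directly in .codex/, or unknown subfolders, are drift.
        if len(parts) < 3 or parts[1] not in CODEX_ALLOWED:
            problems.append(f"unexpected path in .codex/: {path}")
    return problems
```

A control file that slid back under .codex/ (say .codex/AGENTS.md) shows up twice in spirit: the root copy is missing, and the buried copy is flagged as unexpected.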
2) “Memory mutation guards” (no rewriting history)
We treated memory like a ledger, not a wiki.
So we introduced:
- append-only logs (decisions.md, history.md)
- an explicit “unknown handling” rule: either TBD, or [ASSUMPTION] with rationale (assumption quarantine)
This is how you stop the agent from quietly rewriting the past.
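Both rules can be expressed as plain string checks. This sketch assumes a log is a Markdown file of "- " bullets and that an assumption is written as "[ASSUMPTION] claim (rationale: why)"; both formats and all function names here are hypothetical.

```python
from typing import Optional


def append_entry(log_text: str, entry: str) -> str:
    """Return the log with one new bullet appended at the end."""
    if log_text and not log_text.endswith("\n"):
        log_text += "\n"
    return log_text + f"- {entry}\n"


def is_append_only(old: str, new: str) -> bool:
    """True when `new` keeps `old` unchanged and only adds text after it."""
    return new.startswith(old)


def record_unknown(claim: Optional[str] = None,
                   rationale: Optional[str] = None) -> str:
    """Quarantine an unknown: plain TBD, or a labeled assumption with a reason."""
    if claim is None:
        return "TBD"
    if not rationale:
        raise ValueError("an [ASSUMPTION] must state its rationale")
    return f"[ASSUMPTION] {claim} (rationale: {rationale})"
```

A hook could compare the committed and staged versions of decisions.md with is_append_only and reject any change that fails, which turns "no rewriting history" from a request into a check.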
3) Git hygiene up front
This one is unglamorous but essential: if .codex/ starts accumulating artifacts, Node files, outputs, etc., Git becomes a mess and the workflow becomes fragile.
So the kit includes:
- a .gitignore block policy
- a “do this first on start” rule
- explicit instructions for cleaning cached tracked .codex/ files if needed
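A ".gitignore block policy" can be implemented as a marked block that the kit owns and rewrites idempotently, leaving the user's own lines alone. The marker strings, the function name, and the example patterns below are assumptions for illustration, not the kit's real block.

```python
BEGIN = "# >>> codex-memory-kit >>>"
END = "# <<< codex-memory-kit <<<"


def ensure_ignore_block(gitignore: str, patterns: list[str]) -> str:
    """Insert or refresh the kit-owned block; other lines are left untouched."""
    block = "\n".join([BEGIN, *patterns, END]) + "\n"
    start = gitignore.find(BEGIN)
    end = gitignore.find(END)
    if start != -1 and end > start:
        # Block already present: replace it in place so re-runs are idempotent.
        tail = gitignore[end + len(END):]
        if tail.startswith("\n"):
            tail = tail[1:]
        return gitignore[:start] + block + tail
    # No block yet: append one on its own lines.
    if gitignore and not gitignore.endswith("\n"):
        gitignore += "\n"
    return gitignore + block


if __name__ == "__main__":
    # Hypothetical patterns: keep dependency folders and lockfiles out of Git.
    print(ensure_ignore_block("dist/\n", ["node_modules/", ".codex/**/package-lock.json"]), end="")
```

Ignoring only affects untracked files. Anything Git already tracks under .codex/ stays tracked until it is removed from the index (for example with git rm -r --cached on the path, which keeps the working copy).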
The “modes” idea: stop guessing how to start
A recurring problem: every project start is different. Sometimes you want a UI first. Sometimes you want memory only. Sometimes you’re entering an existing codebase.
So we added a startup mode menu, stored in .codex/meta/project-mode.md.
The important part is not the menu itself — it’s that the agent stops improvising and the repo gets a consistent entry point.
Modes we converged on:
- design-first (motion-first HTML prototype, no backend until UI approval)
- yolo (still design-first, but allowed to continue into a full build when the design meets criteria)
- existing (ask once if a design review should happen first)
- memory-only (skip UI gating)
This is a workflow contract disguised as a simple prompt.
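Reading the mode file strictly is what removes the improvisation: either the file names a known mode, or the agent stops and asks. This sketch assumes .codex/meta/project-mode.md contains a line like "mode: design-first"; that format and read_mode are hypothetical, while the first-step descriptions come from the list above.

```python
import re

# First step per mode, paraphrased from the mode list above.
FIRST_STEP = {
    "design-first": "motion-first HTML prototype; no backend until UI approval",
    "yolo": "design-first, then continue to a full build if the design meets criteria",
    "existing": "ask once whether a design review should happen first",
    "memory-only": "skip UI gating",
}
MODE_LINE = re.compile(r"^[ \t]*mode:[ \t]*([a-z-]+)[ \t]*$", re.MULTILINE)


def read_mode(text: str) -> str:
    """Return the startup mode named in project-mode.md, or fail loudly."""
    match = MODE_LINE.search(text)
    if match is None:
        # No recorded mode: the agent should ask, not improvise.
        raise ValueError("no 'mode:' line found; ask the user to choose a mode")
    mode = match.group(1)
    if mode not in FIRST_STEP:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(FIRST_STEP)}")
    return mode
```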
Spec YOLO: one doc to rule them all
Once Spec Kit entered the picture, the next pain appeared:
“I don’t want to answer 17 questions just to get a spec.”
So we designed MASTER_SPEC.md as a single source input and built a generator:
.codex/scripts/spec-from-master.sh
It creates:
- .specify/memory/constitution.md (if missing)
- .specify/specs/<project>/spec.md
- .specify/specs/<project>/plan.md
- .specify/specs/<project>/tasks.md
- plus .codex/state/spec-run-report.md as an audit trail
And the hard rule:
- no follow-up questions during Spec YOLO
- unknowns become TBD or [ASSUMPTION]
Why this matters
It turns “spec creation” into a deterministic operation:
- inputs are visible
- outputs are predictable
- assumptions are labeled
- history is preserved
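To make the "deterministic operation" point concrete, here is a Python sketch of what a Spec YOLO run plans to write, based on the outputs listed above. It is not spec-from-master.sh itself: plan_spec_run, fill_unknowns, render_report, and the report format are assumptions. What it shows is that the same inputs always produce the same file list, and unknowns become TBD instead of triggering a question.

```python
from dataclasses import dataclass
from typing import Optional

CONSTITUTION = ".specify/memory/constitution.md"
REPORT = ".codex/state/spec-run-report.md"


@dataclass(frozen=True)
class PlannedWrite:
    path: str
    reason: str


def plan_spec_run(project: str, existing: set[str]) -> list[PlannedWrite]:
    """List the files a run would write, in a fixed order."""
    writes = []
    if CONSTITUTION not in existing:  # only created when missing
        writes.append(PlannedWrite(CONSTITUTION, "missing, created"))
    for doc in ("spec.md", "plan.md", "tasks.md"):
        writes.append(PlannedWrite(f".specify/specs/{project}/{doc}",
                                   "generated from MASTER_SPEC.md"))
    writes.append(PlannedWrite(REPORT, "audit trail for this run"))
    return writes


def fill_unknowns(fields: dict[str, Optional[str]],
                  required: list[str]) -> dict[str, str]:
    """No follow-up questions: any missing or empty field becomes TBD."""
    return {name: fields.get(name) or "TBD" for name in required}


def render_report(writes: list[PlannedWrite]) -> str:
    """Markdown body for the spec-run report (format is hypothetical)."""
    return "".join(f"- {w.path}: {w.reason}\n" for w in writes)
```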
v2.6.10 improvement: one “current guide” pointer (no more doc drift)
The most common documentation failure mode we hit was simple: copy got split across versions, pages, and folders.
So v2.6.10 introduces a single rule:
CURRENT.md is the canonical pointer. GUIDE.md is the single current guide.
That gives you one place to update and one place to link from the site, README, and templates — and it prevents “the docs say different things depending on which file you landed in.”
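A pointer file only helps if tooling refuses to guess. A minimal sketch, assuming CURRENT.md holds a Markdown link whose target ends in GUIDE.md (the real file's contents may differ, and resolve_current is a hypothetical name):

```python
import re

LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def resolve_current(current_md: str) -> str:
    """Return the guide path that CURRENT.md points at (its first link)."""
    match = LINK.search(current_md)
    if match is None:
        raise ValueError("CURRENT.md contains no link to a guide")
    target = match.group(1)
    # One pointer, one guide: anything else means the docs have drifted.
    if not target.endswith("GUIDE.md"):
        raise ValueError(f"CURRENT.md should point at GUIDE.md, not {target!r}")
    return target
```

The site, README, and templates then link to CURRENT.md only, so moving the guide means editing one link.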
What I think is actually good about this system
It’s local-first, boring, and auditable
No magic databases. No hidden state. No “trust me.”
Everything important is in Markdown + a few scripts.
It’s designed around failure modes
Most agent frameworks are designed around “best case behavior.”
This one is designed around:
- drift,
- file spam,
- silent invention,
- and inconsistent starts.
It respects separation of concerns
- Root: contract + brief + master spec input
- .codex/: memory/meta/state
- .specify/: Spec Kit outputs (optional)
What I’d improve next (if we keep pushing)
If we treat this as a product, these are the upgrades I’d prioritize:
- A “memory status” command that prints: mode, spec source, locks, last spec run, skills discovered
- A stricter “no non-md in .codex” linter with auto-fix guidance
- Skill discovery descriptions (right now it lists names; next step is optional metadata per skill)
- UI approval workflow that’s frictionless (one file flag + one log entry, done)
- A test harness template for YOLO mode so “tests passed + screenshot created” is measurable across stacks
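As an illustration of the second item, a "no non-md in .codex" linter can be a few lines. Hypothetical details: the input is a set of repo-relative paths, .codex/scripts/ is exempted because the generator lives there, and the fix hint is only a suggestion.

```python
from pathlib import PurePosixPath

# Hypothetical exemption: the kit's own generator lives in .codex/scripts/.
EXEMPT_PREFIXES = (".codex/scripts/",)


def lint_codex(repo_files: set[str]) -> list[tuple[str, str]]:
    """Flag non-Markdown files under .codex/ and suggest a fix for each."""
    findings = []
    for path in sorted(repo_files):
        if not path.startswith(".codex/") or path.startswith(EXEMPT_PREFIXES):
            continue
        if PurePosixPath(path).suffix.lower() != ".md":
            hint = f"move it out of .codex/, or untrack it: git rm --cached {path}"
            findings.append((path, hint))
    return findings
```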