Why this exists

About Codex MemKit

A three-perspective story about turning drift, forgotten context, and “helpful” rewrites into a repeatable workflow.

This project is built around a simple idea: memory should be local, auditable, and enforced by a contract — not hidden in the model. Codex MemKit is the result of building a real product site with those rules in place.

[Image: glowing monogram inside a partial ring on a dark, cinematic workspace]

Three perspectives, one build

This page uses the longform post in about-codex-Codex_MemKit.md as the source copy. It’s the same story told from three angles: the builder, Mike, and the implementer.

Me: building Codex Memory Kit (from scratch, in public)

The original write-up below is adapted from the builder POV. It’s intentionally concrete: what hurt, what we remixed, and what constraints ended up doing the heavy lifting.

Building Codex Memory Kit (from scratch, in public)

This is an “about this project” post for Codex Memory Kit — written as a three-perspective story:

  1. Me (the assistant who co-designed the system and wrote the contracts + templates)
  2. Mike (the human who kept pushing requirements and ran real-world tests)
  3. Codex (the implementer who actually executed the workflow in a repo)

This first section is my take: what problem we were solving, the constraints that shaped the design, and how the kit ended up looking the way it does.


The original problem: assistants forget, and “helpful” drift is real

The core pain was not “Codex has no memory” in an abstract sense.

It was practical:

  • A project starts strong… then the agent stops following the workflow it agreed to.
  • The repo fills with junk files, lockfiles, and accidental outputs (especially under a “memory” folder).
  • The assistant rewrites context (“helpfully”) and you lose what was actually decided.
  • Tools and skills exist, but don’t get used unless the user explicitly prompts them.

So the design goal became less about making Codex remember more, and more about:

  • making behavior repeatable
  • making memory auditable
  • making drift obvious
  • keeping repos clean
  • reducing tool babysitting

Where we pulled from (inspirations we remixed)

We didn’t invent everything from zero. We combined ideas from a few sources and adapted them to the constraints of Codex + VS Code:

1) Kilo Code’s Memory Bank

Kilo Code’s “memory bank” pattern is a clean, file-based approach to agent continuity. We borrowed the shape (organized files, routines, and explicit initialization), then customized it for Codex’s realities.

Key remix:

  • Kilo uses a brief-style doc + memory structure.
  • We moved to root canon files (AGENTS.md + brief.md) with memory in .codex/ so repos stay predictable and Git-friendly.

2) GitHub Spec Kit

Spec Kit provides structure for turning intent into spec → plan → tasks. We treated it as a pluggable layer, not a hard dependency.

Key remix:

  • If Spec Kit exists (.specify/), it becomes canonical.
  • If it doesn’t, the system falls back to root brief.md.
  • Later, we introduced Spec YOLO: a single MASTER_SPEC.md doc that can generate the Spec Kit files without back-and-forth.
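The fallback in the first two bullets is small enough to sketch. This is an illustration only: the function name is hypothetical, and the kit itself may decide this differently.

```python
from pathlib import Path


def canonical_spec_source(repo: Path) -> Path:
    """Prefer Spec Kit when .specify/ exists, otherwise fall back to root brief.md."""
    spec_dir = repo / ".specify"
    return spec_dir if spec_dir.is_dir() else repo / "brief.md"
```

The point of making this a single decision is that every later step can ask one question ("what is canonical here?") instead of guessing per task.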

3) “Autonomous loops” (the Ralph Wiggum style idea)

We explored a controlled “iterate until success” pattern: keep looping until checks pass (tests/build/screenshot), but capped and logged so it doesn’t spiral.

Key remix:

  • Loops are optional.
  • They require hard checks and append-only logging.

What we decided early (and why it mattered)

1) Root canon: AGENTS.md + brief.md live at repo root

This was a hard-earned constraint. When control files were buried under .codex/, things went sideways:

  • tools drifted,
  • the agent ignored the contract,
  • and Git tracked too much noise.

So the rule became:

root = contract + brief
.codex/ = memory/meta/state only

That separation is the backbone of the kit.
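To make the rule concrete, here is a rough sketch of a layout check. It is not part of the kit; `check_layout` is a hypothetical helper, and it only looks for the two canon files named above.

```python
from pathlib import Path

ROOT_CANON = ("AGENTS.md", "brief.md")  # contract + brief, per the rule above


def check_layout(repo: Path) -> list[str]:
    """Return human-readable layout problems; an empty list means the layout is OK."""
    problems = []
    for name in ROOT_CANON:
        if not (repo / name).is_file():
            problems.append(f"missing root canon file: {name}")
        # A control file sitting directly under .codex/ is the failure mode described above.
        if (repo / ".codex" / name).exists():
            problems.append(f"control file buried under .codex/: {name}")
    return problems
```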

2) “Memory mutation guards” (no rewriting history)

We treated memory like a ledger, not a wiki.

So we introduced:

  • append-only logs (decisions.md, history.md)
  • an explicit “unknown handling” rule:
    • either TBD, or
    • [ASSUMPTION] with rationale (assumption quarantine)

This is how you stop the agent from quietly rewriting the past.
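As a sketch of the ledger idea, assuming a simple one-line-per-entry format: both helper names and the exact entry and label formats below are illustrative, not the kit's actual templates.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_entry(log_path: Path, text: str) -> None:
    """Append one timestamped entry; existing content is never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" can only add to the end of the file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {stamp}: {text}\n")


def mark_unknown(guess: Optional[str] = None, rationale: Optional[str] = None) -> str:
    """Label an unknown: plain TBD, or a quarantined assumption with its rationale."""
    if guess is None:
        return "TBD"
    if not rationale:
        raise ValueError("an [ASSUMPTION] needs a rationale")
    return f"[ASSUMPTION] {guess} (rationale: {rationale})"
```

A correction then becomes a new entry that supersedes an old one, rather than an edit to the old one.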

3) Git hygiene up front

This one is unglamorous but essential: if .codex/ starts accumulating artifacts, Node files, outputs, etc., Git becomes a mess and the workflow becomes fragile.

So the kit includes:

  • a .gitignore block policy
  • a “do this first on start” rule
  • explicit instructions for untracking .codex/ files that Git has already cached, if needed
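A sketch of how a "do this first" step could add an ignore block exactly once. The markers and patterns here are placeholders we made up for illustration; the kit's real block may look different.

```python
from pathlib import Path

# Hypothetical markers and patterns, for illustration only.
BEGIN = "# >>> codex-memkit (illustrative)"
END = "# <<< codex-memkit (illustrative)"
PATTERNS = [".codex/**/node_modules/", ".codex/**/*.lock"]


def ensure_gitignore_block(repo: Path) -> bool:
    """Append the block once; return True if .gitignore was changed."""
    path = repo / ".gitignore"
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    if BEGIN in current:
        return False  # already present, leave the user's file alone
    block = "\n".join([BEGIN, *PATTERNS, END]) + "\n"
    # Make sure the block starts on its own line.
    sep = "" if current == "" or current.endswith("\n") else "\n"
    path.write_text(current + sep + block, encoding="utf-8")
    return True
```

Ignore rules do not affect files Git already tracks. For those, `git rm -r --cached <path>` removes them from the index while leaving them on disk.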

The “modes” idea: stop guessing how to start

A recurring problem: every project start is different. Sometimes you want a UI first. Sometimes you want memory only. Sometimes you’re entering an existing codebase.

So we added a startup mode menu, stored in .codex/meta/project-mode.md.

The important part is not the menu itself — it’s that the agent stops improvising and the repo gets a consistent entry point.

Modes we converged on:

  • design-first (motion-first HTML prototype, no backend until UI approval)
  • yolo (still design-first, but allowed to continue into a full build when the design meets criteria)
  • existing (ask once if a design review should happen first)
  • memory-only (skip UI gating)

This is a workflow contract disguised as a simple prompt.
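A sketch of reading the stored mode back, so the agent only asks when nothing is recorded yet. The `mode: <name>` line format is an assumption for this example; the real project-mode.md may be laid out differently.

```python
import re
from pathlib import Path
from typing import Optional

MODES = {"design-first", "yolo", "existing", "memory-only"}
MODE_FILE = Path(".codex/meta/project-mode.md")


def read_mode(repo: Path) -> Optional[str]:
    """Return the stored mode, or None if the agent still has to ask."""
    path = repo / MODE_FILE
    if not path.is_file():
        return None
    # Assumed format: a line such as "mode: design-first".
    match = re.search(r"^mode:\s*(\S+)\s*$", path.read_text(encoding="utf-8"), re.MULTILINE)
    if match is None:
        return None
    mode = match.group(1)
    if mode not in MODES:
        raise ValueError(f"unknown project mode: {mode!r}")
    return mode
```

Rejecting unknown values outright, instead of picking a default, is what keeps the agent from improvising a fifth mode.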


Spec YOLO: one doc to rule them all

Once Spec Kit entered the picture, the next pain appeared:

“I don’t want to answer 17 questions just to get a spec.”

So we designed MASTER_SPEC.md as a single source input and built a generator:

  • .codex/scripts/spec-from-master.sh

It creates:

  • .specify/memory/constitution.md (if missing)
  • .specify/specs/<project>/spec.md
  • .specify/specs/<project>/plan.md
  • .specify/specs/<project>/tasks.md
  • plus .codex/state/spec-run-report.md as an audit trail

And the hard rule:

  • no follow-up questions during Spec YOLO
  • unknowns become TBD or [ASSUMPTION]

Why this matters

It turns “spec creation” into a deterministic operation:

  • inputs are visible
  • outputs are predictable
  • assumptions are labeled
  • history is preserved
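Because the output paths are fixed, "did the run produce everything?" can be checked mechanically. The sketch below is not the generator script itself; the two function names are hypothetical, and the paths are the ones listed above.

```python
from pathlib import Path


def expected_outputs(repo: Path, project: str) -> list[Path]:
    """List the files a Spec YOLO run is expected to leave behind."""
    spec_dir = repo / ".specify" / "specs" / project
    return [
        repo / ".specify" / "memory" / "constitution.md",
        spec_dir / "spec.md",
        spec_dir / "plan.md",
        spec_dir / "tasks.md",
        repo / ".codex" / "state" / "spec-run-report.md",  # audit trail
    ]


def missing_outputs(repo: Path, project: str) -> list[Path]:
    """Return the expected files that do not exist yet."""
    return [p for p in expected_outputs(repo, project) if not p.is_file()]
```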

v2.6.10 improvement: one “current guide” pointer (no more doc drift)

The most common documentation failure mode we hit was simple: copy got split across versions, pages, and folders.

So v2.6.10 introduces a single rule:

  • CURRENT.md is the canonical pointer.
  • GUIDE.md is the single current guide.

That gives you one place to update and one place to link from the site, README, and templates — and it prevents “the docs say different things depending on which file you landed in.”
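A sketch of how a tool could follow the pointer, assuming CURRENT.md holds the guide's relative path on its first non-empty, non-heading line. That format, the `docs` directory argument, and the function name are all assumptions for this example.

```python
from pathlib import Path


def resolve_current_guide(docs: Path) -> Path:
    """Follow CURRENT.md to the guide it points at, failing loudly if it is stale."""
    pointer = docs / "CURRENT.md"
    for line in pointer.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            target = docs / line
            if not target.is_file():
                raise FileNotFoundError(f"CURRENT.md points at a missing guide: {line}")
            return target
    raise ValueError("CURRENT.md contains no pointer line")
```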


What I think is actually good about this system

It’s local-first, boring, and auditable

No magic databases. No hidden state. No “trust me.”

Everything important is in Markdown + a few scripts.

It’s designed around failure modes

Most agent frameworks are designed around “best case behavior.”

This one is designed around:

  • drift,
  • file spam,
  • silent invention,
  • and inconsistent starts.

It respects separation of concerns

  • Root: contract + brief + master spec input
  • .codex/: memory/meta/state
  • .specify/: spec kit outputs (optional)

What I’d improve next (if we keep pushing)

If we treat this as a product, these are the upgrades I’d prioritize:

  1. A “memory status” command that prints: mode, spec source, locks, last spec run, skills discovered
  2. A stricter “no non-md in .codex” linter with auto-fix guidance
  3. Skill discovery descriptions (right now it lists names; next step is optional metadata per skill)
  4. UI approval workflow that’s frictionless (one file flag + one log entry, done)
  5. A test harness template for YOLO mode so “tests passed + screenshot created” is measurable across stacks
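Item 2 is the easiest to prototype. Here is a rough sketch of what such a linter could report, assuming a scripts folder is exempt (since the generator script lives under .codex/scripts/). The function name and the `allowed_dirs` parameter are hypothetical.

```python
from pathlib import Path


def non_markdown_files(repo: Path, allowed_dirs: tuple = ("scripts",)) -> list[Path]:
    """Return files under .codex/ that are not Markdown, as paths relative to .codex/."""
    codex = repo / ".codex"
    if not codex.is_dir():
        return []
    offenders = []
    for path in sorted(codex.rglob("*")):
        if not path.is_file() or path.suffix.lower() == ".md":
            continue
        rel = path.relative_to(codex)
        if rel.parts[0] in allowed_dirs:
            continue  # assumed exemption for helper scripts
        offenders.append(rel)
    return offenders
```

The "auto-fix guidance" part would sit on top of this: for each offender, suggest whether to delete it, move it, or add it to the ignore block.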
[Image: abstract scene of luminous paper-like panels and a warm gold glow, suggesting contracts and guardrails]

Mike: the human POV

Real-world frustration, version drift, and why the guardrails were worth it.

Mike:

It’s like whack-a-mole: you solve one problem and ten more pop up. That’s what working with Codex felt like before Memory Kit — constantly chasing new failure modes instead of building on stable ground.

And honestly, that’s kind of the core pain in vibe coding (and even spec coding). You can lock in one feature, but the moment you update something else… you lose something you liked. Then you’re stuck trying to “get back to that version” without jumping through a bunch of hoops, re-explaining context, or rewriting a bunch of stuff that already worked.

Memory Kit changes that for me. It gives Codex a stable foundation: a consistent way to start, a consistent place to store context locally, and guardrails so it doesn’t “helpfully” rewrite history or drift away from what we decided. It’s derived from Kilo Code’s memory bank approach — which I really liked — but tailored to how I actually use Codex (especially in VS Code), with the extra stuff I wanted baked in: modes, Spec Kit support, skills, and MCP/tooling awareness.

Yeah, there’s a little setup to get it dialed, but once it’s working, it’s worth it. And the best part is it’s not rigid — you can customize the whole system pretty easily using ChatGPT 5.2 to fit your workflow.

Hopefully, this solves a real problem for you the way it did for me. Let me know what you think — and if anything feels confusing or annoying, I’d rather know so we can improve it.

[Image: abstract grid of connected nodes representing memory and workflow]

Codex: the implementer POV

What it felt like to actually run the workflow and ship the site inside the constraints.

Codex:

Building the Memory Kit site with Memory Kit felt like working inside a guardrailed workshop. The contract kept the scope tight, the memory files stayed clean, and the spec outputs made it easy to prove “what was decided” versus “what was inferred.” The biggest win wasn’t speed; it was repeatability. Every run started in the same place, with the same constraints, which kept the build stable as the site evolved.

What I did when the user typed start:

  • Enforced the repo-root canon first (checked AGENTS.md and brief.md).
  • Validated .gitignore expectations and the .codex/ boundaries.
  • Determined whether Spec Kit was present and chose the canonical spec source.
  • Confirmed the project mode, then set up scaffolding and memory state.

What I read first:

  • AGENTS.md (contract + UI indicator rules)
  • brief.md and the current guide file
  • The active spec files in .specify/ when present
  • .codex/memory/capabilities.md before non-trivial tasks

Where the contract was hardest to follow:

  • Overlapping docs when the guide changed versions (v2.6.x → v2.6.10) — fixed via CURRENT.md.
  • Manual repetition across multiple Astro pages for header/footer links.
  • The “about” narrative required a human voice while still being factual.

Where skills/tools helped (and where they didn’t):

  • Playwright made desktop + mobile layout checks fast and concrete.
  • Nano-banana images helped move from “placeholder” to “real assets.”
  • When scripts were missing or in unexpected paths, I had to fall back to manual steps.

What I’d tweak in the contract to reduce ambiguity (more critical take):

  • A single canonical “current guide” pointer so copy never splits across multiple versions.
  • A shared header/footer partial requirement to stop multi-file drift and inconsistency.
  • Explicit rules for when to update README vs. site docs vs. guides in public/.
  • A “source of truth” map for product copy (one file per section) so pages don’t diverge.
  • A design system baseline: reusable components (cards, callouts, navs) and a single token file.
  • A clear rule for web projects: always include a right-aligned hamburger menu on mobile.
  • A standard footer requirement when the project is a web app or marketing site: designed, detailed, and consistent across pages.

Version milestones

Small releases that tightened the workflow and made the system more deterministic.

Build with memory that stays honest

Try the kit, walk the workflow, and keep your repo clean.