Cosmo memory v2 — the plan

Decisions locked. Building next. Sat 25 Apr 2026.

What this doc is. Yesterday we shipped the architecture spec. Today we resolved the open forks and locked the plan. This is the build brief — what's decided, why, what's still open, and what happens first. It's also the first plan the new memory system itself will manage. The system supervises its own birth.
Hard rule for every LLM call below. Every job this doc assigns to "Sonnet 4.6", "Opus 4.7", or any other model (hot path, dream pass, bootstrap ingest, link phase, trigger checker, interrupt classifier, shadow comparison, all of them) runs through the query() function of @anthropic-ai/claude-agent-sdk, not the direct @anthropic-ai/sdk. Cosmo has no ANTHROPIC_API_KEY; auth flows through Claude Code session credentials. The pattern in src/agent.js is canonical. Model selection is via the model field in query() options. No exceptions.
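A minimal sketch of what the hard rule could look like in code. It assumes query() returns an async iterable of messages and that the final one has type "result", subtype "success", and the text in a result field. The helper name runModel and the injected queryFn parameter are hypothetical, added so the helper can be exercised without credentials; src/agent.js remains the canonical pattern.

```javascript
// Hypothetical helper: one place where every job calls a model.
// queryFn is injected; in Cosmo it would be the query export of
// @anthropic-ai/claude-agent-sdk (never the direct @anthropic-ai/sdk).
async function runModel(queryFn, prompt, model) {
  let finalText = null;
  // The model is chosen via the model field in query() options.
  for await (const message of queryFn({ prompt, options: { model } })) {
    if (message.type === "result" && message.subtype === "success") {
      finalText = message.result;
    }
  }
  if (finalText === null) {
    throw new Error(`no successful result returned for model ${model}`);
  }
  return finalText;
}
```

In the real code path the call would be something like runModel(query, prompt, modelFromConfig), where modelFromConfig stands for whatever model identifier the config holds.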
Contents
  1. What we agreed
  2. The topic file shape
  3. The dream pass — two phases
  4. Model assignments
  5. Shadow mode + the bake-in test
  6. Archive policy
  7. Build order
  8. What's still open
  9. What happens next

1. What we agreed

Ten things locked in this morning. Each was a real choice with a defensible alternative.

| Decision | What we picked | What we rejected |
| --- | --- | --- |
| Read path | router: small Sonnet 4.6 call picks 1–3 topic files per turn. | Inject every topic file every turn, lean on prompt cache. Simpler but token-wasteful at scale. |
| Directory structure | flat: one directory, one file per topic. No nesting. | Hierarchy (health/running/phase9). Ages badly. The "where does this go" problem is unsolvable. |
| File metadata | YAML frontmatter + flat tags. Five fields: title, tags, status, updated, related. | Plain markdown only (Karpathy's punt) or directory hierarchy as the metadata (Letta). |
| Dream pass shape | two-phase: edit phase first (consolidate, prune, split, merge), then link phase (re-traverse, fix related:). | Single-pass file-by-file. Misses transitive connections after a split or merge. |
| Dream pass model | Sonnet 4.6 across all phases. Bootstrap ingest uses Opus 4.7 (one-off). | Opus for the consolidate phase nightly. Anthropic's own guidance says Sonnet is in range, and burning Max-plan Opus quota every night is not free. |
| Confidence test | shadow mode: for the first 14 nights, Opus also runs the consolidate phase, output diffed for review. | Trust Sonnet without comparison. Risks silently shipping worse output for months before noticing. |
| Archive policy | no auto-archive: files leave the active set only via superseded_by or explicit user archive. | Time-based archive after N months idle. "Silent" doesn't mean "dead"; old reference files stay relevant forever. |
| Bootstrap ingest scope | wide: read existing Claude Code memory, all session docs, skill files, every CLAUDE.md, every spec. One-off Opus pass. | Narrow ingest. Faster but loses richness. The dream pass is designed to prune anyway. |
| Bi-temporal scope | date everything: every fact gets a date when written. (since YYYY-MM) for ongoing, (YYYY-MM to YYYY-MM) for ranges, exact day where known, ~YYYY for fuzzy. SCHEMA.md enforces. | Opt-in by tag (ages; same as hierarchy in disguise) or LLM-decides-per-fact (drifts across model versions). Both create an inconsistent corpus over years. |
| Dashboard + repo layout | dashboard from day one: sibling repo ~/cosmo-memory/ (private GitHub), separate from cosmo code. Local dashboard reads the filesystem from step 1.5. Production deploy with Worker basic auth comes later. | Dashboard at step 9 (too late; bootstrap ingest would run blind). Memory inside cosmo repo (mixes personal data with code). Cloudflare Access (good but heavier than needed for now). |
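The bi-temporal rule is the kind of thing SCHEMA.md could back with a mechanical check. The sketch below classifies a date annotation into the four forms named in the table. The function name classifyDate is hypothetical, and the exact-day form is assumed to be YYYY-MM-DD since the table only says "exact day where known".

```javascript
// Patterns for the four date forms named in the decisions table.
// Assumption: an exact day is written YYYY-MM-DD.
const MONTH = "\\d{4}-(?:0[1-9]|1[0-2])";
const DATE_FORMS = [
  { kind: "ongoing", re: new RegExp(`^since ${MONTH}$`) },
  { kind: "range", re: new RegExp(`^${MONTH} to ${MONTH}$`) },
  { kind: "day", re: new RegExp(`^${MONTH}-(?:0[1-9]|[12]\\d|3[01])$`) },
  { kind: "fuzzy", re: /^~\d{4}$/ },
];

// Returns the kind of date annotation, or null if it matches none of the forms.
function classifyDate(annotation) {
  // Strip one pair of surrounding parentheses, if present.
  const inner = annotation.trim().replace(/^\((.*)\)$/, "$1");
  const match = DATE_FORMS.find((form) => form.re.test(inner));
  return match ? match.kind : null;
}
```

A null result is what a schema check would flag: the fact either has no date or uses a form the corpus does not allow.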

2. The topic file shape

Every file in ~/cosmo-memory/topics/ looks the same.

---
title: Running — Phase 9 (comeback 5K)
tags: [health, running, training-plan, current]
status: active
created: 2026-04-12
updated: 2026-04-25
related: [migraine-history, running-history]
sources:
  - ~/.claude/projects/.../memory/project_phase9_comeback_5k.md
  - .claude/skills/health/current/phase9-context.md
---

# Phase 9 — comeback 5K plan

[content lives here as plain markdown]

Five fields the eventual web dashboard renders directly: title, tags (filter chips), status (badge), updated (sort order), related (nav links). Tags are a flat list — no hierarchy. The dashboard can group by tag if it wants, but the file itself doesn't know about that.

Why frontmatter and not just markdown: yesterday's research showed the prior art is split. Karpathy says "format is up to you." Stevens uses tags but no frontmatter (it's SQLite). Letta uses frontmatter but no tags (it uses directory hierarchy). We're picking the combination that works for our specific need: a future web dashboard wants structured fields to filter and sort by. Plain markdown gives the dashboard nothing.
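To make the "structured fields" point concrete, here is a minimal sketch of reading the topic file shape above. It handles only what the example uses (scalars, inline [a, b] lists, and "- item" block lists) and is not a general YAML parser; a real build would more likely use a YAML library. The names parseTopicFile and missingFields are hypothetical.

```javascript
// Minimal frontmatter reader for the topic file shape in this section.
function parseTopicFile(text) {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  if (!match) throw new Error("missing frontmatter block");
  const meta = {};
  let currentKey = null;
  for (const line of match[1].split("\n")) {
    const item = /^\s+-\s+(.*)$/.exec(line);
    if (item && currentKey) {
      // Block list entry belonging to the most recent "key:" line.
      meta[currentKey].push(item[1].trim());
      continue;
    }
    const pair = /^([A-Za-z_]+):\s*(.*)$/.exec(line);
    if (!pair) continue;
    const [, key, raw] = pair;
    if (raw === "") {
      meta[key] = []; // value continues as a block list
      currentKey = key;
    } else if (raw.startsWith("[") && raw.endsWith("]")) {
      meta[key] = raw.slice(1, -1).split(",").map((s) => s.trim()).filter(Boolean);
      currentKey = null;
    } else {
      meta[key] = raw;
      currentKey = null;
    }
  }
  return { meta, body: match[2].trim() };
}

// The five fields the dashboard renders directly.
const REQUIRED = ["title", "tags", "status", "updated", "related"];
function missingFields(meta) {
  return REQUIRED.filter((field) => !(field in meta));
}
```

With this shape the dashboard's filter and sort become plain array operations over meta.tags and meta.updated, which is the thing plain markdown cannot offer.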

3. The dream pass — two phases

Runs nightly at 3am. If the Mac was off, runs on next wake (catch-up logic).
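The catch-up rule can be reduced to one comparison: has a run completed since the most recent 3am slot? A minimal sketch, assuming the last run time is available from the run log; the function names are hypothetical and times are local.

```javascript
// Returns the most recent scheduled slot (local 03:00) at or before `now`.
function lastScheduledRun(now, hour = 3) {
  const slot = new Date(now);
  slot.setHours(hour, 0, 0, 0);
  if (slot > now) slot.setDate(slot.getDate() - 1); // before 3am: use yesterday's slot
  return slot;
}

// The pass is due if no run has completed since the latest 3am slot.
// lastRun is null when the pass has never run.
function isDreamPassDue(lastRun, now) {
  return lastRun === null || lastRun < lastScheduledRun(now);
}
```

Checked on wake, this runs the pass once for a missed night rather than once per missed slot, which is probably what you want after the Mac has been off for several days.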

flowchart TD
    S["3am trigger<br/>or catch-up on wake"] --> O["Phase 0: Orient<br/>read INDEX.md, last run log,<br/>list files modified since last run"]
    O --> G["Phase 1: Gather<br/>read fresh entries +<br/>flagged files +<br/>5% rotating sample of older files"]
    G --> E["Phase 2: Edit<br/>SONNET 4.6<br/>consolidate, prune, split, merge<br/>resolve contradictions"]
    E --> L["Phase 3: Link<br/>SONNET 4.6<br/>re-traverse touched files +<br/>their first-degree neighbours<br/>rebuild related: arrays<br/>fix orphaned sources"]
    L --> B["Phase 4: Brief<br/>SONNET 4.6<br/>write tomorrow's morning brief<br/>from active topics + calendar + inbox"]
    B --> D["7am delivery<br/>brief lands as message"]
    style E fill:#3a2818,stroke:#ffb454,color:#ffd595
    style L fill:#1c2c1c,stroke:#7bd88f,color:#a3e0b3

Two reasons for the split:

  1. Edit changes the topology. If the edit phase splits running.md into running-history.md + running-phase-9.md, every other file that referenced running now has a stale link. The link phase catches that.
  2. Link is cheaper than edit. Link only needs frontmatter + first paragraph of each file, not full content. Worth a separate pass to keep edit focused on the hard work.
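The first reason can be sketched as a small rewrite step: given a record of which topics the edit phase retired and what replaced them, rebuild a file's related: array. The renames map and the function name relinkRelated are hypothetical; the plan does not say how the edit phase reports splits and merges.

```javascript
// renames: Map from a retired topic name to the names that replaced it,
// e.g. a split: "running" -> ["running-history", "running-phase-9"].
function relinkRelated(related, renames) {
  const out = [];
  for (const name of related) {
    // Keep the name if untouched, otherwise substitute its replacements.
    const targets = renames.get(name) ?? [name];
    for (const target of targets) {
      if (!out.includes(target)) out.push(target); // dedupe, keep order
    }
  }
  return out;
}
```

A merge falls out of the same logic: two retired names mapping to one replacement collapse into a single entry.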

What the dream pass actually processes each night:

Typical night: 20 active files in the edit phase, ~50 files in the link phase (touched + first-degree neighbours via related:). Not the full library every night.
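As a rough illustration of how the link phase set stays small, the sketch below selects touched files plus their first-degree neighbours. Including incoming links (files that point at a touched file) is an assumption on top of the text, made because those are the files whose links go stale after a split. The name linkPhaseSet is hypothetical.

```javascript
// topics: Map of topic name -> array of names from its related: field.
// touched: array of names the edit phase changed tonight.
function linkPhaseSet(topics, touched) {
  const selected = new Set(touched);
  for (const name of touched) {
    // Outgoing links from each touched file.
    for (const neighbour of topics.get(name) ?? []) selected.add(neighbour);
  }
  for (const [name, related] of topics) {
    // Incoming links: files that point at a touched file.
    if (related.some((target) => touched.includes(target))) selected.add(name);
  }
  return [...selected].sort();
}
```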

4. Model assignments

| Job | Model | Why |
| --- | --- | --- |
| Hot path (you talk to Cosmo) | User's choice via /opus or /sonnet | Already working, no change. |
| Bootstrap ingest (one-off) | Opus 4.7 | Wide scope, novel synthesis from years of scattered material, runs once. Worth it. |
| Dream pass: consolidate, prune, brief | Sonnet 4.6 | Anthropic's own guidance: "Sonnet as default, Opus for hard problems." Merging files isn't a hard problem. SWE-bench gap is 1.2 points. |
| Dream pass: link phase | Sonnet 4.6 | Could be Haiku 4.5 (classification task), but on Max plan there's no per-token saving. Simpler to keep one model across the whole pass. |
| Trigger checker (step 8) | Sonnet 4.6 | Same reasoning. The boomerang lesson applies: don't hardcode Opus on a job that runs forever. |
| Interrupt classifier (step 9) | Sonnet 4.6 | Decides notify / question / review based on busy signals. Light reasoning over structured input. |

Each phase's model is a single value in the config file. If we discover Sonnet is visibly worse than Opus on the consolidate phase (see next section), we swap one constant. No code change.
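One possible shape for that config, as a sketch. The key names, the modelFor helper, and the model strings are all placeholders; the real identifiers are whatever query() accepts in its model field.

```javascript
// Hypothetical config: one model value per job, nothing else to change.
const MODELS = Object.freeze({
  bootstrapIngest: "opus-4.7",
  dreamEdit: "sonnet-4.6",
  dreamLink: "sonnet-4.6",
  dreamBrief: "sonnet-4.6",
  triggerChecker: "sonnet-4.6",
  interruptClassifier: "sonnet-4.6",
});

// Looks up the model for a job; overrides allow a temporary swap.
function modelFor(job, overrides = {}) {
  const model = overrides[job] ?? MODELS[job];
  if (!model) throw new Error(`no model configured for job "${job}"`);
  return model;
}
```

Swapping the consolidate phase to Opus would then be a one-line change to dreamEdit, with every call site untouched.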

5. Shadow mode + the bake-in test

The honest problem: if we only ever run Sonnet, we never see what Opus would have produced. We could ship a slightly worse system for months without noticing.

So for the first 14 nights of dream-pass operation, both models run the consolidate phase:

flowchart TD
    G["Phase 1: Gather"] --> E1["Phase 2: Edit<br/>SONNET 4.6<br/>writes to ~/cosmo-memory/"]
    G --> E2["Phase 2: Edit shadow<br/>OPUS 4.7<br/>writes to ~/cosmo-memory-shadow/"]
    E1 --> L["Phase 3: Link<br/>continues with Sonnet output as live"]
    E2 --> X["Diff written to dashboard<br/>read in morning brief"]
    L --> B["Phase 4: Brief"]
    B --> D["7am delivery"]
    style E1 fill:#3a2818,stroke:#ffb454,color:#ffd595
    style E2 fill:#2a1838,stroke:#a78bfa,color:#cbb5fc

After 14 nights, three possible outcomes:

  1. Opus is meaningfully better → swap consolidate to Opus permanently
  2. Sonnet is roughly equivalent → kill the shadow, save the quota, evidence justifies the choice
  3. Inconclusive → extend shadow another N nights

The mechanism is reusable. Same shadow pattern works later for "is Haiku enough for the link phase?" or "is Sonnet 4.7 better than 4.6 once it ships?"
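The reusable part is small enough to sketch: run the same phase with two models on identical input, keep one output as live, and record how far the other diverged. Here editFn stands in for the real edit phase call, and lineDelta is a crude size-of-change signal rather than a real diff algorithm; both names are hypothetical.

```javascript
// Counts distinct lines present in one version but not the other.
function lineDelta(liveText, shadowText) {
  const live = new Set(liveText.split("\n"));
  const shadow = new Set(shadowText.split("\n"));
  const onlyLive = [...live].filter((line) => !shadow.has(line)).length;
  const onlyShadow = [...shadow].filter((line) => !live.has(line)).length;
  return { onlyLive, onlyShadow, identical: onlyLive === 0 && onlyShadow === 0 };
}

// editFn(model, input) resolves to the edited text for that model.
// Both models see the same gathered input; only `live` continues to the link phase.
async function runEditWithShadow(editFn, input, liveModel, shadowModel) {
  const [live, shadow] = await Promise.all([
    editFn(liveModel, input),
    editFn(shadowModel, input),
  ]);
  return { live, shadow, delta: lineDelta(live, shadow) };
}
```

Because the two model names are parameters, the same function covers a later Haiku-versus-Sonnet comparison on the link phase without modification.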

The bake-in itself is the first real test of the new system. When the tasks directory is built (step 5), this becomes:
~/cosmo-memory/tasks/active/dream-pass-shadow-review.md

---
title: Dream pass — Sonnet vs Opus shadow review
status: active
created: 2026-04-25
trigger:
  type: date
  fire_at: 2026-05-09
  mode: question
related: [plans/memory-v2, dream-pass]
---

After 14 nights of shadow-mode comparison:
- Open the diff viewer
- Read 3-5 representative diffs
- Decide: keep Sonnet, switch to Opus, or extend shadow
- Update plans/memory-v2.html with the decision
- Either flip comparison_model: off, or rotate to monthly bake-off mode

The trigger fires Sat 9 May 2026, lands as a mode: question interrupt in the morning brief. If the new system surfaces it correctly, we know the trigger mechanism works. If it drops the ball, we catch it via a backup external /schedule agent set for the same date.
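The date check behind that trigger is simple enough to sketch. It assumes task frontmatter has already been parsed into objects with the trigger shape shown above; the function name dueTriggers is hypothetical.

```javascript
// tasks: array of parsed task frontmatter objects.
// today: "YYYY-MM-DD". ISO dates in this form compare correctly as strings.
function dueTriggers(tasks, today) {
  return tasks.filter(
    (task) =>
      task.status === "active" &&
      task.trigger?.type === "date" &&
      task.trigger.fire_at <= today
  );
}
```

Using "on or before today" rather than an exact match means a trigger still fires if the checker missed the day itself, which matches the catch-up behaviour of the dream pass.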

6. Archive policy

Two ways a file can leave the active set. No third option.

status: superseded

Set automatically by the consolidate phase when it splits or merges files. The old file gets superseded_by: <newer-file> in its frontmatter and stays on disk as a pointer. Searchable but doesn't load by default.

status: archived

Set explicitly by the user (/archive <topic>) or by the agent after the user has explicitly said "we're done with X." Never automatic, never time-based.

What we explicitly rejected: time-based auto-archive after N months idle. A reference file like australian-driving-rules.md might never get updated and still be relevant forever. Silent files aren't dead files.

Only active status loads into context by default.
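A minimal sketch of how the two exits interact with loading, assuming frontmatter is already parsed and that superseded_by holds a single topic name as described above. The function names are hypothetical.

```javascript
// files: Map of topic name -> frontmatter object.
// Only active files are candidates for loading into context by default.
function activeTopics(files) {
  return [...files]
    .filter(([, meta]) => meta.status === "active")
    .map(([name]) => name);
}

// Follows superseded_by pointers until reaching a file that is not superseded.
function resolveCurrent(files, name) {
  const seen = new Set();
  let current = name;
  while (files.get(current)?.status === "superseded") {
    if (seen.has(current)) throw new Error(`superseded_by cycle at ${current}`);
    seen.add(current);
    current = files.get(current).superseded_by;
  }
  return current;
}
```

The pointer walk is what lets an old link or search hit for a superseded file land on its replacement instead of a dead end.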

7. Build order

Same nine steps as yesterday. Today added the prep work (commit, branch, archive boomerang) plus more detail on what each step ships.

| # | Step | Ships | Status |
| --- | --- | --- | --- |
| 0a | Commit current main state | Clean working tree, 9 fresh commits | done |
| 0b | Cut memory-v2 branch | Branch created off clean main | done |
| 0c | Archive boomerang v1 | archive/boomerang-v1/ snapshot + README | done |
| 0d | Plan doc | plans/memory-v2.html + audio + Pages deploy | this doc |
| 1 | Memory v1 | Delete Qdrant code + embeddings + boomerang refs from agent.js/bot.js. Create ~/cosmo-memory/ with INDEX.md, SCHEMA.md, topics/, plans/, tasks/, inbox/. Init git, push to private choujar/cosmo-memory. Wire router + extractor in agent.js and bot.js. | next |
| 1.5 | Dashboard v0 (local) | Minimal local viewer: list topics, render markdown + frontmatter, filter by tag, sort by updated. Localhost only, no auth. Ships before bootstrap ingest so step 2 isn't blind. | pending |
| 2 | Bootstrap ingest | One-off Opus pass over Claude Code memory, all session docs, skill files, every CLAUDE.md, all specs. Synthesises ~40-60 topic files. Watch via dashboard. | pending |
| 3 | Dream pass v1 | Nightly 3am PM2 process. Five phases (Orient → Gather → Edit → Link → Brief). Catch-up logic. Shadow mode enabled. | pending |
| 4 | Claude Code integration | SessionStart hook so Claude Code reads from ~/cosmo-memory/ too, for interface parity. | pending |
| 5 | Tasks directory | tasks/active/, tasks/blocked/, tasks/done/. First task created: the shadow-review trigger above. | pending |
| 6 | Plans directory | plans/. First plan that lives there: this doc itself, ported. | pending |
| 7 | Inbox + morning brief | inbox/ for proactive surfaces. 7am brief delivery via Telegram. | pending |
| 8 | Triggers | trigger: field on task files. Sonnet-powered checker process. Salvages chain linking + ack windows from boomerang archive. | pending |
| 9 | Judge + polish | Interrupt classifier (notify / question / review). Dashboard production deploy with Worker basic auth. Slash commands. Edit features in dashboard. | pending |

Steps 1-3 are the core. Roughly one solid day of work, possibly less if we don't get distracted.

8. What's still open

One thing parked. Doesn't block the build.

  1. Multi-machine coordination. Old MacBook Pro might come up as an always-on host. Parked until that hardware is live. ~/cosmo-memory/ being a git repo synced to GitHub makes pull-on-wake reasonable, but real coordination (avoiding two dream passes at 3am, lock-file semantics, who-pushes-wins) is a separate design.
Resolved between plan v1 and v2.

9. What happens next

You read this doc. Listen to it in the car if that's easier — it's the same audio-player setup as yesterday's spec, lock-screen controls and all.

If anything in here looks wrong, push back before I start step 1. Once you say go, I:

  1. Strip Qdrant + embeddings + boomerang references from src/agent.js and src/bot.js
  2. Delete the live boomerang files (already archived)
  3. Create ~/cosmo-memory/ with INDEX.md skeleton, SCHEMA.md (the topic file shape from section 2), empty topics/
  4. Wire the new read path (router) and write path (extractor) in src/agent.js
  5. Hand back for review of the running system before moving to step 2

Then bootstrap ingest. Then dream pass. Then everything else.

The meta point. This is the first plan the new memory system will ever manage. Step 6 (plans directory) ports this exact file into ~/cosmo-memory/plans/memory-v2.md. Step 5 (tasks directory) creates the first task file pointing to the Sat 9 May 2026 shadow review. The system bootstraps itself by overseeing its own birth — that's the working test. If it can't keep track of its own build, it can't keep track of anything else.

Doc lives at plans/memory-v2.html on the memory-v2 branch. Generated Sat 25 Apr 2026.
