Cosmo memory v2 — the plan

Decisions locked. Building next. Sat 25 Apr 2026.


What this doc is. Yesterday we shipped the architecture spec. Today we resolved the open forks and locked the plan. This is the build brief — what's decided, why, what's still open, and what happens first. It's also the first plan the new memory system itself will manage. The system supervises its own birth.

Hard rule for every LLM call below. Every reference in this doc to "Sonnet 4.6", "Opus 4.7", or any other model — hot path, dream pass, bootstrap ingest, link phase, trigger checker, interrupt classifier, shadow comparison, all of them — runs through @anthropic-ai/claude-agent-sdk's query() function. Not the direct @anthropic-ai/sdk. Cosmo has no ANTHROPIC_API_KEY; auth flows through Claude Code session credentials. The pattern in src/agent.js is canonical. Model selection is via the model field in query() options. No exceptions.
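
A minimal sketch of that rule as a single choke point. The helper name `callModel` is hypothetical, and the message shape (a final message of type "result" carrying the text in `.result`) is an assumption to check against src/agent.js. `queryFn` stands in for `query()` from @anthropic-ai/claude-agent-sdk and is injected so the sketch runs without the SDK installed.

```javascript
// Sketch only: one helper that every LLM call goes through.
// `queryFn` stands in for query() from @anthropic-ai/claude-agent-sdk.
// ASSUMPTION: query() yields messages, and the message of type "result"
// carries the final text in `.result`. src/agent.js is the canonical pattern.
async function callModel(queryFn, { model, prompt }) {
  if (!model) {
    throw new Error("model is required: set it in query() options");
  }
  let text = null;
  // Model selection goes through the `model` field in query() options.
  for await (const message of queryFn({ prompt, options: { model } })) {
    if (message.type === "result") {
      text = message.result ?? null;
    }
  }
  return text;
}
```

In Cosmo the first argument would be the SDK's `query` export, so no call site ever needs an ANTHROPIC_API_KEY or the direct SDK.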

Contents

What we agreed
The topic file shape
The dream pass — two phases
Model assignments
Shadow mode + the bake-in test
Archive policy
Build order
What's still open
What happens next

1. What we agreed

Ten things locked in. Each was a real choice with a defensible alternative.

Decision	What we picked	What we rejected
Read path	router: a small Sonnet 4.6 call picks 1–3 topic files per turn.	Inject every topic file every turn and lean on the prompt cache. Simpler, but token-wasteful at scale.
Directory structure	flat: one directory, one file per topic. No nesting.	Hierarchy (`health/running/phase9`). Ages badly. The "where does this go" problem is unsolvable.
File metadata	YAML frontmatter + flat tags: five fields (title, tags, status, updated, related).	Plain markdown only (Karpathy's punt) or directory hierarchy as the metadata (Letta).
Dream pass shape	two-phase: edit phase first (consolidate, prune, split, merge), then link phase (re-traverse, fix `related:`).	Single-pass file-by-file. Misses transitive connections after a split or merge.
Dream pass model	Sonnet 4.6 across all phases. Bootstrap ingest uses Opus 4.7 (one-off).	Opus for the consolidate phase nightly. Anthropic's own guidance says Sonnet is in range, and burning Max-plan Opus quota every night is not free.
Confidence test	shadow mode: for the first 14 nights, Opus also runs the consolidate phase and the output is diffed for review.	Trust Sonnet without comparison. Risks silently shipping worse output for months before noticing.
Archive policy	no auto-archive: files leave the active set only via `superseded_by` or explicit user archive.	Time-based archive after N months idle. "Silent" doesn't mean "dead": old reference files stay relevant forever.
Bootstrap ingest scope	wide: read existing Claude Code memory, all session docs, skill files, every CLAUDE.md, every spec. One-off Opus pass.	Narrow ingest. Faster but loses richness. The dream pass is designed to prune anyway.
Bi-temporal scope	date everything: every fact gets a date when written. `(since YYYY-MM)` for ongoing, `(YYYY-MM to YYYY-MM)` for ranges, exact day where known, `~YYYY` for fuzzy. SCHEMA.md enforces.	Opt-in by tag (ages badly, the same as hierarchy in disguise) or LLM-decides-per-fact (drifts across model versions). Both create an inconsistent corpus over years.
Dashboard + repo layout	dashboard from day one: sibling repo `~/cosmo-memory/` (private GitHub), separate from the cosmo code. Local dashboard reads the filesystem from step 1.5. Production deploy with Worker basic auth comes later.	Dashboard at step 9 (too late: bootstrap ingest would run blind). Memory inside the cosmo repo (mixes personal data with code). Cloudflare Access (good but heavier than needed for now).
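
The "date everything" rule in the table is mechanical enough to lint. A rough sketch of what a SCHEMA.md check could look like, under two assumptions the table does not settle: exact days are written as `(YYYY-MM-DD)` and fuzzy years as `(~YYYY)`, and a "fact" is a markdown bullet line. The names `isDated` and `undatedFacts` are hypothetical.

```javascript
// Sketch of a date-annotation check for the four forms named in the table.
// ASSUMPTION: exact days look like (YYYY-MM-DD) and fuzzy years like (~YYYY);
// the plan names the forms but not their exact punctuation.
const DATE_FORMS = [
  /\(since \d{4}-\d{2}\)/,          // ongoing: (since YYYY-MM)
  /\(\d{4}-\d{2} to \d{4}-\d{2}\)/, // range: (YYYY-MM to YYYY-MM)
  /\(\d{4}-\d{2}-\d{2}\)/,          // exact day
  /\(~\d{4}\)/,                     // fuzzy year
];

// True when the line carries at least one recognised date annotation.
function isDated(factLine) {
  return DATE_FORMS.some((re) => re.test(factLine));
}

// ASSUMPTION: facts are bullet lines. Returns the bullets with no date.
function undatedFacts(markdown) {
  return markdown
    .split("\n")
    .filter((line) => line.trim().startsWith("- "))
    .filter((line) => !isDated(line));
}
```

A check like this could run in the dream pass and set `status: needs-review` on files with undated facts, though where it runs is not decided in this doc.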

2. The topic file shape

Every file in ~/cosmo-memory/topics/ looks the same.

---
title: Running — Phase 9 (comeback 5K)
tags: [health, running, training-plan, current]
status: active
created: 2026-04-12
updated: 2026-04-25
related: [migraine-history, running-history]
sources:
  - ~/.claude/projects/.../memory/project_phase9_comeback_5k.md
  - .claude/skills/health/current/phase9-context.md
---

# Phase 9 — comeback 5K plan

[content lives here as plain markdown]

Five of these fields are rendered directly by the dashboard: title, tags (filter chips), status (badge), updated (sort order), related (nav links). Tags are a flat list with no hierarchy. The dashboard can group by tag if it wants, but the file itself doesn't know about that.
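
A dependency-free sketch of how a viewer could read that shape. It handles only the three constructs the example file uses (plain `key: value`, inline `[a, b]` lists, and indented `- item` lists); it is not a general YAML parser, and the names `parseTopicFile` and `dashboardView` are hypothetical.

```javascript
// Sketch: parse the frontmatter shape shown above. Not a full YAML parser.
function parseTopicFile(text) {
  const match = text.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) throw new Error("missing frontmatter");
  const meta = {};
  let listKey = null; // key currently collecting "- item" lines
  for (const line of match[1].split("\n")) {
    const item = line.match(/^\s+-\s+(.*)$/);
    if (item && listKey) {
      meta[listKey].push(item[1].trim());
      continue;
    }
    const pair = line.match(/^([A-Za-z_]+):\s*(.*)$/);
    if (!pair) continue;
    const [, key, raw] = pair;
    if (raw === "") {
      meta[key] = []; // block list follows, e.g. sources:
      listKey = key;
    } else if (raw.startsWith("[") && raw.endsWith("]")) {
      meta[key] = raw.slice(1, -1).split(",").map((s) => s.trim()).filter(Boolean);
      listKey = null;
    } else {
      meta[key] = raw;
      listKey = null;
    }
  }
  return { meta, body: match[2] };
}

// The five fields the dashboard renders; everything else stays in the file.
const DASHBOARD_FIELDS = ["title", "tags", "status", "updated", "related"];

function dashboardView(meta) {
  return Object.fromEntries(DASHBOARD_FIELDS.map((key) => [key, meta[key]]));
}
```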

Why frontmatter and not just markdown: yesterday's research showed the prior art is split. Karpathy says "format is up to you." Stevens uses tags but no frontmatter (it's SQLite). Letta uses frontmatter but no tags (it uses directory hierarchy). We're picking the combination that works for our specific need: a future web dashboard wants structured fields to filter and sort by. Plain markdown gives the dashboard nothing.

3. The dream pass — two phases

Runs nightly at 3am. If the Mac was off, runs on next wake (catch-up logic).
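
The catch-up rule reduces to one comparison: has a run completed since the most recent 3am slot? A sketch under the assumption that the last run time is read from the run log and that local time is what matters. `dreamPassDue` is a hypothetical name.

```javascript
// Sketch: returns true when a dream pass is owed, i.e. the most recent
// 3am slot (local time) has passed and no run has happened since it.
// `lastRunAt` is a Date, or null if the pass has never run.
function dreamPassDue(now, lastRunAt, hour = 3) {
  const slot = new Date(now);
  slot.setHours(hour, 0, 0, 0);
  if (slot > now) {
    // It is before 3am today, so the relevant slot is yesterday's.
    slot.setDate(slot.getDate() - 1);
  }
  return lastRunAt === null || lastRunAt < slot;
}
```

Called on wake and at 3am alike, the same check covers both the scheduled run and the catch-up case.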

flowchart TD
    S["3am trigger<br/>or catch-up on wake"] --> O["Phase 0: Orient<br/>read INDEX.md, last run log,<br/>list files modified since last run"]
    O --> G["Phase 1: Gather<br/>read fresh entries +<br/>flagged files +<br/>5% rotating sample of older files"]
    G --> E["Phase 2: Edit<br/>SONNET 4.6<br/>consolidate, prune, split, merge<br/>resolve contradictions"]
    E --> L["Phase 3: Link<br/>SONNET 4.6<br/>re-traverse touched files +<br/>their first-degree neighbours<br/>rebuild related: arrays<br/>fix orphaned sources"]
    L --> B["Phase 4: Brief<br/>SONNET 4.6<br/>write tomorrow's morning brief<br/>from active topics + calendar + inbox"]
    B --> D["7am delivery<br/>brief lands as message"]
    style E fill:#3a2818,stroke:#ffb454,color:#ffd595
    style L fill:#1c2c1c,stroke:#7bd88f,color:#a3e0b3

Two reasons for the split:

Edit changes the topology. If the edit phase splits running.md into running-history.md + running-phase-9.md, every other file that referenced running now has a stale link. The link phase catches that.
Link is cheaper than edit. Link only needs frontmatter + first paragraph of each file, not full content. Worth a separate pass to keep edit focused on the hard work.
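
The stale-link problem from the first reason, shown mechanically. In the plan the link phase is a Sonnet call, so this is only an illustration of what it has to repair, not the planned implementation. `relinkRelated` and the shape of the `renames` map are assumptions.

```javascript
// Sketch: rewrite one file's related: array after the edit phase has
// split or merged files. `renames` is a Map from a removed slug to the
// slugs that replaced it, e.g. running -> [running-history, running-phase-9].
function relinkRelated(related, renames) {
  const out = [];
  for (const slug of related) {
    const targets = renames.get(slug) ?? [slug]; // untouched slugs pass through
    for (const target of targets) {
      if (!out.includes(target)) out.push(target); // keep order, drop duplicates
    }
  }
  return out;
}
```

Running this over every touched file and its first-degree neighbours is the cheap part; deciding which of the new files a link should really point to is where the model earns its place.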

What the dream pass actually processes each night:

Files modified since last run — the fresh entries
Files flagged for review — set to status: needs-review by the hot path when something contradictory came in
5% rotating sample of older files — garbage-collection sweep, the whole library cycles every ~3 weeks

Typical night: 20 active files in the edit phase, ~50 files in the link phase (touched + first-degree neighbours via related:). Not the full library every night.
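
One way to get a rotating 5% sample without storing state: sort the older files, then take every 20th one, offset by the night number. A sketch assuming each file record has `slug`, `status`, and `modifiedAt`, and that a `nightIndex` counter is available. All of those names are hypothetical. At 5% the cycle is 1 / 0.05 = 20 nights, which matches the ~3 weeks above.

```javascript
// Sketch of the gather selection: fresh + flagged + rotating sample.
// files: [{ slug, status, modifiedAt }], lastRunAt comparable to modifiedAt.
function gatherCandidates(files, lastRunAt, nightIndex, sampleRate = 0.05) {
  const selected = new Map();
  for (const file of files) {
    const fresh = file.modifiedAt > lastRunAt;
    const flagged = file.status === "needs-review";
    if (fresh || flagged) selected.set(file.slug, file);
  }
  // Older active files, in a stable order so the rotation is deterministic.
  const older = files
    .filter((f) => f.status === "active" && !selected.has(f.slug))
    .sort((a, b) => a.slug.localeCompare(b.slug));
  const buckets = Math.round(1 / sampleRate); // 20 buckets at 5%
  older.forEach((file, i) => {
    if (i % buckets === nightIndex % buckets) selected.set(file.slug, file);
  });
  return [...selected.values()];
}
```

The rotation is approximate: files move in and out of the "older" list as they get edited, so a given file can occasionally be visited early or a night late.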

4. Model assignments

Job	Model	Why
Hot path (you talk to Cosmo)	User's choice via `/opus` or `/sonnet`	Already working, no change.
Bootstrap ingest (one-off)	Opus 4.7	Wide scope, novel synthesis from years of scattered material, runs once. Worth it.
Dream pass — consolidate, prune, brief	Sonnet 4.6	Anthropic's own guidance: "Sonnet as default, Opus for hard problems." Merging files isn't a hard problem. SWE-bench gap is 1.2 points.
Dream pass — link phase	Sonnet 4.6	Could be Haiku 4.5 (classification task), but on Max plan there's no per-token saving. Simpler to keep one model across the whole pass.
Trigger checker (step 8)	Sonnet 4.6	Same reasoning. The boomerang lesson applies: don't hardcode Opus on a job that runs forever.
Interrupt classifier (step 9)	Sonnet 4.6	Decides notify / question / review based on busy signals. Light reasoning over structured input.

Each phase is a single value in the config file. If we discover Sonnet is visibly worse than Opus on the consolidate phase (see next section), we swap one constant. No code change.
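
A sketch of what "one constant per phase" could look like. The key names and the model id strings are placeholders, not real SDK model ids; only `comparison_model` appears elsewhere in this doc.

```javascript
// Sketch of a per-phase model config. Ids are placeholders, not SDK values.
const MODELS = { sonnet: "sonnet-4.6", opus: "opus-4.7" };

const dreamPassConfig = {
  bootstrapIngest: MODELS.opus,      // one-off
  edit: MODELS.sonnet,               // consolidate, prune, split, merge
  link: MODELS.sonnet,
  brief: MODELS.sonnet,
  triggerChecker: MODELS.sonnet,     // step 8
  interruptClassifier: MODELS.sonnet, // step 9
  comparison_model: MODELS.opus,     // shadow mode; "off" disables it
};

// Look up the model for a phase; unknown phases fail loudly.
function modelFor(phase, config = dreamPassConfig) {
  if (!Object.prototype.hasOwnProperty.call(config, phase)) {
    throw new Error("no model configured for phase: " + phase);
  }
  return config[phase];
}
```

Swapping the consolidate phase to Opus is then a one-line change to `edit`, and the value returned by `modelFor` is what goes into the `model` field of `query()` options.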

5. Shadow mode + the bake-in test

The honest problem: if we only ever run Sonnet, we never see what Opus would have produced. We could ship a slightly worse system for months without noticing.

So for the first 14 nights of dream-pass operation, both models run the consolidate phase:

flowchart TD
    G["Phase 1: Gather"] --> E1["Phase 2: Edit<br/>SONNET 4.6<br/>writes to ~/cosmo-memory/"]
    G --> E2["Phase 2: Edit shadow<br/>OPUS 4.7<br/>writes to ~/cosmo-memory-shadow/"]
    E1 --> L["Phase 3: Link<br/>continues with Sonnet output as live"]
    E2 --> X["Diff written to dashboard<br/>read in morning brief"]
    L --> B["Phase 4: Brief"]
    B --> D["7am delivery"]
    style E1 fill:#3a2818,stroke:#ffb454,color:#ffd595
    style E2 fill:#2a1838,stroke:#a78bfa,color:#cbb5fc

After 14 nights, three possible outcomes:

Opus is meaningfully better → swap consolidate to Opus permanently
Sonnet is roughly equivalent → kill the shadow, save the quota, evidence justifies the choice
Inconclusive → extend shadow another N nights

The mechanism is reusable. Same shadow pattern works later for "is Haiku enough for the link phase?" or "is Sonnet 4.7 better than 4.6 once it ships?"
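
The nightly comparison itself can start as a file-level summary before any line-level diffing. A sketch, assuming both edit outputs have been read into Maps of slug to file content. `shadowDiff` and the report shape are hypothetical.

```javascript
// Sketch: summarise how the live (Sonnet) and shadow (Opus) edit outputs differ.
// live, shadow: Map of slug -> file content after the edit phase.
function shadowDiff(live, shadow) {
  const report = { same: [], changed: [], liveOnly: [], shadowOnly: [] };
  const slugs = [...new Set([...live.keys(), ...shadow.keys()])].sort();
  for (const slug of slugs) {
    if (!shadow.has(slug)) report.liveOnly.push(slug);        // e.g. only Sonnet split a file
    else if (!live.has(slug)) report.shadowOnly.push(slug);   // e.g. only Opus split a file
    else if (live.get(slug) === shadow.get(slug)) report.same.push(slug);
    else report.changed.push(slug);
  }
  return report;
}
```

The `liveOnly` and `shadowOnly` buckets are likely the most interesting to read, since they show where the two models disagreed about topology rather than wording.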

The bake-in itself is the first real test of the new system. When the tasks directory is built (step 5), this becomes:

~/cosmo-memory/tasks/active/dream-pass-shadow-review.md

---
title: Dream pass — Sonnet vs Opus shadow review
status: active
created: 2026-04-25
trigger:
  type: date
  fire_at: 2026-05-09
  mode: question
related: [plans/memory-v2, dream-pass]
---

After 14 nights of shadow-mode comparison:
- Open the diff viewer
- Read 3-5 representative diffs
- Decide: keep Sonnet, switch to Opus, or extend shadow
- Update plans/memory-v2.html with the decision
- Either flip comparison_model: off, or rotate to monthly bake-off mode

The trigger fires Sat 9 May 2026, lands as a mode: question interrupt in the morning brief. If the new system surfaces it correctly, we know the trigger mechanism works. If it drops the ball, we catch it via a backup external /schedule agent set for the same date.
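
The date arithmetic and the "is it due" check are both small. A sketch, assuming task frontmatter has already been parsed into objects and dates are ISO `YYYY-MM-DD` strings (which compare correctly as plain strings). `addNights` and `dueTriggers` are hypothetical names; the step 8 checker in the plan is Sonnet-powered, so this covers only the deterministic date case.

```javascript
// Sketch: compute a fire date N nights after a start date (UTC, ISO strings).
function addNights(isoDate, nights) {
  const d = new Date(isoDate + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() + nights);
  return d.toISOString().slice(0, 10);
}

// Sketch: which active tasks have a date trigger that is due today?
// tasks: [{ title, status, trigger: { type, fire_at, mode } }]
function dueTriggers(tasks, today) {
  return tasks
    .filter((t) => t.status === "active")
    .filter((t) => t.trigger && t.trigger.type === "date")
    .filter((t) => t.trigger.fire_at <= today) // ISO dates sort as strings
    .map((t) => ({ title: t.title, mode: t.trigger.mode }));
}
```

For the shadow review, `addNights("2026-04-25", 14)` gives `2026-05-09`, the `fire_at` value in the task file.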

6. Archive policy

Two ways a file can leave the active set. No third option.

status: superseded

Set automatically by the consolidate phase when it splits or merges files. The old file gets superseded_by: <newer-file> in its frontmatter and stays on disk as a pointer. Searchable but doesn't load by default.

status: archived

Set explicitly by the user (/archive <topic>) or by the agent after the user has explicitly said "we're done with X." Never automatic, never time-based.

What we explicitly rejected: time-based auto-archive after N months idle. A reference file like australian-driving-rules.md might never get updated and still be relevant forever. Silent files aren't dead files.

Only active status loads into context by default.
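
Both rules fit in a few lines. A sketch, assuming parsed file records with `slug`, `status`, and an optional `superseded_by` holding a single slug, as in the description above. `loadable` and `resolveCurrent` are hypothetical names.

```javascript
// Sketch: only active files are candidates for loading into context.
function loadable(files) {
  return files.filter((f) => f.status === "active");
}

// Sketch: follow superseded_by pointers from an old slug to its live successor.
// bySlug: Map of slug -> { slug, status, superseded_by? }
function resolveCurrent(slug, bySlug) {
  const seen = new Set();
  let current = bySlug.get(slug);
  while (current && current.status === "superseded" && current.superseded_by) {
    if (seen.has(current.slug)) {
      throw new Error("superseded_by cycle at " + current.slug);
    }
    seen.add(current.slug);
    current = bySlug.get(current.superseded_by);
  }
  return current ?? null; // null when the slug or its successor is missing
}
```

The cycle guard matters because the pointers are written by a model: a bad merge could point two files at each other, and a lookup should fail loudly rather than spin.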

7. Build order

Same nine steps as yesterday, plus a new step 1.5 (local dashboard). Today also added the prep work (commit, branch, archive boomerang) and more detail on what each step ships.

#	Step	Ships	Status
0a	Commit current main state	Clean working tree, 9 fresh commits	done
0b	Cut `memory-v2` branch	Branch created off clean main	done
0c	Archive boomerang v1	`archive/boomerang-v1/` snapshot + README	done
0d	Plan doc	`plans/memory-v2.html` + audio + Pages deploy	this doc
1	Memory v1	Delete Qdrant code + embeddings + boomerang refs from agent.js/bot.js. Create `~/cosmo-memory/` with INDEX.md, SCHEMA.md, topics/, plans/, tasks/, inbox/. Init git, push to private `choujar/cosmo-memory`. Wire router + extractor in agent.js and bot.js.	next
1.5	Dashboard v0 (local)	Minimal local viewer: list topics, render markdown + frontmatter, filter by tag, sort by updated. Localhost only, no auth. Ships before bootstrap ingest so step 2 isn't blind.	pending
2	Bootstrap ingest	One-off Opus pass over Claude Code memory, all session docs, skill files, every CLAUDE.md, all specs. Synthesises ~40-60 topic files. Watch via dashboard.	pending
3	Dream pass v1	Nightly 3am PM2 process. Five phases (Orient → Gather → Edit → Link → Brief). Catch-up logic. Shadow mode enabled.	pending
4	Claude Code integration	SessionStart hook so Claude Code reads from `~/cosmo-memory/` too — interface parity.	pending
5	Tasks directory	`tasks/active/`, `tasks/blocked/`, `tasks/done/`. First task created: the shadow-review trigger above.	pending
6	Plans directory	`plans/`. First plan that lives there: this doc itself, ported.	pending
7	Inbox + morning brief	`inbox/` for proactive surfaces. 7am brief delivery via Telegram.	pending
8	Triggers	`trigger:` field on task files. Sonnet-powered checker process. Salvages chain linking + ack windows from boomerang archive.	pending
9	Judge + polish	Interrupt classifier (notify / question / review). Dashboard production deploy with Worker basic auth. Slash commands. Edit features in dashboard.	pending

Steps 1-3 are the core. Roughly one solid day of work, possibly less if we don't get distracted.

8. What's still open

One thing parked. Doesn't block the build.

Multi-machine coordination. Old MacBook Pro might come up as an always-on host. Parked until that hardware is live. ~/cosmo-memory/ being a git repo synced to GitHub makes pull-on-wake reasonable, but real coordination (avoiding two dream passes at 3am, lock-file semantics, who-pushes-wins) is a separate design.

Resolved between plan v1 and v2.

Bi-temporal scope → date everything. A few characters per fact, total consistency, no migration debt over years.
Web dashboard timing → from day one (step 1.5), not step 9. Local-only at first, no auth. Production deploy with Worker basic auth comes in step 9 once the data shape is stable.
Repo layout → ~/cosmo-memory/ is its own private repo, sibling to ~/cosmo/. Personal data stays out of the code repo. Production dashboard is read-only against the deployed git copy.

9. What happens next

You read this doc. Listen to it in the car if that's easier — it's the same audio-player setup as yesterday's spec, lock-screen controls and all.

If anything in here looks wrong, push back before I start step 1. Once you say go, I:

Strip Qdrant + embeddings + boomerang references from src/agent.js and src/bot.js
Delete the live boomerang files (already archived)
Create ~/cosmo-memory/ with INDEX.md skeleton, SCHEMA.md (the topic file shape from section 2), empty topics/
Wire the new read path (router) and write path (extractor) in src/agent.js
Hand back for review of the running system before moving to step 2
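
For the read path in the steps above, the part worth pinning down early is what happens to the router's reply before any file is loaded. A sketch, assuming the router is prompted to answer with a JSON array of topic slugs. That reply format and the name `parseRouterReply` are assumptions, not decided in this doc.

```javascript
// Sketch: validate the router model's reply before loading topic files.
// ASSUMPTION: the router is asked to reply with a JSON array of slugs.
function parseRouterReply(reply, knownSlugs, max = 3) {
  let picked;
  try {
    picked = JSON.parse(reply);
  } catch {
    return []; // unparseable reply: load nothing rather than guess
  }
  if (!Array.isArray(picked)) return [];
  const known = new Set(knownSlugs);
  return [...new Set(picked)]
    .filter((slug) => typeof slug === "string" && known.has(slug)) // drop invented slugs
    .slice(0, max); // the plan caps a turn at 3 topic files
}
```

An empty result leaves the caller to decide on a fallback, for example answering from INDEX.md alone.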

Then bootstrap ingest. Then dream pass. Then everything else.

The meta point. This is the first plan the new memory system will ever manage. Step 6 (plans directory) ports this exact file into ~/cosmo-memory/plans/memory-v2.md. Step 5 (tasks directory) creates the first task file pointing to the Sat 9 May 2026 shadow review. The system bootstraps itself by overseeing its own birth — that's the working test. If it can't keep track of its own build, it can't keep track of anything else.

Doc lives at plans/memory-v2.html on the memory-v2 branch. Generated Sat 25 Apr 2026.