Cosmo memory v2 — the plan
Decisions locked. Building next.
Every model call goes through @anthropic-ai/claude-agent-sdk's query() function, not the direct @anthropic-ai/sdk. Cosmo has no ANTHROPIC_API_KEY; auth flows through Claude Code session credentials. The pattern in src/agent.js is canonical. Model selection is via the model field in query() options. No exceptions.
1. What we agreed
Ten things locked in this morning. Each was a real choice with a defensible alternative.
| Decision | What we picked | What we rejected |
|---|---|---|
| Read path | router: Small Sonnet 4.6 call picks 1–3 topic files per turn. | Inject every topic file every turn, lean on prompt cache. Simpler but token-wasteful at scale. |
| Directory structure | flat: One directory, one file per topic. No nesting. | Hierarchy (health/running/phase9). Ages badly. The "where does this go" problem is unsolvable. |
| File metadata | YAML frontmatter + flat tags: five fields (title, tags, status, updated, related). | Plain markdown only (Karpathy's punt) or directory hierarchy as the metadata (Letta). |
| Dream pass shape | two-phase: Edit phase first (consolidate, prune, split, merge). Then link phase (re-traverse, fix related:). | Single-pass file-by-file. Misses transitive connections after a split or merge. |
| Dream pass model | Sonnet 4.6 across all phases. Bootstrap ingest uses Opus 4.7 (one-off). | Opus for the consolidate phase nightly. Anthropic's own guidance says Sonnet is in range, and burning Max-plan Opus quota every night is not free. |
| Confidence test | shadow mode: First 14 nights, Opus also runs the consolidate phase, output diffed for review. | Trust Sonnet without comparison. Risks silently shipping worse output for months before noticing. |
| Archive policy | no auto-archive: Files leave the active set only via superseded_by or explicit user archive. | Time-based archive after N months idle. "Silent" doesn't mean "dead": old reference files stay relevant forever. |
| Bootstrap ingest scope | wide: Read existing Claude Code memory, all session docs, skill files, every CLAUDE.md, every spec. One-off Opus pass. | Narrow ingest. Faster but loses richness. The dream pass is designed to prune anyway. |
| Bi-temporal scope | date everything: Every fact gets a date when written. (since YYYY-MM) for ongoing, (YYYY-MM to YYYY-MM) for ranges, exact day where known, ~YYYY for fuzzy. SCHEMA.md enforces. | Opt-in by tag (ages badly, same as hierarchy in disguise) or LLM-decides-per-fact (drifts across model versions). Both create an inconsistent corpus over years. |
| Dashboard + repo layout | dashboard from day one: Sibling repo ~/cosmo-memory/ (private GitHub), separate from cosmo code. Local dashboard reads the filesystem from step 1.5. Production deploy with Worker basic auth comes later. | Dashboard at step 9 (too late: bootstrap ingest would run blind). Memory inside cosmo repo (mixes personal data with code). Cloudflare Access (good but heavier than needed for now). |
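The date formats in the bi-temporal row are regular enough to check mechanically. A minimal sketch of what a SCHEMA.md check could look like, assuming the exact-day form is YYYY-MM-DD and that the fuzzy and exact-day forms are written without parentheses. The function name and patterns are illustrative, not the project's actual validator.

```javascript
// Hypothetical validator for the four date annotations in the bi-temporal row.
// Assumption: "exact day" means YYYY-MM-DD; the plan does not spell the format out.
const MONTH = "\\d{4}-(?:0[1-9]|1[0-2])";
const DAY = `${MONTH}-(?:0[1-9]|[12]\\d|3[01])`;

const DATE_PATTERNS = [
  new RegExp(`^\\(since ${MONTH}\\)$`),       // ongoing: (since YYYY-MM)
  new RegExp(`^\\(${MONTH} to ${MONTH}\\)$`), // range: (YYYY-MM to YYYY-MM)
  new RegExp(`^${DAY}$`),                     // exact day: YYYY-MM-DD
  /^~\d{4}$/,                                 // fuzzy year: ~YYYY
];

function isValidDateAnnotation(text) {
  const candidate = text.trim();
  return DATE_PATTERNS.some((pattern) => pattern.test(candidate));
}
```

This only checks shape (month 13 is rejected, 31 February is not), which is probably enough for a lint pass over topic files.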
2. The topic file shape
Every file in ~/cosmo-memory/topics/ looks the same.
---
title: Running — Phase 9 (comeback 5K)
tags: [health, running, training-plan, current]
status: active
created: 2026-04-12
updated: 2026-04-25
related: [migraine-history, running-history]
sources:
- ~/.claude/projects/.../memory/project_phase9_comeback_5k.md
- .claude/skills/health/current/phase9-context.md
---
# Phase 9 — comeback 5K plan
[content lives here as plain markdown]
Five fields the eventual web dashboard renders directly: title, tags (filter chips), status (badge), updated (sort order), related (nav links). Tags are a flat list — no hierarchy. The dashboard can group by tag if it wants, but the file itself doesn't know about that.
Why frontmatter and not just markdown: yesterday's research showed the prior art is split. Karpathy says "format is up to you." Stevens uses tags but no frontmatter (it's SQLite). Letta uses frontmatter but no tags (it uses directory hierarchy). We're picking the combination that works for our specific need: a future web dashboard wants structured fields to filter and sort by. Plain markdown gives the dashboard nothing.
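To make the "structured fields" point concrete, here is a minimal sketch of how a dashboard could read the frontmatter. It handles only what the example file uses (scalars, inline [a, b] lists, and "- item" block lists) and is not a general YAML parser; a real build would likely use a YAML library. The function name is hypothetical.

```javascript
// Minimal frontmatter reader for the topic file shape shown in this section.
// Assumption: frontmatter is delimited by lines containing only "---".
function parseTopicFile(text) {
  const lines = text.split("\n");
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (lines[0].trim() !== "---" || end === -1) {
    return { meta: {}, body: text }; // no frontmatter: treat as plain markdown
  }

  const meta = {};
  let currentKey = null;
  for (const raw of lines.slice(1, end)) {
    const line = raw.trim();
    if (line === "") continue;
    if (line.startsWith("- ") && currentKey !== null) {
      // Block list item belonging to the most recent key (e.g. sources:).
      if (!Array.isArray(meta[currentKey])) meta[currentKey] = [];
      meta[currentKey].push(line.slice(2).trim());
      continue;
    }
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    currentKey = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (value.startsWith("[") && value.endsWith("]")) {
      // Inline list, e.g. tags: [health, running]
      meta[currentKey] = value.slice(1, -1).split(",").map((s) => s.trim()).filter(Boolean);
    } else {
      meta[currentKey] = value; // stays "" until block list items arrive
    }
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}
```

With this, the tag filter is just a check on meta.tags and the sort is a string comparison on meta.updated, since ISO dates sort correctly as text.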
3. The dream pass — two phases
Runs nightly at 3am. If the Mac was off, runs on next wake (catch-up logic).
- Trigger: nightly 3am run, or catch-up on wake.
- Phase 0, Orient: read INDEX.md and the last run log, list files modified since last run.
- Phase 1, Gather: read fresh entries, flagged files, and a 5% rotating sample of older files.
- Phase 2, Edit (Sonnet 4.6): consolidate, prune, split, merge, resolve contradictions.
- Phase 3, Link (Sonnet 4.6): re-traverse touched files and their first-degree neighbours, rebuild related: arrays, fix orphaned sources.
- Phase 4, Brief (Sonnet 4.6): write tomorrow's morning brief from active topics, calendar, and inbox.
- 7am delivery: brief lands as a message.
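The 3am schedule plus catch-up on wake reduces to one question: has a 3am slot passed since the last completed run? A minimal sketch, assuming local time and a stored timestamp of the last run. The function names are hypothetical and the real scheduler may work differently.

```javascript
// Hypothetical catch-up check for the nightly dream pass.
// The 3am hour comes from the plan; everything else here is an assumption.
const RUN_HOUR = 3;

function mostRecentSlot(now) {
  const slot = new Date(now);
  slot.setHours(RUN_HOUR, 0, 0, 0);
  // Before 3am today, the most recent slot was yesterday's.
  if (slot > now) slot.setDate(slot.getDate() - 1);
  return slot;
}

function isDreamPassDue(lastRun, now) {
  // lastRun is null before the very first run.
  return lastRun === null || lastRun < mostRecentSlot(now);
}
```

Calling isDreamPassDue on wake and on a timer gives the same answer either way, so a missed night triggers exactly one catch-up run rather than one per missed slot.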
Two reasons for the split:
- Edit changes the topology. If the edit phase splits running.md into running-history.md + running-phase-9.md, every other file that referenced running now has a stale link. The link phase catches that.
- Link is cheaper than edit. Link only needs frontmatter + first paragraph of each file, not full content. Worth a separate pass to keep edit focused on the hard work.
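The stale-link case is easy to detect from frontmatter alone, which is why the link phase can stay cheap. A minimal sketch, assuming the topics are already loaded as a map from slug to frontmatter; the function name and data shape are hypothetical.

```javascript
// Hypothetical link-phase check: find related: entries pointing at slugs
// that no longer exist after the edit phase split or merged files.
function findStaleLinks(topics) {
  const known = new Set(Object.keys(topics));
  const stale = [];
  for (const [slug, meta] of Object.entries(topics)) {
    for (const target of meta.related ?? []) {
      if (!known.has(target)) stale.push({ from: slug, to: target });
    }
  }
  return stale;
}
```

After the running.md split, any file still listing running under related: shows up here, and the link phase only has to decide which of the two new files it should point to.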
What the dream pass actually processes each night:
- Files modified since last run: the fresh entries
- Files flagged for review, set to status: needs-review by the hot path when something contradictory came in
- 5% rotating sample of older files: a garbage-collection sweep; the whole library cycles every ~3 weeks
Typical night: 20 active files in the edit phase, ~50 files in the link phase (touched + first-degree neighbours via related:). Not the full library every night.
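The nightly selection can be sketched as a pure function. This assumes each file is an object with slug, status, and modifiedAt, and that the rotation is driven by a night counter walking a window through the sorted list of older files. Those details are assumptions; only the three sources and the 5% figure come from the plan.

```javascript
// Hypothetical gather step. 20 nights per cycle is the plan's 5% per night.
const CYCLE_NIGHTS = 20;

function gatherBatch(files, lastRunAt, nightNumber) {
  const fresh = files.filter((f) => f.modifiedAt > lastRunAt);
  const flagged = files.filter((f) => f.status === "needs-review");

  const picked = new Set([...fresh, ...flagged].map((f) => f.slug));
  const older = files
    .filter((f) => !picked.has(f.slug))
    .sort((a, b) => a.slug.localeCompare(b.slug));

  // Walk a fixed-size window through the sorted list, wrapping at the end,
  // so the older files are revisited roughly once per cycle.
  const windowSize = Math.ceil(older.length / CYCLE_NIGHTS);
  const sample = [];
  for (let i = 0; i < windowSize; i++) {
    sample.push(older[(nightNumber * windowSize + i) % older.length]);
  }
  // A Set drops files that are both fresh and flagged.
  return [...new Set([...fresh, ...flagged, ...sample])];
}
```

A deterministic rotation is used here instead of random sampling so that full coverage within about three weeks is guaranteed rather than probable.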
4. Model assignments
| Job | Model | Why |
|---|---|---|
| Hot path (you talk to Cosmo) | User's choice via /opus or /sonnet | Already working, no change. |
| Bootstrap ingest (one-off) | Opus 4.7 | Wide scope, novel synthesis from years of scattered material, runs once. Worth it. |
| Dream pass — consolidate, prune, brief | Sonnet 4.6 | Anthropic's own guidance: "Sonnet as default, Opus for hard problems." Merging files isn't a hard problem. SWE-bench gap is 1.2 points. |
| Dream pass — link phase | Sonnet 4.6 | Could be Haiku 4.5 (classification task), but on Max plan there's no per-token saving. Simpler to keep one model across the whole pass. |
| Trigger checker (step 8) | Sonnet 4.6 | Same reasoning. The boomerang lesson applies: don't hardcode Opus on a job that runs forever. |
| Interrupt classifier (step 9) | Sonnet 4.6 | Decides notify / question / review based on busy signals. Light reasoning over structured input. |
Each phase is a single value in the config file. If we discover Sonnet is visibly worse than Opus on the consolidate phase (see next section), we swap one constant. No code change.
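A minimal sketch of what "one value per phase" could look like. The key names and the placeholder strings are assumptions; the real model identifiers go wherever the config file ends up, and the returned value is what gets passed as the model field in query() options.

```javascript
// Hypothetical config: one model value per job, mirroring the table above.
// "opus" and "sonnet" are placeholders, not confirmed model identifiers.
const MODELS = Object.freeze({
  bootstrapIngest: "opus",
  dreamEdit: "sonnet",
  dreamLink: "sonnet",
  dreamBrief: "sonnet",
  triggerChecker: "sonnet",
  interruptClassifier: "sonnet",
});

function modelFor(job) {
  if (!Object.prototype.hasOwnProperty.call(MODELS, job)) {
    throw new Error(`No model configured for job "${job}"`);
  }
  return MODELS[job];
}
```

Swapping the consolidate phase to Opus is then a one-line change to dreamEdit, and a typo in a job name fails loudly instead of silently falling back to a default model.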
5. Shadow mode + the bake-in test
The honest problem: if we only ever run Sonnet, we never see what Opus would have produced. We could ship a slightly worse system for months without noticing.
So for the first 14 nights of dream-pass operation, both models run the consolidate phase:
- Phase 1 (Gather) feeds both edit runs.
- Phase 2, Edit (Sonnet 4.6): writes to ~/cosmo-memory/.
- Phase 2, Edit shadow (Opus 4.7): writes to ~/cosmo-memory-shadow/.
- Phase 3, Link: continues with the Sonnet output as live.
- Shadow branch: diff written to the dashboard, read in the morning brief.
- Phase 4, Brief, then 7am delivery.
After 14 nights, three possible outcomes:
- Opus is meaningfully better → swap consolidate to Opus permanently
- Sonnet is roughly equivalent → kill the shadow, save the quota, evidence justifies the choice
- Inconclusive → extend shadow another N nights
The mechanism is reusable. Same shadow pattern works later for "is Haiku enough for the link phase?" or "is Sonnet 4.7 better than 4.6 once it ships?"
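The comparison step can start as a file-level report before anyone reads line diffs. A minimal sketch, assuming both output directories have been read into maps from file name to content; the function name and report shape are hypothetical.

```javascript
// Hypothetical shadow comparison: live is the Sonnet output, shadow the Opus output.
// Each argument maps file name -> file content.
function compareRuns(live, shadow) {
  const report = { onlyLive: [], onlyShadow: [], changed: [], same: [] };
  const names = new Set([...Object.keys(live), ...Object.keys(shadow)]);
  for (const name of [...names].sort()) {
    if (!(name in shadow)) report.onlyLive.push(name);
    else if (!(name in live)) report.onlyShadow.push(name);
    else if (live[name] !== shadow[name]) report.changed.push(name);
    else report.same.push(name);
  }
  return report;
}
```

Files in onlyLive or onlyShadow are the interesting ones: they mean the two models made different split or merge decisions, not just different wording.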
~/cosmo-memory/tasks/active/dream-pass-shadow-review.md
---
title: Dream pass — Sonnet vs Opus shadow review
status: active
created: 2026-04-25
trigger:
type: date
fire_at: 2026-05-09
mode: question
related: [plans/memory-v2, dream-pass]
---
After 14 nights of shadow-mode comparison:
- Open the diff viewer
- Read 3-5 representative diffs
- Decide: keep Sonnet, switch to Opus, or extend shadow
- Update plans/memory-v2.html with the decision
- Either flip comparison_model: off, or rotate to monthly bake-off mode
The trigger fires Sat 9 May 2026, lands as a mode: question interrupt in the morning brief. If the new system surfaces it correctly, we know the trigger mechanism works. If it drops the ball, we catch it via a backup external /schedule agent set for the same date.
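The date arithmetic and the due check are small enough to sketch. This assumes dates are ISO strings (YYYY-MM-DD) and uses the type and fire_at fields from the task file above; the function names are hypothetical.

```javascript
// Hypothetical helpers for date triggers.
// Add whole days to an ISO date, working in UTC to avoid DST surprises.
function addDays(isoDate, days) {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

function isTriggerDue(trigger, today) {
  if (trigger.type !== "date") return false; // other trigger types are out of scope here
  // ISO dates compare correctly as plain strings.
  return today >= trigger.fire_at;
}
```

addDays("2026-04-25", 14) gives "2026-05-09", matching the fire_at in the task file. Using >= rather than === means a trigger still fires if the checker missed the exact day.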
6. Archive policy
Two ways a file can leave the active set. No third option.
status: superseded
Set automatically by the consolidate phase when it splits or merges files. The old file gets superseded_by: <newer-file> in its frontmatter and stays on disk as a pointer. Searchable but doesn't load by default.
status: archived
Set explicitly by the user (/archive <topic>) or by the agent after the user has explicitly said "we're done with X." Never automatic, never time-based.
What we explicitly rejected: time-based auto-archive after N months idle. A reference file like australian-driving-rules.md might never get updated and still be relevant forever. Silent files aren't dead files.
Only active status loads into context by default.
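Both rules are easy to express against the frontmatter. A minimal sketch, assuming topics are loaded as a map from slug to frontmatter and that superseded_by holds a single slug; the function names are hypothetical.

```javascript
// Hypothetical loader rules for the statuses described in this section.
function loadableSlugs(topics) {
  return Object.keys(topics).filter((slug) => topics[slug].status === "active");
}

// Follow superseded_by pointers until reaching a file that is not superseded.
function resolveCurrent(slug, topics) {
  const visited = new Set();
  let current = slug;
  while (topics[current] && topics[current].status === "superseded") {
    if (visited.has(current)) throw new Error(`superseded_by cycle at "${current}"`);
    visited.add(current);
    current = topics[current].superseded_by;
  }
  return current;
}
```

The cycle guard matters because the consolidate phase writes these pointers automatically, and a bad merge could in principle create a loop.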
7. Build order
Same nine steps as yesterday. Today added the prep work (commit, branch, archive boomerang) plus more detail on what each step ships.
| # | Step | Ships | Status |
|---|---|---|---|
| 0a | Commit current main state | Clean working tree, 9 fresh commits | done |
| 0b | Cut memory-v2 branch | Branch created off clean main | done |
| 0c | Archive boomerang v1 | archive/boomerang-v1/ snapshot + README | done |
| 0d | Plan doc | plans/memory-v2.html + audio + Pages deploy | this doc |
| 1 | Memory v1 | Delete Qdrant code + embeddings + boomerang refs from agent.js/bot.js. Create ~/cosmo-memory/ with INDEX.md, SCHEMA.md, topics/, plans/, tasks/, inbox/. Init git, push to private choujar/cosmo-memory. Wire router + extractor in agent.js and bot.js. | next |
| 1.5 | Dashboard v0 (local) | Minimal local viewer: list topics, render markdown + frontmatter, filter by tag, sort by updated. Localhost only, no auth. Ships before bootstrap ingest so step 2 isn't blind. | pending |
| 2 | Bootstrap ingest | One-off Opus pass over Claude Code memory, all session docs, skill files, every CLAUDE.md, all specs. Synthesises ~40-60 topic files. Watch via dashboard. | pending |
| 3 | Dream pass v1 | Nightly 3am PM2 process. Five phases (Orient → Gather → Edit → Link → Brief). Catch-up logic. Shadow mode enabled. | pending |
| 4 | Claude Code integration | SessionStart hook so Claude Code reads from ~/cosmo-memory/ too — interface parity. | pending |
| 5 | Tasks directory | tasks/active/, tasks/blocked/, tasks/done/. First task created: the shadow-review trigger above. | pending |
| 6 | Plans directory | plans/. First plan that lives there: this doc itself, ported. | pending |
| 7 | Inbox + morning brief | inbox/ for proactive surfaces. 7am brief delivery via Telegram. | pending |
| 8 | Triggers | trigger: field on task files. Sonnet-powered checker process. Salvages chain linking + ack windows from boomerang archive. | pending |
| 9 | Judge + polish | Interrupt classifier (notify / question / review). Dashboard production deploy with Worker basic auth. Slash commands. Edit features in dashboard. | pending |
Steps 1-3 are the core. Roughly one solid day of work, possibly less if we don't get distracted.
8. What's still open
One thing parked. Doesn't block the build.
- Multi-machine coordination. Old MacBook Pro might come up as an always-on host. Parked until that hardware is live. ~/cosmo-memory/ being a git repo synced to GitHub makes pull-on-wake reasonable, but real coordination (avoiding two dream passes at 3am, lock-file semantics, who-pushes-wins) is a separate design.

Previously open, now decided:
- Bi-temporal scope → date everything. Six characters per fact, total consistency, no migration debt over years.
- Web dashboard timing → from day one (step 1.5), not step 9. Local-only at first, no auth. Production deploy with Worker basic auth comes in step 9 once the data shape is stable.
- Repo layout → ~/cosmo-memory/ is its own private repo, sibling to ~/cosmo/. Personal data stays out of the code repo. Production dashboard is read-only against the deployed git copy.
9. What happens next
You read this doc. Listen to it in the car if that's easier — it's the same audio-player setup as yesterday's spec, lock-screen controls and all.
If anything in here looks wrong, push back before I start step 1. Once you say go, I:
- Strip Qdrant + embeddings + boomerang references from src/agent.js and src/bot.js
- Delete the live boomerang files (already archived)
- Create ~/cosmo-memory/ with INDEX.md skeleton, SCHEMA.md (the topic file shape from section 2), empty topics/
- Wire the new read path (router) and write path (extractor) in src/agent.js
- Hand back for review of the running system before moving to step 2
Then bootstrap ingest. Then dream pass. Then everything else.
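For the read path in step 1, the part worth pinning down early is what happens to the router's reply before any file is loaded. A minimal sketch, assuming the router is asked to answer with a JSON array of topic slugs. That reply format is not in the plan, and the function name is hypothetical.

```javascript
// Hypothetical post-processing for the router call.
// The cap of 3 comes from the plan's "1–3 topic files per turn".
const MAX_TOPICS = 3;

function pickTopics(routerReply, knownSlugs) {
  let parsed;
  try {
    parsed = JSON.parse(routerReply);
  } catch {
    return []; // unparseable reply: load nothing rather than guess
  }
  if (!Array.isArray(parsed)) return [];
  const known = new Set(knownSlugs);
  const valid = [...new Set(parsed)].filter((s) => typeof s === "string" && known.has(s));
  return valid.slice(0, MAX_TOPICS);
}
```

Filtering against the known slugs means a hallucinated file name costs nothing. Note this can return an empty list, so the caller has to decide whether a turn with no topic files is acceptable or needs a fallback.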
Step 6 (plans directory) ports this doc to ~/cosmo-memory/plans/memory-v2.md. Step 5 (tasks directory) creates the first task file pointing to the Sat 9 May 2026 shadow review. The system bootstraps itself by overseeing its own birth, and that's the working test. If it can't keep track of its own build, it can't keep track of anything else.