Title: Running implementation of cognitive memory architecture on distributed local hardware
Hey Adam — I've been building mycoSwarm, a distributed local AI framework, and your agentic-memory work has been a direct influence on the memory architecture.
The system runs on five nodes (an RTX 3090, three M710Q mini PCs, and a Raspberry Pi 2): $1,035 total in used hardware, zero cloud dependencies. Here's how your concepts map to what's running in production:
Memory architecture:
- Session-as-RAG: all past conversations indexed into ChromaDB with automatic multi-topic splitting
- Distinct citation labels: [S1] for session memory, [D1] for document RAG — so the model knows the source type
- Lifecycle-tagged facts: preference, project, ephemeral with different retention rules
- Graceful miss: model says "I don't recall" instead of hallucinating when retrieval fails
- Decay scoring planned (Phase 21) for "forgetting as technology"
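The lifecycle tags above could be sketched roughly like this. This is a hypothetical illustration, not mycoSwarm's actual schema: the field names, TTL values, and half-life are assumptions, and the exponential decay is one plausible way to implement the planned "forgetting as technology" scoring.

```python
import time
from dataclasses import dataclass, field

# Hypothetical retention rules per lifecycle tag (seconds); None = keep forever.
RETENTION = {
    "preference": None,        # user preferences persist indefinitely
    "project": 90 * 86400,     # project facts kept ~90 days
    "ephemeral": 86400,        # ephemeral facts kept ~1 day
}

@dataclass
class Fact:
    text: str
    lifecycle: str             # "preference" | "project" | "ephemeral"
    created_at: float = field(default_factory=time.time)

    def expired(self, now=None):
        ttl = RETENTION[self.lifecycle]
        if ttl is None:
            return False
        return (now or time.time()) - self.created_at > ttl

    def decay_score(self, now=None, half_life=30 * 86400):
        # One way to do decay scoring: exponential falloff, so older
        # facts rank lower at retrieval time instead of being hard-deleted.
        age = (now or time.time()) - self.created_at
        return 0.5 ** (age / half_life)

facts = [
    Fact("prefers concise answers", "preference"),
    Fact("debugging node 3 today", "ephemeral",
         created_at=time.time() - 2 * 86400),  # 2 days old -> expired
]
alive = [f for f in facts if not f.expired()]
```

Filtering expired facts at retrieval time (rather than deleting them) leaves room to loosen retention rules later without data loss.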
Retrieval:
- Hybrid BM25 + vector search with Reciprocal Rank Fusion
- LLM re-ranking (small model scores chunks 0-10 before injection)
- RAG eval framework with NDCG@5, MRR, hit@1/5 tracking
- Baseline: 72% hit@1, 100% hit@5, 0.86 NDCG@5
- Fine-tuned embeddings (your QuicKB approach) planned for Phase 23
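For readers unfamiliar with it, the RRF step above can be sketched in a few lines. This is a generic illustration of the technique, not mycoSwarm's code; `k=60` is the constant from the original RRF paper, and the doc IDs are made up.

```python
# Reciprocal Rank Fusion: each retriever contributes 1/(k + rank) per
# document; summing across retrievers rewards documents that rank well
# in both BM25 and vector search.
def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # lexical ranking
vector_hits = ["d1", "d9", "d3"] # semantic ranking
fused = rrf_fuse([bm25_hits, vector_hits])
# d1 wins: ranked #2 and #1, so it beats d3's #1 and #3.
```

The fused list would then go to the small-model re-ranker for 0-10 scoring before injection.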
Interesting finding today: I benchmarked RWKV-7 2.9B against gemma3:4b on an 8GB M710Q over 10 conversational turns. The transformer degraded from 5.7 to 5.2 tok/s and then crashed at turn 6 from KV-cache exhaustion; RWKV held a constant 4.7 tok/s through all 10 turns. For memory-heavy agents on constrained hardware, the no-KV-cache architecture seems like a natural fit.
Repo: https://github.com/msb-msb/mycoSwarm
Install: pip install mycoswarm
Blog: insiderllm.com
Would be curious to hear your thoughts on the memory lifecycle approach — particularly whether you've seen other implementations handling the "when to forget" problem in practice.