From 304c771fd680b8ddf3e34460bcfee5c503b6dda2 Mon Sep 17 00:00:00 2001
From: Thomas Korrison <thomas_korrison@hotmail.com>
Date: Mon, 25 May 2026 16:50:06 +0100
Subject: [PATCH] docs: expand design documentation with new sections on
 storage layer and concurrency

- Added a section on the storage layer, detailing the `Store` trait family, concrete stores, and the rationale behind `StoreMetrics`.
- Updated the concurrency documentation to clarify the `ConcurrentStoreRead` and `ConcurrentStore` trait family, including their sequential and concurrent split.
- Enhanced the design overview to provide a clearer architectural perspective, linking to relevant design documents for better navigation.

These additions improve the comprehensiveness of the documentation, aiding developers in understanding the cache library's structure and concurrency strategies.
---
 docs/design/concurrency.md       |    7 +-
 docs/design/design.md            | 1173 +++++++++++++++++++++---------
 docs/design/metrics.md           |    5 +-
 docs/design/non-goals.md         |   11 +
 docs/design/storage.md           |  447 ++++++++++++
 docs/design/style-guide.md       |  167 ++++-
 docs/design/trait-hierarchy.md   |    7 +-
 docs/design/ttl.md               |  547 ++++++++------
 docs/design/weighted-eviction.md |    3 +
 docs/index.md                    |    1 +
 src/metrics/cell.rs              |   35 +-
 11 files changed, 1851 insertions(+), 552 deletions(-)
 create mode 100644 docs/design/storage.md
diff --git a/docs/design/concurrency.md b/docs/design/concurrency.md
index e418976..84a29ba 100644
--- a/docs/design/concurrency.md
+++ b/docs/design/concurrency.md
@@ -359,8 +359,11 @@ Tracked roughly in priority order:
 - [TTL design](ttl.md) — applied case for `ConcurrentExpiring<C>`
 - [Cache trait hierarchy](trait-hierarchy.md) — read/mutate split and
   object-safety rationale
-- [Stores](../stores/README.md) — `ConcurrentStoreRead` /
-  `ConcurrentStore` trait family
+- [Storage layer](storage.md) — `ConcurrentStoreRead` /
+  `ConcurrentStore` trait family rationale and the
+  sequential/concurrent split
+- [Stores](../stores/README.md) — runtime-behaviour reference for
+  each concrete store
 - [`src/store/traits.rs`](../../src/store/traits.rs) — concurrent
   store traits
 - [`src/traits.rs`](../../src/traits.rs) — `ConcurrentCache` marker
diff --git a/docs/design/design.md b/docs/design/design.md
index dcc01e7..6b0176e 100644
--- a/docs/design/design.md
+++ b/docs/design/design.md
@@ -1,327 +1,841 @@
 # Design Overview
 
-This document collects the design principles that shape `cachekit`. Each
-section pairs a principle with the concrete artifact in the source tree
-that realizes it, so the prose stays grounded in the code rather than
-floating as advice.
+> Status: top-level design overview for cachekit. Indexes the
+> architectural layers and the substantive design decisions, with the
+> original numbered principles preserved as Appendix A. Every
+> subsystem (concurrency, storage, metrics, TTL, …) has its own
+> companion design doc; this doc names what they collectively decide.
+
+For a worked example that applies every concern below to a single
+feature, see the [TTL design doc](ttl.md). For interface conventions,
+the [Rust API Guidelines checklist](https://rust-lang.github.io/api-guidelines/checklist.html)
+is the companion reference; both module-level rustdoc and the design
+docs themselves follow the [doc style guide](style-guide.md).
+
+## Architecture at a glance
+
+```text
++------------------------------------------------------------------+
+|  Integration: CacheBuilder, DynCache, DynExpiringCache           |
++------------------------------------------------------------------+
+|  Capability traits (opt-in):                                     |
+|   RecencyTracking, FrequencyTracking, HistoryTracking,           |
+|   ExpiringCache, EvictingCache, VictimInspectable                |
++------------------------------------------------------------------+
+|  Policy kernel: Cache<K, V>                                      |
+|   18 implemented policies (LRU, LFU, S3-FIFO, ARC, CAR, ...)     |
++------------------------------------------------------------------+
+|  Storage: StoreCore / StoreMut (+ concurrent peers)              |
+|   HashMapStore, SlabStore, HandleStore, WeightStore (sibling)    |
++------------------------------------------------------------------+
+        ^                 ^                 ^                ^
+   +----+----+      +-----+-----+     +-----+-----+    +----+-----+
+   | Metrics |      |    TTL    |     |Concurrency|    | Hashing  |
+   |(2-layer)|      | Expiring  |     |  RwLock + |    |    +     |
+   |         |      | decorator |     |  sharding |    | Sharding |
+   +---------+      +-----------+     +-----------+    +----------+
+             (cross-cutting concerns, mostly feature-gated)
+```
+
+Read bottom-up: storage owns entry layout and key/value ownership;
+policies layer eviction order on top of that; capability traits expose optional
+signals when the policy has them; the integration layer turns
+runtime configuration into one concrete cache type. Cross-cutting
+concerns (metrics, TTL, concurrency, hashing, sharding) are
+orthogonal to the layer stack — each has its own design doc and
+attaches to the layer where its choices live.
+
+## Layered map
+
+### Storage layer
+
+Stores own keys and values. They expose `StoreCore`/`StoreMut` for
+sequential access and `ConcurrentStoreRead`/`ConcurrentStore` for
+concurrent access, signal capacity refusal with `StoreFull`, and
+ship one always-on counter struct (`StoreMetrics`).
+
+Four concrete stores ship: `HashMapStore` (default), `SlabStore`
+(arena with stable `EntryId` handles), `HandleStore`
+(interner-keyed), and `WeightStore` (byte-aware sibling that
+deliberately diverges from `StoreMut`'s contract — see D14). See
+[Storage layer](storage.md).
+
+### Policy layer
+
+Policies decide eviction order. Every policy implements the
+object-safe `Cache<K, V>` kernel; 18 ship today, organised in the
+catalog at [`docs/policies/`](../policies/README.md). The kernel
+deliberately separates `peek` (`&self`, side-effect-free) from `get`
+(`&mut self`, policy-updating) so concurrent wrappers can take a
+read lock on `peek` paths (see D3). See
+[Cache trait hierarchy](trait-hierarchy.md).
+
+### Capability layer
+
+Capabilities are opt-in extension traits that expose signals which
+*some but not all* policies own: `RecencyTracking`,
+`FrequencyTracking`, `HistoryTracking`, `ExpiringCache`,
+`EvictingCache`, `VictimInspectable`. Each extends `Cache<K, V>`, so
+generic code requires the capability bound instead of gating methods
+on the kernel behind feature flags (D2). See
+[Cache trait hierarchy §"Layer 2 — Capability traits"](trait-hierarchy.md#layer-2--capability-traits).
+
+### Integration layer
+
+`CacheBuilder` turns a `CachePolicy` enum + capacity + optional
+defaults (TTL, hasher) into a `DynCache<K, V>` (or
+`DynExpiringCache<K, V>` when TTL is configured). `DynCache`
+dispatches through an internal enum match rather than
+`Box<dyn Cache>` — devirtualising the hot path while keeping the
+user-facing type uniform (D9). See
+[Builder and runtime dispatch](builder-and-dyn-dispatch.md).
+
+### Cross-cutting concerns
 
-For a worked example that applies every principle below to one feature,
-see the [TTL design doc](ttl.md). For interface conventions, the
-[Rust API Guidelines checklist](https://rust-lang.github.io/api-guidelines/checklist.html)
-is the companion reference; module-level documentation follows the
-[doc style guide](style-guide.md).
-
-## 1. Workload First, Policy Second
-
-Cache policy only matters relative to workload.
-
-Identify access patterns:
-- Hot-set traffic: skewed keys, low churn on the hot set, high churn at the tail.
-- Scan-heavy traffic: large working sets, weak temporal locality.
-- Mixed traffic: bursts of hot data over large cold sets.
-
-Measure:
-- Reuse distance / stack distance.
-- Read/write ratio.
-- Temporal vs spatial locality.
-
-Choose policies accordingly:
-- `LRU` / `Clock`: good for temporal locality, vulnerable to scans.
-- `LRU-K` / `2Q` / `SLRU`: better at filtering one-off accesses.
-- `ARC` / `CAR`: adaptive recency/frequency balance without manual tuning.
-- `S3-FIFO` / `Heap-LFU`: strong general-purpose defaults under scans.
-
-All of the above ship today; see [`docs/policies/`](../policies/README.md)
-for the implemented catalog and [`docs/policies/roadmap/`](../policies/roadmap/README.md)
-for planned policies (LIRS, TinyLFU, SIEVE, GDS/GDSF, etc.).
-
-When picking a policy or tuning a cache, design for the workload you
-expect — not the average of all workloads.
-
-## 2. Memory Layout Matters More Than Algorithms
-
-In a cache, memory layout often dominates policy.
-
-Prefer:
-- Contiguous storage (`Vec`, slabs, arenas).
-- Index-based indirection over pointer chasing.
-
-Avoid:
-- Excessive `Box`, `Arc`, linked lists with heap-allocated nodes.
-- `HashMap` lookups in hot paths if avoidable.
-
-Techniques:
-- Store metadata (recency, freq, flags) in tightly packed structs.
-- Separate hot metadata from cold payloads.
-- Use slab allocators for fixed-size entries.
-
-cachekit realizes this through reusable building blocks under
+- [Concurrency](concurrency.md) — `Concurrent*` wrappers, `RwLock`
+  discipline, sharded primitives, `ConcurrentCache` marker.
+- [Metrics](metrics.md) — recorder / snapshot / exporter split,
+  Prometheus exporter, `MetricsCell` soundness contract.
+- [TTL](ttl.md) — `Expiring<C>` decorator, `ExpirationIndex`,
+  `Clock` abstraction, decorator-vs-embedded trade.
+- [Hashing](hashing.md) — `RandomState` at public boundaries,
+  `FxHash` internally, `ShardSelector` for routing.
+- [Sharding](sharding.md) — sharded primitives at the
+  data-structure / store layer; sharded cache policies are on the roadmap.
+- [Error model](error-model.md) — three-tier panic / `Result` /
+  invariant discipline, four error types.
+- [Benchmarking](benchmarking.md) — benchmark layers, monomorphic
+  policy registry, JSON artifact schema, reproducibility rules.
+- [Serialization](serialization.md) — `serde` scope (snapshots
+  only), reasons cache state is not serialisable today.
+- [Non-goals](non-goals.md) — explicit boundaries on what cachekit
+  does *not* try to be.
+
+## Design decisions
+
+Numbered ADR-style entries. Each entry documents one substantive
+choice that shaped the public or internal surface; the canonical
+companion doc carries the full detail.
+
+### D1. Policy / storage separation
+
+**Context.** Combining "what to evict" and "how entries are laid
+out in memory" into one type couples eviction strategy to storage
+and prevents policy experimentation.
+
+**Decision.** Two layers with explicit interfaces: stores own keys
+and values and expose `StoreCore`/`StoreMut`; policies own eviction
+metadata and consume the store. Capacity refusal is signalled by
+`StoreFull`; the policy decides which entry to evict, then retries.
+
+**Alternatives considered.**
+- One trait that combines lookup and eviction. Rejected: policy
+  experimentation requires substituting one half without the other.
+- Store-driven eviction with policy callbacks. Rejected: forces a
+  callback indirection on every insert.
+
+**Consequences.**
+- A policy runs over `HashMapStore`, `SlabStore`, or a custom store
+  without policy changes.
+- The store layer is its own design space (see
+  [storage.md](storage.md)); `WeightStore` is the precedent for
+  divergence (D14).
+- Tests can drive the store independently of any policy.
+
+**See also.** Appendix §7; [storage.md](storage.md),
+[trait-hierarchy.md](trait-hierarchy.md).
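+
+A minimal sketch of the split follows. It assumes a toy `BoundedStore` and `FifoPolicy` (both hypothetical, not cachekit types) and a unit `StoreFull`. The store only refuses new keys. The policy picks the victim and retries the insert once.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+
+/// Hypothetical stand-in for `StoreFull`; cachekit's real type may differ.
+#[derive(Debug, PartialEq)]
+pub struct StoreFull;
+
+/// Store half: owns keys and values, refuses new keys past capacity.
+pub struct BoundedStore {
+    map: HashMap<u64, u64>,
+    capacity: usize,
+}
+
+impl BoundedStore {
+    pub fn new(capacity: usize) -> Self {
+        Self { map: HashMap::with_capacity(capacity), capacity }
+    }
+
+    /// Updates to an existing key always succeed; new keys fail when full.
+    pub fn try_insert(&mut self, key: u64, value: u64) -> Result<Option<u64>, StoreFull> {
+        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
+            return Err(StoreFull);
+        }
+        Ok(self.map.insert(key, value))
+    }
+
+    pub fn remove(&mut self, key: &u64) -> Option<u64> {
+        self.map.remove(key)
+    }
+
+    pub fn get(&self, key: &u64) -> Option<&u64> {
+        self.map.get(key)
+    }
+
+    pub fn len(&self) -> usize {
+        self.map.len()
+    }
+}
+
+/// Policy half: owns eviction order only and consumes the store.
+pub struct FifoPolicy {
+    store: BoundedStore,
+    order: VecDeque<u64>,
+}
+
+impl FifoPolicy {
+    pub fn new(capacity: usize) -> Self {
+        assert!(capacity > 0, "capacity must be non-zero");
+        Self { store: BoundedStore::new(capacity), order: VecDeque::with_capacity(capacity) }
+    }
+
+    /// Returns the evicted key if the insert needed one.
+    pub fn insert(&mut self, key: u64, value: u64) -> Option<u64> {
+        match self.store.try_insert(key, value) {
+            Ok(previous) => {
+                if previous.is_none() {
+                    self.order.push_back(key);
+                }
+                None
+            }
+            Err(StoreFull) => {
+                // The store refused; the policy chooses the victim, then retries once.
+                let victim = self.order.pop_front().expect("a full store has a tracked key");
+                self.store.remove(&victim);
+                self.store.try_insert(key, value).expect("eviction freed a slot");
+                self.order.push_back(key);
+                Some(victim)
+            }
+        }
+    }
+
+    pub fn peek(&self, key: &u64) -> Option<&u64> {
+        self.store.get(key)
+    }
+
+    pub fn len(&self) -> usize {
+        self.store.len()
+    }
+}
+```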
+
+### D2. Object-safe `Cache<K, V>` kernel + opt-in capability traits
+
+**Context.** Some policies expose signals (recency rank, frequency
+count, K-distance history) that others don't. Putting all of them
+on the kernel forces meaningless defaults and breaks object safety.
+
+**Decision.** `Cache<K, V>` carries only what every policy must do
+(`contains` / `len` / `capacity` / `peek` / `get` / `insert` /
+`remove` / `clear`). Optional signals live in extension traits
+(`RecencyTracking`, `FrequencyTracking`, `HistoryTracking`,
+`ExpiringCache`, `EvictingCache`, `VictimInspectable`) which
+policies implement only when the signal exists in their metadata.
+
+**Alternatives considered.**
+- Feature-gated methods on the kernel. Rejected: the gating signal
+  is policy identity, not build configuration.
+- `Option`-returning defaults. Rejected: silently returning `None`
+  on policies that don't support a method is a footgun.
+
+**Consequences.**
+- Generic code like `fn warm<C: FrequencyTracking>(...)`
+  type-checks only on policies that genuinely track frequency.
+- Trait surface grows by one trait per signal, but only the
+  policies that own the signal pay anything.
+- The kernel stays object-safe, so `Box<dyn Cache<K, V>>` works
+  for test harnesses and registries (even though shipped runtime
+  dispatch uses an enum — see D9).
+
+**See also.** Appendix §7, §13;
+[trait-hierarchy.md](trait-hierarchy.md).
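+
+The capability pattern is sketched below with a trimmed-down `Cache` trait and a hypothetical `CountingCache`. The real `Cache<K, V>` and `FrequencyTracking` carry more methods and may use different signatures. Because `hot_keys` bounds on the capability, it only compiles for types that actually track frequency.
+
+```rust
+use std::collections::HashMap;
+use std::hash::Hash;
+
+/// Trimmed-down kernel: only what every policy can do (hypothetical subset).
+pub trait Cache<K, V> {
+    fn peek(&self, key: &K) -> Option<&V>;
+    fn insert(&mut self, key: K, value: V) -> Option<V>;
+}
+
+/// Capability: implemented only by policies whose metadata holds a frequency.
+pub trait FrequencyTracking<K, V>: Cache<K, V> {
+    fn frequency(&self, key: &K) -> Option<u64>;
+}
+
+/// Toy policy that genuinely counts accesses, so it can offer the capability.
+pub struct CountingCache<K, V> {
+    entries: HashMap<K, (V, u64)>,
+}
+
+impl<K: Eq + Hash, V> CountingCache<K, V> {
+    pub fn new() -> Self {
+        Self { entries: HashMap::new() }
+    }
+
+    /// Records an access, as a policy-updating `get` would.
+    pub fn touch(&mut self, key: &K) {
+        if let Some((_, freq)) = self.entries.get_mut(key) {
+            *freq += 1;
+        }
+    }
+}
+
+impl<K: Eq + Hash, V> Cache<K, V> for CountingCache<K, V> {
+    fn peek(&self, key: &K) -> Option<&V> {
+        self.entries.get(key).map(|(value, _)| value)
+    }
+
+    fn insert(&mut self, key: K, value: V) -> Option<V> {
+        self.entries.insert(key, (value, 1)).map(|(old, _)| old)
+    }
+}
+
+impl<K: Eq + Hash, V> FrequencyTracking<K, V> for CountingCache<K, V> {
+    fn frequency(&self, key: &K) -> Option<u64> {
+        self.entries.get(key).map(|(_, freq)| *freq)
+    }
+}
+
+/// Bounds on the capability: a policy without frequency metadata is a
+/// compile error here, not a silent `None` at runtime.
+pub fn hot_keys<K: Clone, V, C: FrequencyTracking<K, V>>(cache: &C, keys: &[K], min: u64) -> Vec<K> {
+    keys.iter()
+        .filter(|key| cache.frequency(*key).map_or(false, |freq| freq >= min))
+        .cloned()
+        .collect()
+}
+```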
+
+### D3. `peek` (side-effect-free) vs `get` (policy-updating) split
+
+**Context.** A read that updates LRU recency or LFU frequency
+cannot be served from a shared read lock — it mutates policy state.
+A single read method forces every lookup through a write lock.
+
+**Decision.** Two distinct read methods on `Cache<K, V>`:
+- `peek(&self, &K) -> Option<&V>` — honest read, no policy mutation.
+- `get(&mut self, &K) -> Option<&V>` — recorded read; LRU moves the
+  entry to MRU, Clock sets the reference bit, etc.
+
+**Alternatives considered.**
+- One `get(&self, …)` with interior mutability. Rejected: forces
+  the concurrent wrapper to serialise every read through one lock
+  shape regardless of intent.
+
+**Consequences.**
+- Concurrent wrappers take `RwLock::read` on `peek` / `contains` /
+  `len` / `capacity`; `RwLock::write` on `get` / `insert` /
+  `remove` / `clear`. Read-heavy workloads scale across cores.
+- Callers must choose deliberately: calling `get` on a read-heavy
+  workload silently serialises readers behind the write lock. The
+  trade-off is documented at every call site.
+
+**See also.** Appendix §3;
+[trait-hierarchy.md §"peek vs get"](trait-hierarchy.md#peek-vs-get--the-readmutate-split),
+[concurrency.md](concurrency.md).
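+
+The split is sketched below using `std::sync::RwLock` in place of `parking_lot::RwLock`, so the example has no dependencies. `RecencyMap` and `ConcurrentRecency` are hypothetical. `peek` runs under the shared lock and leaves the access tick untouched. `get` bumps the tick, so it needs the exclusive lock. Values are cloned out because a borrow cannot outlive the guard (D4).
+
+```rust
+use std::collections::HashMap;
+use std::sync::RwLock;
+
+/// Toy recency metadata: each entry stores its value and last-access tick.
+pub struct RecencyMap {
+    entries: HashMap<u64, (String, u64)>,
+    tick: u64,
+}
+
+impl RecencyMap {
+    /// Side-effect-free read: no metadata changes.
+    pub fn peek(&self, key: &u64) -> Option<&String> {
+        self.entries.get(key).map(|(value, _)| value)
+    }
+
+    /// Recorded read: advances the logical clock and stamps the entry.
+    pub fn get(&mut self, key: &u64) -> Option<&String> {
+        self.tick += 1;
+        let tick = self.tick;
+        self.entries.get_mut(key).map(|(value, last)| {
+            *last = tick;
+            &*value
+        })
+    }
+
+    pub fn last_access(&self, key: &u64) -> Option<u64> {
+        self.entries.get(key).map(|(_, last)| *last)
+    }
+}
+
+/// Hypothetical wrapper: shared lock for `peek`, exclusive lock for `get`.
+pub struct ConcurrentRecency {
+    inner: RwLock<RecencyMap>,
+}
+
+impl ConcurrentRecency {
+    pub fn new() -> Self {
+        Self { inner: RwLock::new(RecencyMap { entries: HashMap::new(), tick: 0 }) }
+    }
+
+    pub fn insert(&self, key: u64, value: String) {
+        self.inner.write().unwrap().entries.insert(key, (value, 0));
+    }
+
+    /// Many threads may hold the read lock at once.
+    pub fn peek(&self, key: &u64) -> Option<String> {
+        self.inner.read().unwrap().peek(key).cloned()
+    }
+
+    /// Mutates recency, so it must take the write lock.
+    pub fn get(&self, key: &u64) -> Option<String> {
+        self.inner.write().unwrap().get(key).cloned()
+    }
+
+    pub fn last_access(&self, key: &u64) -> Option<u64> {
+        self.inner.read().unwrap().last_access(key)
+    }
+}
+```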
+
+### D4. Sequential `&V` + concurrent owned values (two trait families)
+
+**Context.** Borrowed references cannot outlive the lock guard
+they were extracted from. A concurrent cache cannot expose
+`Cache::get`'s `Option<&V>` without holding the lock across the
+borrow, which serialises readers.
+
+**Decision.** Two parallel trait families. Sequential
+(`Cache<K, V>`, `StoreCore`, `StoreMut`) returns `&V` / owned `V`.
+Concurrent stores (`ConcurrentStoreRead`, `ConcurrentStore`) return
+`Arc<V>` and take `&self`. Concurrent cache wrappers also take
+`&self`, but their concrete APIs may return `Arc<V>` (LRU) or cloned
+`V` (FIFO, S3-FIFO), depending on the underlying policy storage.
+Concurrent cache wrappers do **not** implement `Cache<K, V>`; they
+expose their own concrete API.
+
+**Alternatives considered.**
+- Unified `Arc<V>`-only traits. Rejected: forces an `Arc<V>`
+  round-trip on every sequential lookup.
+- Hand out lock guards. Rejected: ergonomically terrible and forces
+  callers to manage the lock.
+
+**Consequences.**
+- Concurrent store hits pay one atomic refcount bump. Concurrent
+  cache wrappers that return cloned `V` instead pay the value's
+  clone cost.
+- Generic code that wants "any thread-safe cache" bounds on
+  `ConcurrentCache + Send + Sync`, not `Cache<K, V>`.
+- `DynCache` (sequential) and the `Concurrent*` wrappers
+  (concurrent) are separate dispatch paths.
+
+**See also.** Appendix §3;
+[concurrency.md](concurrency.md),
+[storage.md §"Layer 2"](storage.md#layer-2--concurrentstoreread--concurrentstore).
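+
+The `Arc<V>` boundary is sketched below with a hypothetical `ArcStore` built on `std::sync::RwLock`. This is not cachekit's `ConcurrentStore` implementation. The read guard is released before `get` returns. The caller then holds an `Arc` clone that stays valid even after the entry is removed.
+
+```rust
+use std::collections::HashMap;
+use std::sync::{Arc, RwLock};
+
+/// Hypothetical concurrent store: every method takes `&self`, and reads hand
+/// out `Arc<V>` clones instead of borrows tied to a lock guard.
+pub struct ArcStore<V> {
+    map: RwLock<HashMap<u64, Arc<V>>>,
+}
+
+impl<V> ArcStore<V> {
+    pub fn new() -> Self {
+        Self { map: RwLock::new(HashMap::new()) }
+    }
+
+    pub fn insert(&self, key: u64, value: V) -> Option<Arc<V>> {
+        self.map.write().unwrap().insert(key, Arc::new(value))
+    }
+
+    /// A hit costs one atomic refcount increment; the read guard is dropped
+    /// at the end of this statement, before the caller uses the value.
+    pub fn get(&self, key: &u64) -> Option<Arc<V>> {
+        self.map.read().unwrap().get(key).cloned()
+    }
+
+    pub fn remove(&self, key: &u64) -> Option<Arc<V>> {
+        self.map.write().unwrap().remove(key)
+    }
+}
+```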
+
+### D5. `parking_lot::RwLock` for concurrent wrappers
+
+**Context.** The choice of lock primitive shapes uncontended cost,
+fairness, and writer-starvation behaviour.
+
+**Decision.** Every `Concurrent*` wrapper uses
+`parking_lot::RwLock`. Gated behind the `concurrency` Cargo feature.
+
+**Alternatives considered.**
+- `std::sync::RwLock`. Rejected for now: writer-starvation hazard
+  on some platforms; revisit if `parking_lot` becomes a build
+  burden.
+- `Mutex` (any variant). Rejected: serialises readers that only
+  consult `peek` / `contains` / `len`, defeating D3's split.
+- Lock-free / RCU. Rejected as current design; covered in
+  [non-goals.md §"Not Lock-Free"](non-goals.md#not-lock-free).
+
+**Consequences.**
+- `peek` paths scale linearly with cores on read-heavy workloads.
+- `get` paths serialise through the write lock — fundamental, not
+  fixable in the wrapper.
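+- `parking_lot` locks don't poison on panic. This matters only under
+  `panic = "unwind"`. cachekit's own release profile sets `abort`, but
+  Cargo ignores `[profile]` settings in dependencies, so downstream
+  binaries choose their own panic strategy.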
+
+**See also.** Appendix §3;
+[concurrency.md §"Lock primitive choice"](concurrency.md#lock-primitive-choice).
+
+### D6. Slab / arena / intrusive layout over pointer chasing
+
+**Context.** A cache implemented over `Box`-allocated nodes with
+linked-list pointers pays cache misses on every traversal step,
+fights the borrow checker on every mutation, and fragments memory.
+
+**Decision.** Build policies over reusable primitives in
+[`src/ds/`](../../src/ds): `SlotArena` for stable `Handle`-indexed
+entries, `IntrusiveList` for recency lists threading through arena
+slots, `ClockRing` for contiguous reference-bit storage,
+`FrequencyBuckets` for LFU.
+
+**Alternatives considered.**
+- `Box<Node>` linked lists. Rejected: every traversal step is a
+  cache miss; `Box` allocations dominate hot paths.
+- `Rc` / `Arc<Node>`. Rejected: refcount overhead in the hot path.
+
+**Consequences.**
+- Every policy that needs ordering threads its metadata through
+  one of the arena / intrusive primitives. Adding a new policy
+  with novel metadata typically means adding a new primitive to
+  `src/ds/` rather than rolling its own.
+- Stable handles enable O(1) eviction (D8).
+
+**See also.** Appendix §2;
+[`src/ds/`](../../src/ds),
+[`docs/policy-ds/`](../policy-ds/README.md).
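+
+The arena-plus-indices idea is sketched below as a hypothetical `ArenaList`. This is not the `SlotArena` / `IntrusiveList` API. Nodes live in one `Vec`, links are `usize` indices, and freed slots are recycled. The tail index gives an O(1) eviction candidate without a scan.
+
+```rust
+// Sentinel index meaning "no neighbour".
+const NIL: usize = usize::MAX;
+
+struct Node<T> {
+    value: T,
+    prev: usize,
+    next: usize,
+}
+
+/// Hypothetical arena-backed recency list with stable `usize` handles.
+pub struct ArenaList<T> {
+    slots: Vec<Option<Node<T>>>,
+    free: Vec<usize>,
+    head: usize, // most recently used
+    tail: usize, // least recently used: the eviction candidate
+}
+
+impl<T> ArenaList<T> {
+    pub fn with_capacity(cap: usize) -> Self {
+        Self { slots: Vec::with_capacity(cap), free: Vec::new(), head: NIL, tail: NIL }
+    }
+
+    /// Links a value at the MRU end and returns its handle.
+    pub fn push_front(&mut self, value: T) -> usize {
+        let node = Node { value, prev: NIL, next: self.head };
+        let id = match self.free.pop() {
+            Some(id) => {
+                self.slots[id] = Some(node);
+                id
+            }
+            None => {
+                self.slots.push(Some(node));
+                self.slots.len() - 1
+            }
+        };
+        if self.head != NIL { self.node_mut(self.head).prev = id; } else { self.tail = id; }
+        self.head = id;
+        id
+    }
+
+    /// Unlinks a handle in O(1) and returns its value.
+    pub fn remove(&mut self, id: usize) -> Option<T> {
+        let node = self.slots.get_mut(id)?.take()?;
+        if node.prev != NIL { self.node_mut(node.prev).next = node.next; } else { self.head = node.next; }
+        if node.next != NIL { self.node_mut(node.next).prev = node.prev; } else { self.tail = node.prev; }
+        self.free.push(id);
+        Some(node.value)
+    }
+
+    /// Moves a handle to the MRU end, as an LRU `get` would.
+    pub fn touch(&mut self, id: usize) {
+        if let Some(value) = self.remove(id) {
+            // The free list is LIFO, so the same slot (and handle) is reused.
+            let new_id = self.push_front(value);
+            debug_assert_eq!(new_id, id);
+        }
+    }
+
+    /// Evicts the LRU entry in O(1): the tail index is already known.
+    pub fn pop_back(&mut self) -> Option<T> {
+        if self.tail == NIL { None } else { self.remove(self.tail) }
+    }
+
+    fn node_mut(&mut self, id: usize) -> &mut Node<T> {
+        self.slots[id].as_mut().expect("linked slot must be occupied")
+    }
+}
+```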
+
+### D7. No per-operation allocation in hot paths
+
+**Context.** Allocation on `get` / `insert` dominates flamegraphs
+and makes tail latency unpredictable.
+
+**Decision.** Pre-size pools, slabs, and intrusive lists at
+construction time. Policy and store hot paths avoid `Box::new` and
+`Vec::push`-that-grows; allocation is acceptable on `new` /
+`with_capacity` paths and caller-driven `clear` / replace paths.
+The integration layer has narrow exceptions where a policy stores
+shared values internally: `DynCache` wraps inserted values in
+`Arc<V>` for the LRU, LFU, and Heap-LFU variants.
+
+**Alternatives considered.**
+- Per-thread arenas. Roadmap; the single-arena story suffices for
+  the current policies and avoids cross-thread arena coordination.
+- "Allow allocation, document the cost." Rejected: defeats the
+  performance contract.
+
+**Consequences.**
+- Insert into a full store returns `StoreFull` rather than
+  triggering a `Vec` grow; cache policies handle that by evicting
+  according to policy and retrying the store insert.
+- Benchmarks must pre-allocate values; see
+  [benchmarking.md §"Value Construction Discipline"](benchmarking.md#value-construction-discipline).
+- The `WeightStore` weight-function contract requires
+  `Fn(&V) -> usize` to be O(1); a traversal-based weight function
+  silently regresses insert latency.
+
+**See also.** Appendix §4;
+[`src/ds/slot_arena.rs`](../../src/ds/slot_arena.rs),
+[`src/store/slab.rs`](../../src/store/slab.rs).
+
+### D8. Predictable eviction via direct handles / indices
+
+**Context.** Eviction is the slow path on a full cache. A policy
+that scans to find a victim trades constant-time inserts for
+linear-time evictions, which dominates tail latency.
+
+**Decision.** Policies targeting interactive hot paths maintain
+direct indices or `Handle`s (intrusive list head / tail, `ClockRing`
+hand, lazy-heap root) to the eviction candidate. Eviction cost should
+be comparable to lookup cost, not orders of magnitude higher. `NruCache`
+is the documented exception: it is intentionally simple and can scan
+O(n) on eviction; callers that need O(1) eviction should use `Clock`,
+`LRU`, or another direct-victim policy.
+
+**Alternatives considered.**
+- Scan-on-demand. Rejected for any policy targeting interactive
+  workloads.
+- Lazy O(log n) priority queues. Accepted where amortised cost
+  dominates; `LazyMinHeap` (Heap-LFU, TTL) is the precedent.
+
+**Consequences.**
+- `EvictingCache::evict_one` is `#[must_use]`. Most shipped policies
+  are O(1) or O(log n) amortised; `NruCache` is O(n) worst case by
+  design.
+- Lazy structures must bound staleness:
+  `LazyMinHeap::with_auto_rebuild` prevents unbounded tombstone
+  growth.
+
+**See also.** Appendix §5;
+[`src/store/handle.rs`](../../src/store/handle.rs),
+[`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs),
+[`src/traits.rs`](../../src/traits.rs) (`EvictingCache`).
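+
+The sketch below shows a lazy min-heap in the spirit of `LazyMinHeap`, built on `std::collections::BinaryHeap`. The type and method names are hypothetical. A priority update pushes a new entry instead of sifting the old one. Stale entries are skipped on pop. The heap is rebuilt once it holds more than twice as many entries as there are live keys, which keeps tombstones bounded.
+
+```rust
+use std::cmp::Reverse;
+use std::collections::{BinaryHeap, HashMap};
+
+/// Hypothetical lazy min-heap keyed by priority (e.g. frequency or deadline).
+pub struct LazyHeap {
+    heap: BinaryHeap<Reverse<(u64, u32)>>, // (priority, key), min at the top
+    current: HashMap<u32, u64>,            // authoritative priority per key
+}
+
+impl LazyHeap {
+    pub fn new() -> Self {
+        Self { heap: BinaryHeap::new(), current: HashMap::new() }
+    }
+
+    /// O(log n): any older heap entry for `key` becomes a stale tombstone.
+    pub fn update(&mut self, key: u32, priority: u64) {
+        self.current.insert(key, priority);
+        self.heap.push(Reverse((priority, key)));
+        // Bound tombstone growth relative to the live key count.
+        if self.heap.len() > 2 * self.current.len().max(1) {
+            self.rebuild();
+        }
+    }
+
+    /// Amortised O(log n): pop until an entry matches the live priority.
+    pub fn pop_min(&mut self) -> Option<(u32, u64)> {
+        while let Some(Reverse((priority, key))) = self.heap.pop() {
+            if self.current.get(&key) == Some(&priority) {
+                self.current.remove(&key);
+                return Some((key, priority));
+            }
+            // Stale: a later update superseded this entry, so skip it.
+        }
+        None
+    }
+
+    pub fn heap_len(&self) -> usize {
+        self.heap.len()
+    }
+
+    fn rebuild(&mut self) {
+        self.heap = self.current.iter().map(|(&k, &p)| Reverse((p, k))).collect();
+    }
+}
+```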
+
+### D9. Enum dispatch (`DynCache`) over `Box<dyn Cache>` for runtime selection
+
+**Context.** Users want to pick a policy at runtime from
+configuration without writing one call site per policy.
+`Cache<K, V>` is object-safe, but `Box<dyn Cache<K, V>>` pays a
+vtable indirection on every method call.
+
+**Decision.** `CacheBuilder` returns `DynCache<K, V>`, a wrapper
+around a closed `CacheInner<K, V>` enum with one variant per
+builder-wired policy. Method calls dispatch through `match`, which
+the optimiser devirtualises when the variant is invariant across an
+inner loop. CAR is implemented as `CarCore` but is not yet exposed
+through `CachePolicy` / `DynCache`.
+
+**Alternatives considered.**
+- `Box<dyn Cache<K, V>>`. Rejected for the runtime dispatcher;
+  still available to users who want open polymorphism.
+- One concrete `LruCache`-like default. Rejected: configurability
+  is a goal.
+
+**Consequences.**
+- Adding a new builder-wired policy is a six-arm edit (see
+  [builder-and-dyn-dispatch.md §"Adding a new policy"](builder-and-dyn-dispatch.md#adding-a-new-policy)).
+- `DynCache` requires the *union* of every variant's trait bounds
+  (`K: Copy + Eq + Hash + Ord`, `V: Clone + Debug`).
+- A small `Arc<V>` round-trip is paid on the LRU / LFU / Heap-LFU
+  variants that store `Arc<V>` internally.
+
+**See also.** Appendix §8, §13;
+[builder-and-dyn-dispatch.md](builder-and-dyn-dispatch.md).
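+
+The dispatch shape is reduced below to two toy policies. The hypothetical names `PolicyKind`, `Inner`, and `DynCacheSketch` stand in for `CachePolicy`, `CacheInner`, and `DynCache`. Each method is a `match` over a closed enum, so there is no vtable call, and every new variant adds an arm to every method.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+
+/// Runtime policy tag read from configuration (hypothetical names).
+#[derive(Clone, Copy, Debug)]
+pub enum PolicyKind {
+    Fifo,
+    Unbounded,
+}
+
+/// Toy policies; a real one would sit on the arena structures from D6.
+pub struct Fifo {
+    map: HashMap<u64, u64>,
+    order: VecDeque<u64>,
+    cap: usize,
+}
+
+pub struct Unbounded {
+    map: HashMap<u64, u64>,
+}
+
+/// Closed set of variants: dispatch is a `match`, not a vtable lookup.
+enum Inner {
+    Fifo(Fifo),
+    Unbounded(Unbounded),
+}
+
+/// One user-facing type regardless of the policy chosen at runtime.
+pub struct DynCacheSketch {
+    inner: Inner,
+}
+
+impl DynCacheSketch {
+    pub fn new(kind: PolicyKind, cap: usize) -> Self {
+        let inner = match kind {
+            PolicyKind::Fifo => Inner::Fifo(Fifo { map: HashMap::new(), order: VecDeque::new(), cap }),
+            PolicyKind::Unbounded => Inner::Unbounded(Unbounded { map: HashMap::new() }),
+        };
+        Self { inner }
+    }
+
+    pub fn insert(&mut self, key: u64, value: u64) {
+        match &mut self.inner {
+            Inner::Fifo(c) => {
+                if !c.map.contains_key(&key) {
+                    // Evict the oldest key before admitting a new one.
+                    if c.map.len() >= c.cap {
+                        if let Some(oldest) = c.order.pop_front() {
+                            c.map.remove(&oldest);
+                        }
+                    }
+                    c.order.push_back(key);
+                }
+                c.map.insert(key, value);
+            }
+            Inner::Unbounded(c) => {
+                c.map.insert(key, value);
+            }
+        }
+    }
+
+    pub fn peek(&self, key: &u64) -> Option<&u64> {
+        match &self.inner {
+            Inner::Fifo(c) => c.map.get(key),
+            Inner::Unbounded(c) => c.map.get(key),
+        }
+    }
+
+    pub fn len(&self) -> usize {
+        match &self.inner {
+            Inner::Fifo(c) => c.map.len(),
+            Inner::Unbounded(c) => c.map.len(),
+        }
+    }
+}
+```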
+
+### D10. Per-policy feature flags + curated default set
+
+**Context.** Users who only need one or two policies shouldn't pay
+compile time and binary size for the rest of the catalog.
+
+**Decision.** Every policy is behind its own `policy-*` Cargo
+feature. `policy-all` enables every policy. The default feature
+set is a curated subset (`policy-s3-fifo`, `policy-lru`,
+`policy-fast-lru`, `policy-lru-k`, `policy-clock`) chosen to cover
+the most-recommended workloads. Optional capabilities (`metrics`,
+`concurrency`, `serde`, `ttl`) are gated the same way.
+
+**Alternatives considered.**
+- One mega-feature. Rejected: penalises minimum-surface users.
+- No defaults, force every consumer to enumerate. Rejected:
+  ergonomic regression for the common case.
+
+**Consequences.**
+- A user with `default-features = false, features = ["policy-lru"]`
+  gets one inner variant and no other policy code.
+- One sharp edge: `policy-fast-lru` is in the default set, and
+  `FastLru` is `!Send + !Sync` (uses `NonNull<Node>`). With the
+  default features, `DynCache<K, V>` is also `!Send + !Sync`.
+  Builds that need a sendable `DynCache` should disable
+  `policy-fast-lru` or use a [`Concurrent*` wrapper](concurrency.md)
+  directly.
+
+**See also.** Appendix §13;
+[`Cargo.toml`](../../Cargo.toml),
+[builder-and-dyn-dispatch.md §"`Send + Sync` is conditional"](builder-and-dyn-dispatch.md#send--sync-is-conditional).
+
+### D11. Two-layer metrics
+
+**Context.** Universal counters (hit rate, eviction count) are
+part of the store trait surface and provide the observability
+baseline every implementation can report. Per-policy detailed
+metrics (Clock hand advances, ARC ghost hits, LRU recency-rank
+reads) are expensive to plumb and unnecessary for many consumers.
+
+**Decision.** Two parallel metrics surfaces:
+- `StoreMetrics` (seven counters: `hits`, `misses`, `inserts`,
+  `updates`, `removes`, `evictions`, `expirations`), always on,
+  in [`src/store/traits.rs`](../../src/store/traits.rs).
+- Feature-gated policy-layer hierarchy (per-policy `*Metrics`
+  recorders, `*Snapshot` value types, Prometheus exporter) behind
+  the `metrics` Cargo feature.
+
+**Alternatives considered.**
+- All metrics behind one feature flag. Rejected: the store-layer
+  counters are part of the universal trait contract.
+- No counters at all. Rejected: cannot tune what you do not
+  measure.
+
+**Consequences.**
+- The `metrics` feature opts into per-policy detail and the
+  exporter; turning it off does not strip the store-layer
+  baseline.
+- `MetricsCell` (the `&self` interior-mutability counter) carries
+  a narrow soundness contract: increments must happen under
+  *exclusive* synchronisation, not under a shared `RwLock::read`.
+
+**See also.** Appendix §6; [metrics.md](metrics.md).
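+
+Below is a hypothetical snapshot type that mirrors the seven counters listed above. The real `StoreMetrics` layout and accessors may differ. It shows the two derived figures consumers typically compute from always-on counters: hit rate, and per-interval deltas between monotonically increasing snapshots.
+
+```rust
+/// Hypothetical mirror of the seven always-on store counters.
+#[derive(Clone, Copy, Debug, Default, PartialEq)]
+pub struct CounterSnapshot {
+    pub hits: u64,
+    pub misses: u64,
+    pub inserts: u64,
+    pub updates: u64,
+    pub removes: u64,
+    pub evictions: u64,
+    pub expirations: u64,
+}
+
+impl CounterSnapshot {
+    /// Fraction of lookups that hit, or `None` before the first lookup.
+    pub fn hit_rate(&self) -> Option<f64> {
+        let lookups = self.hits + self.misses;
+        if lookups == 0 {
+            None
+        } else {
+            Some(self.hits as f64 / lookups as f64)
+        }
+    }
+
+    /// Deltas since an earlier snapshot; saturating, so a counter reset
+    /// yields zero rather than overflowing.
+    pub fn since(&self, earlier: &Self) -> Self {
+        Self {
+            hits: self.hits.saturating_sub(earlier.hits),
+            misses: self.misses.saturating_sub(earlier.misses),
+            inserts: self.inserts.saturating_sub(earlier.inserts),
+            updates: self.updates.saturating_sub(earlier.updates),
+            removes: self.removes.saturating_sub(earlier.removes),
+            evictions: self.evictions.saturating_sub(earlier.evictions),
+            expirations: self.expirations.saturating_sub(earlier.expirations),
+        }
+    }
+}
+```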
+
+### D12. Three-tier error model
+
+**Context.** Different failures need different responses. A
+programmer who passes `k = 0` to LRU-K should learn at the call
+site, not handle a `Result`. A user-supplied config file with
+`small_ratio = 2.0` should be routed through a fallible constructor
+such as `S3FifoCache::try_with_ratios`. The current
+`CacheBuilder::build` path treats `CachePolicy` values as already
+validated program input and panics on invalid parameters; a future
+`try_build` would be the tier-2 surface for external configuration.
+An internal invariant violation should be diagnosable but not fatal
+in test runs.
+
+**Decision.** Three tiers:
+1. **Programming error** → panic (`assert!`, `panic!`,
+   `debug_assert!`).
+2. **User-supplied input / expected resource failure** →
+   `Result<_, ErrorType>`, using narrow error types such as
+   `ConfigError`, `StoreFull`, `LazyMinHeapError`, or
+   `std::collections::TryReserveError` passthrough.
+3. **Invariant violation** → `check_invariants` methods returning
+   `Result<(), InvariantError>` under `debug_assertions` / `test`.
+
+**Alternatives considered.**
+- One `CachekitError` enum. Rejected: each surface has different
+  recovery semantics, and downstream code rarely wants the union.
+- Panic on every failure. Rejected for tier-2 surfaces that take
+  user-supplied input.
+
+**Consequences.**
+- Public constructors that accept user-tunable knobs should come in
+  pairs where that surface is exposed to external configuration:
+  panicking `new` / `build` and fallible `try_*` variants.
+- `panic = "abort"` is cachekit's release-profile default, so in
+  those builds programming errors terminate the process rather than
+  unwinding. Downstream binaries set their own panic strategy.
+
+**See also.** Appendix §12; [error-model.md](error-model.md).
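+
+The tier-1 / tier-2 constructor pairing is sketched below with a hypothetical `QueueSplit` and a stand-in `ConfigError`. cachekit's real `ConfigError` and `S3FifoCache::try_with_ratios` may differ. `try_new` validates external input and returns a `Result`. `new` treats the same failure as a programming error and panics.
+
+```rust
+use std::fmt;
+
+/// Stand-in for a narrow config error type (hypothetical shape).
+#[derive(Debug, Clone, PartialEq)]
+pub struct ConfigError {
+    pub message: String,
+}
+
+impl fmt::Display for ConfigError {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.write_str(&self.message)
+    }
+}
+
+impl std::error::Error for ConfigError {}
+
+/// Toy split of a capacity into a small probationary queue and a main queue.
+#[derive(Debug, PartialEq)]
+pub struct QueueSplit {
+    pub small: usize,
+    pub main: usize,
+}
+
+impl QueueSplit {
+    /// Tier 2: input that may come from a config file gets a `Result`.
+    pub fn try_new(capacity: usize, small_ratio: f64) -> Result<Self, ConfigError> {
+        if capacity == 0 {
+            return Err(ConfigError { message: "capacity must be non-zero".to_string() });
+        }
+        // The negated range check also rejects NaN.
+        if !(small_ratio > 0.0 && small_ratio < 1.0) {
+            return Err(ConfigError { message: format!("small_ratio {small_ratio} is not in (0, 1)") });
+        }
+        let small = ((capacity as f64) * small_ratio).ceil() as usize;
+        let small = small.clamp(1, capacity);
+        Ok(Self { small, main: capacity - small })
+    }
+
+    /// Tier 1: an invalid constant chosen in code is a bug, so panic.
+    pub fn new(capacity: usize, small_ratio: f64) -> Self {
+        match Self::try_new(capacity, small_ratio) {
+            Ok(split) => split,
+            Err(err) => panic!("invalid QueueSplit: {err}"),
+        }
+    }
+}
+```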
+
+### D13. TTL as decorator (`Expiring<C>`) rather than per-policy embedded
+
+**Context.** TTL is orthogonal to eviction policy. Embedding
+`expires_at` into every policy's node would add 8 bytes per entry
+to every cache regardless of whether the user wants TTL.
+
+**Decision.** Phase 1 ships TTL as a generic decorator
+`Expiring<C, K, V, T>` that wraps any `C: Cache<K, V>` with a
+shared `ExpirationIndex` (lazy min-heap of deadlines). The builder
+returns `DynExpiringCache<K, V>` when `.with_default_ttl(...)` is
+configured; otherwise it returns the plain `DynCache<K, V>`.
+Phase 2 would profile and selectively embed `expires_at` into hot
+policies (LRU, S3-FIFO) only if benchmarks justify it.
+
+**Alternatives considered.**
+- Per-policy embedded `expires_at`. Deferred: per-entry overhead
+  is unwanted in non-TTL builds.
+- Storage-level TTL. Rejected: policies still hold stale metadata
+  for now-expired keys.
+
+**Consequences.**
+- Zero churn on the 18 existing policies.
+- The decorator pays one extra hash probe per read; whether Phase 2
+  embeds `expires_at` to remove that probe is gated on benchmarks.
+- `DynExpiringCache` is structurally distinct from `DynCache`, so
+  `Expiring<Expiring<…>>` is unrepresentable — type-level
+  prevention of the "two clocks, two indexes" footgun.
+
+**See also.** [ttl.md](ttl.md).
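+
+A condensed version of the decorator follows. It assumes a trimmed-down `Cache` trait, a hypothetical `Clock` with millisecond ticks, and a `HashMap` of deadlines in place of `ExpirationIndex` (the real index is a lazy min-heap). The inner policy carries no `expires_at` field. The decorator checks the deadline on each read, which is where the extra hash probe comes from.
+
+```rust
+use std::cell::Cell;
+use std::collections::HashMap;
+use std::hash::Hash;
+use std::marker::PhantomData;
+use std::rc::Rc;
+
+/// Trimmed-down kernel (hypothetical subset of `Cache<K, V>`).
+pub trait Cache<K, V> {
+    fn get(&mut self, key: &K) -> Option<&V>;
+    fn insert(&mut self, key: K, value: V) -> Option<V>;
+    fn remove(&mut self, key: &K) -> Option<V>;
+}
+
+/// Hypothetical clock abstraction so tests can control time.
+pub trait Clock {
+    fn now_ms(&self) -> u64;
+}
+
+pub struct ManualClock(pub Rc<Cell<u64>>);
+
+impl Clock for ManualClock {
+    fn now_ms(&self) -> u64 {
+        self.0.get()
+    }
+}
+
+/// Any policy can sit inside; a plain map keeps the sketch short.
+pub struct MapCache<K, V>(pub HashMap<K, V>);
+
+impl<K: Eq + Hash, V> Cache<K, V> for MapCache<K, V> {
+    fn get(&mut self, key: &K) -> Option<&V> { self.0.get(key) }
+    fn insert(&mut self, key: K, value: V) -> Option<V> { self.0.insert(key, value) }
+    fn remove(&mut self, key: &K) -> Option<V> { self.0.remove(key) }
+}
+
+/// Decorator: deadlines live beside the inner policy, not inside its nodes.
+/// A full version would also drop deadlines for keys the inner policy evicts.
+pub struct Expiring<C, K, V, T> {
+    inner: C,
+    deadlines: HashMap<K, u64>,
+    clock: T,
+    ttl_ms: u64,
+    _values: PhantomData<V>,
+}
+
+impl<C: Cache<K, V>, K: Eq + Hash + Clone, V, T: Clock> Expiring<C, K, V, T> {
+    pub fn new(inner: C, clock: T, ttl_ms: u64) -> Self {
+        Self { inner, deadlines: HashMap::new(), clock, ttl_ms, _values: PhantomData }
+    }
+}
+
+impl<C: Cache<K, V>, K: Eq + Hash + Clone, V, T: Clock> Cache<K, V> for Expiring<C, K, V, T> {
+    fn get(&mut self, key: &K) -> Option<&V> {
+        // The extra probe: consult the deadline before the inner policy.
+        let expired = matches!(self.deadlines.get(key), Some(&d) if self.clock.now_ms() >= d);
+        if expired {
+            self.deadlines.remove(key);
+            self.inner.remove(key);
+            return None;
+        }
+        self.inner.get(key)
+    }
+
+    fn insert(&mut self, key: K, value: V) -> Option<V> {
+        let deadline = self.clock.now_ms().saturating_add(self.ttl_ms);
+        self.deadlines.insert(key.clone(), deadline);
+        self.inner.insert(key, value)
+    }
+
+    fn remove(&mut self, key: &K) -> Option<V> {
+        self.deadlines.remove(key);
+        self.inner.remove(key)
+    }
+}
+```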
+
+### D14. `WeightStore` as sibling of `StoreCore` / `StoreMut`, not subtype
+
+**Context.** Byte-budgeted caches (images, blobs, variable-size
+values) need a second budget alongside entry count. Updates can
+legitimately fail when a larger replacement value exceeds the
+budget, which `StoreMut::try_insert`'s "updates always succeed"
+contract cannot express.
+
+**Decision.** `WeightStore<K, V, F>` is a sibling of the store
+trait family, not a subtype. It returns `Arc<V>` (even
+single-threaded) and enforces a dual entry-count + weight cap.
+`ConcurrentWeightStore` *does* implement `ConcurrentStoreRead` /
+`ConcurrentStore` because those traits already use `Arc<V>`
+returns.
+
+**Alternatives considered.**
+- Fold weight into `StoreMut`. Rejected: forces every store to
+  carry a weight slot and a weight function.
+- A separate `WeightCache` policy. Roadmap (GDS / GDSF); today
+  `WeightStore` is the substrate and no policy in the tree
+  consumes it.
+
+**Consequences.**
+- Code generic over `StoreMut` cannot accept a `WeightStore`
+  without adaptation. Documented sharp edge.
+- `WeightStore` pre-stages GDS / GDSF by providing the
+  per-entry-size half of cost / size eviction.
+
+**See also.** [weighted-eviction.md](weighted-eviction.md).
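+
+The dual-cap contract is sketched below with a hypothetical `WeightedMap`, not `WeightStore`'s actual API. Values are returned as `Arc<V>`, and both the entry count and the total weight are capped. An update can fail when the replacement is heavier than the remaining budget allows, which is the case `StoreMut`'s "updates always succeed" contract cannot express.
+
+```rust
+use std::collections::HashMap;
+use std::sync::Arc;
+
+#[derive(Debug, PartialEq)]
+pub enum WeightError {
+    EntryLimit,
+    WeightLimit,
+}
+
+/// Hypothetical byte-aware store with an entry cap and a weight cap.
+pub struct WeightedMap<V, F> {
+    map: HashMap<u64, (Arc<V>, usize)>, // value and its cached weight
+    weigh: F,
+    max_entries: usize,
+    max_weight: usize,
+    total_weight: usize,
+}
+
+impl<V, F: Fn(&V) -> usize> WeightedMap<V, F> {
+    pub fn new(max_entries: usize, max_weight: usize, weigh: F) -> Self {
+        Self { map: HashMap::new(), weigh, max_entries, max_weight, total_weight: 0 }
+    }
+
+    pub fn try_insert(&mut self, key: u64, value: V) -> Result<Option<Arc<V>>, WeightError> {
+        // Runs on every insert, hence the O(1) requirement on the weight function.
+        let weight = (self.weigh)(&value);
+        let existing = self.map.get(&key).map(|(_, w)| *w);
+        if existing.is_none() && self.map.len() >= self.max_entries {
+            return Err(WeightError::EntryLimit);
+        }
+        // An update releases the old weight first, yet can still exceed the budget.
+        let new_total = (self.total_weight - existing.unwrap_or(0)).saturating_add(weight);
+        if new_total > self.max_weight {
+            return Err(WeightError::WeightLimit);
+        }
+        self.total_weight = new_total;
+        Ok(self.map.insert(key, (Arc::new(value), weight)).map(|(old, _)| old))
+    }
+
+    pub fn get(&self, key: &u64) -> Option<Arc<V>> {
+        self.map.get(key).map(|(value, _)| Arc::clone(value))
+    }
+
+    pub fn total_weight(&self) -> usize {
+        self.total_weight
+    }
+}
+```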
+
+### D15. Mixed hasher defaults (`RandomState` at public boundaries, `FxHash` internally)
+
+**Context.** A single hasher choice forces a single trade-off:
+HashDoS-resistance on public APIs (slower) or speed on internal
+hot paths (vulnerable when keys are user-controlled).
+
+**Decision.** Public stores (`HashMapStore`, `ClockRing`) default
+to `RandomState`. Internal policy maps and `WeightStore` use
+`FxHashMap`. Shard routing (`ShardSelector`) uses keyed
+SipHash-1-3. Each public constructor that takes a non-randomised
+hasher requires explicit caller opt-in (`with_hasher`,
+`KeysAreTrusted`).
+
+**Alternatives considered.**
+- `RandomState` everywhere. Rejected: leaves performance on the
+  table for internal trusted-key paths.
+- `FxHash` everywhere. Rejected: HashDoS exposure on public APIs.
+
+**Consequences.**
+- `WeightStore`'s `FxHashMap` is a documented sharp edge — its
+  target use case (variable-size values keyed by request paths)
+  often has user-derived keys.
+- Sharded routing resists shard-pinning attacks because the
+  routing hash is keyed.
+
+**See also.** [hashing.md](hashing.md).
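+
+The opt-in pattern is sketched below using std types only. `RandomState` is the default. A hypothetical `KeysAreTrusted` token must be passed before a fixed hasher can be supplied. Shard routing is keyed by a per-instance `RandomState`. `BuildHasherDefault<DefaultHasher>` stands in for `FxHash`, which is not in std. std documents `RandomState` as randomly keyed but does not promise a specific algorithm.
+
+```rust
+use std::collections::hash_map::{DefaultHasher, RandomState};
+use std::collections::HashMap;
+use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};
+
+/// Hypothetical zero-sized token that makes the trust decision explicit.
+pub struct KeysAreTrusted;
+
+pub struct PublicStore<K, V, S = RandomState> {
+    map: HashMap<K, V, S>,
+}
+
+impl<K: Eq + Hash, V> PublicStore<K, V, RandomState> {
+    /// Default: randomly keyed per instance, resistant to HashDoS.
+    pub fn new() -> Self {
+        Self { map: HashMap::with_hasher(RandomState::new()) }
+    }
+}
+
+impl<K: Eq + Hash, V, S: BuildHasher> PublicStore<K, V, S> {
+    /// Any other hasher requires the token at the call site.
+    pub fn with_hasher(hasher: S, _opt_in: KeysAreTrusted) -> Self {
+        Self { map: HashMap::with_hasher(hasher) }
+    }
+
+    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
+        self.map.insert(key, value)
+    }
+
+    pub fn get(&self, key: &K) -> Option<&V> {
+        self.map.get(key)
+    }
+}
+
+/// Keyed shard routing: without the per-instance seed, an attacker cannot
+/// precompute keys that all land on one shard.
+pub struct ShardRouter {
+    seed: RandomState,
+    shards: usize,
+}
+
+impl ShardRouter {
+    pub fn new(shards: usize) -> Self {
+        assert!(shards > 0, "need at least one shard");
+        Self { seed: RandomState::new(), shards }
+    }
+
+    pub fn shard_for<K: Hash>(&self, key: &K) -> usize {
+        let mut hasher = self.seed.build_hasher();
+        key.hash(&mut hasher);
+        (hasher.finish() % self.shards as u64) as usize
+    }
+}
+```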
+
+## Summary
+
+The decisions above cluster around five concerns that consistently
+dominate benchmark and review feedback. The themes are not novel;
+the value in stating them here is naming where each was paid for
+in concrete code.
+
+- **Memory layout** — contiguous storage, stable handles, intrusive
+  lists, contiguous metadata rings (D6). Layout is chosen before
+  any policy decision.
+- **Allocation discipline** — pre-sized pools, no `Box::new` or
+  grow-on-demand `Vec` operations in policy / store hot paths, and
+  `StoreFull` over silent store growth (D7).
+- **Contention control** — `peek` vs `get` separation (D3) and
+  `parking_lot::RwLock` discipline (D5) so read-heavy workloads
+  scale; sharded primitives where the data structure permits.
+- **Eviction predictability** — O(1) or amortised O(log n) victims
+  via direct handles or lazy heaps for hot-path policies (D8);
+  documented exceptions like NRU are explicit, and lazy cleanup is
+  bounded.
+- **Workload realism** — scan-resistant policies in the catalog
+  (Appendix §9), Zipfian / scan / mixed benchmarks (Appendix §10),
+  and policy / storage separation (D1) so the harness can swap one
+  half without disturbing the other.
+
+The trade against ergonomic Rust idioms — enum dispatch over
+`Box<dyn Cache>` (D9), arena handles over `Box<Node>` (D6), and owned
+values at concurrent boundaries (D4) — is deliberate. The decisions
+above name where that cost was paid and where it was refused.
+
+## Appendix A: Principles
+
+These are the original 13 principles that shape cachekit. The
+decisions above implement them as concrete choices; this
+appendix preserves the principle statements and their stable
+section numbers so that sibling design docs can cite
+`design.md §N`.
+
+### 1. Workload First, Policy Second
+
+Cache policy only matters relative to workload. Identify access
+patterns (hot-set, scan-heavy, mixed), measure reuse distance and
+read/write ratio, and choose accordingly: `LRU` / `Clock` for
+temporal locality, `LRU-K` / `2Q` / `SLRU` for scan filtering,
+`ARC` / `CAR` for adaptive recency / frequency balance, `S3-FIFO` /
+`Heap-LFU` for strong general-purpose defaults under scans. All 18
+policies ship as concrete types (CAR is the only one not yet wired into
+`DynCache` — see the
+[CAR builder gap](builder-and-dyn-dispatch.md#car-builder-gap)).
+See [`docs/policies/`](../policies/README.md) for the implemented
+catalog and [`docs/policies/roadmap/`](../policies/roadmap/README.md)
+for planned policies (LIRS, TinyLFU, SIEVE, GDS / GDSF, …). Design
+for the workload you expect — not the average of all workloads.
+
+*Realised by:* the policy catalog itself; see also
+[choosing-a-policy.md](../guides/choosing-a-policy.md).
+
+### 2. Memory Layout Matters More Than Algorithms
+
+Memory layout often dominates policy. Prefer contiguous storage
+(`Vec`, slabs, arenas) and index-based indirection over pointer
+chasing; avoid excessive `Box`, `Arc`, and linked lists with
+heap-allocated nodes; store metadata in tightly packed structs;
+separate hot metadata from cold payloads. Cachekit realises this
+through reusable building blocks under
 [`src/ds/`](../../src/ds): [`SlotArena`](../../src/ds/slot_arena.rs)
-hands out stable `Handle`s backed by a `Vec`, [`IntrusiveList`](../../src/ds/intrusive_list.rs)
-threads recency lists through those slots without per-node allocation,
-and [`ClockRing`](../../src/ds/clock_ring.rs) keeps Clock-style state in
-a single contiguous array. See [`docs/policy-ds/`](../policy-ds/README.md)
-for the full primitive catalog.
-
-Cache misses caused by your own data structure are as bad as upstream misses.
-
-## 3. Concurrency Strategy Is Core Design, Not a Wrapper
-
-Locking strategy shapes everything.
-
-Options:
-- Global lock: simple, often fast enough for small cores, dies under high contention.
-- Sharded caches: hash key → shard, each shard independently locked.
-- Lock-free or mostly-lock-free: hard in Rust, only worth it if contention dominates.
-
-cachekit ships the first option today via the `concurrency` feature:
-`Concurrent*` wrappers (e.g. `ConcurrentLruCache`, `ConcurrentSlotArena`,
-`ConcurrentClockRing`) place a `parking_lot::RwLock` around the
-single-threaded core. The wrappers deliberately do **not** implement
-`Cache<K, V>` directly when that would force returning `&V` across a
-lock boundary — they expose `Option<Arc<V>>` style APIs instead. See
-[`src/policy/lru.rs`](../../src/policy/lru.rs),
-[`src/ds/slot_arena.rs`](../../src/ds/slot_arena.rs), and
-[`src/ds/clock_ring.rs`](../../src/ds/clock_ring.rs).
-
-Rust-specific notes:
-- For `RwLock`, prefer `parking_lot` for fairness control and lower
-  uncontended overhead. For `Mutex`, the futex-based `std::sync::Mutex`
-  on Rust 1.85+ is competitive on Linux/macOS; `parking_lot::Mutex`
-  still wins on raw uncontended speed and offers nicer guard ergonomics.
-- Avoid `Arc<Mutex<…>>` in hot paths.
-
-Future directions worth exploring but **not currently implemented**:
-sharded caches (hash key → shard, per-shard lock), per-thread caches with
-periodic merge, and RCU-style read paths for read-heavy workloads.
-
-## 4. Avoid Per-Operation Allocation
-
-Allocations kill throughput.
-
-Pre-allocate:
-- Entry pools — see [`SlotArena`](../../src/ds/slot_arena.rs) and the
-  free-list discipline in [`src/store/slab.rs`](../../src/store/slab.rs).
-- Node arrays — intrusive lists thread through arena slots rather than
-  allocating per-node (see [`src/ds/intrusive_list.rs`](../../src/ds/intrusive_list.rs)).
-
-Reuse:
-- Free lists (slab-backed).
-- Slabs sized once at construction time via `CacheBuilder::new(capacity)`.
-
-Use:
-- `Vec` with explicit capacity management.
-- `rustc-hash` (via the `rustc-hash` dep) for cheap key hashing in
-  hot-path lookups.
-
-Avoid:
-- Creating new `Arc`, `String`, `Vec` per lookup.
-- Hidden clones of `K` on the eviction path.
-
-If `malloc` shows up in your flamegraph, your cache is already slow.
-
-## 5. Eviction Must Be Predictable and Cheap
-
-Eviction is the critical slow path.
-
-O(1) eviction is the goal.
-
-Avoid unbounded tree walks or scans in eviction paths.
-
-Maintain:
-- Direct indices / `Handle`s to eviction candidates (see
-  [`src/store/handle.rs`](../../src/store/handle.rs) and the
-  [`Cache`](../../src/store/traits.rs) trait).
-- Eviction lists or clock hands (intrusive list head, `ClockRing` hand).
-- Lazy heaps where amortized O(log n) is acceptable
-  ([`LazyMinHeap`](../../src/ds/lazy_heap.rs); used by Heap-LFU and TTL).
-
-Be careful with:
-- Background eviction threads (synchronization overhead).
-- Lazy cleanup that grows unbounded; bound it with rebuild thresholds
-  (e.g. `LazyMinHeap::with_auto_rebuild`).
-
-Eviction cost must be comparable to lookup cost, not orders of magnitude higher.
-
-## 6. Metrics Are Not Optional
-
-You cannot tune what you do not measure.
-
-Track at least:
-- Hit / miss rate.
-- Eviction count and reason (capacity vs. expiration).
-- Insert/update rate.
-
-cachekit exposes these through [`StoreMetrics`](../../src/store/traits.rs)
-and per-policy metric structs (e.g. `LruMetrics`), gated behind the
-`metrics` feature so non-instrumented builds pay nothing. The
-`expirations` counter on `Expiring<C>` follows the same pattern (see
-[`src/policy/expiring.rs`](../../src/policy/expiring.rs)).
-
-Roadmap counters:
-- Scan pollution rate.
-- Lock contention or wait time.
-
-Expose:
-- Lightweight counters in the hot path.
-- Optional detailed metrics behind feature flags.
-
-Metrics should guide design decisions, not justify them afterward.
-
-## 7. Separate Policy From Storage
-
-Design in layers:
-- Storage layer: how entries live in memory, allocation, layout,
-  indexing — [`src/store/`](../../src/store).
-- Policy layer: LRU, FIFO, LFU, LRU-K, 2Q, ARC, CAR, Clock, Clock-PRO,
-  S3-FIFO, … — manipulates metadata and ordering only
-  ([`src/policy/`](../../src/policy)).
-- Capability layer: opt-in extension traits ([`RecencyTracking`](../../src/traits.rs),
-  `FrequencyTracking`, `HistoryTracking`, `ExpiringCache`) that policies
-  implement when the underlying signal exists. This is how `Expiring<C>`
-  composes over any policy without touching policy code.
-- Integration layer: ties application objects, payloads, or IDs into
-  cache entries via [`CacheBuilder`](../../src/builder.rs) and the
-  `DynCache` runtime dispatcher.
-
-Related docs:
-- [Policy overview](../policies/README.md)
-- [Policy roadmap](../policies/roadmap/README.md)
-- [Policy data structures](../policy-ds/README.md)
-- [Read-only traits](../guides/read-only-traits.md)
-
-This makes:
-- Benchmarking easier.
-- Policy experimentation cheap.
-- Reasoning about performance clearer.
-
-## 8. Beware of "Nice" Rust APIs in Hot Paths
-
-Ergonomics often cost performance.
-
-Avoid in critical loops:
-- Heavy generics causing code bloat across many monomorphizations.
-- Trait objects for hot dispatch.
-- Closures capturing state.
-- Iterator chains where a plain `for` loop would do.
-
-Prefer:
-- Explicit loops.
-- Concrete types and monomorphized fast paths.
-- Enum dispatch over `Box<dyn Trait>` when polymorphism is needed at the
-  edges — this is exactly the trade `DynCache` makes (see §13).
-
-You can wrap fast internals in nice APIs at the edges.
-
-## 9. Scans Are the Enemy of Caches
-
-In scan-heavy workloads:
-
-Large sequential reads destroy LRU-style caches.
-
-Solutions:
-- Scan-resistant policies: `LRU-K`, `2Q`, `SLRU`, `ARC`, `CAR`,
-  `Clock-PRO`, `S3-FIFO`, `Heap-LFU` — all implemented today.
-- Explicit "scan mode" hints from the caller or workload layer.
-- Bypass cache for known one-shot reads.
-
-If you ignore scans, your cache will look great in microbenchmarks and
-terrible in production.
-
-## 10. Benchmark Like a System, Not a Library
-
-Do not rely on uniform-random key benchmarks.
-
-Use:
-- Zipfian distributions.
-- Mixed read/write workloads.
-- Scan + point lookup mixtures.
-- Time-varying hot sets.
-
-Measure:
-- Throughput.
-- Tail latency.
-- Memory overhead.
-- Eviction cost.
-
-cachekit's benchmark harness covers these dimensions; see
-[`docs/benchmarks/workloads.md`](../benchmarks/workloads.md) and the
-runners under [`benches/`](../../benches).
-
-A cache that is 5 % faster on uniform-random keys but 50 % worse under
-scans is a bad cache.
-
-## 11. Rust Hot-Path Hazards Beyond Allocation
-
-`Arc` is expensive in hot paths; minimize it and lift `Arc::clone` out
-of inner loops.
-
-The borrow checker can push you toward indirection — fight it with:
-- Index-based access (`Handle`s, slot indices) instead of `&mut` chains.
-- Interior mutability only where unavoidable; prefer `Cell<T>` over
-  `RefCell<T>` when `T: Copy`, and atomics when the value lives behind
-  a shared reference.
-
-Beware of:
-- Hidden clones, particularly of keys on the eviction path.
-- Trait object dispatch on read/insert.
-- Over-generic designs whose monomorphization cost dwarfs their benefit.
-
-Rust can match C on hot paths, but only when systems-level discipline
+hands out stable `Handle`s, [`IntrusiveList`](../../src/ds/intrusive_list.rs)
+threads recency lists through arena slots without per-node
+allocation, and [`ClockRing`](../../src/ds/clock_ring.rs) keeps
+Clock-style state in a single contiguous array. See
+[`docs/policy-ds/`](../policy-ds/README.md) for the full primitive
+catalog. Cache misses caused by your own data structure are as bad
+as upstream misses.
+
+*Realised by:* D6 (slab / arena / intrusive layout).
+
+### 3. Concurrency Strategy Is Core Design, Not a Wrapper
+
+Locking strategy shapes everything. Options: global lock (simple,
+dies under contention), sharded caches (per-shard lock), and
+lock-free (hard in Rust, only worth it if contention dominates).
+Cachekit ships the global-lock option as `Concurrent*` wrappers
+(`parking_lot::RwLock` around the single-threaded core, gated by
+the `concurrency` feature) and partial sharding at the
+data-structure / store layer (`ShardedHashMapStore`,
+`ShardedSlotArena`, `ShardedFrequencyBuckets`). A generic
+`Sharded<C: Cache<K, V>>` wrapping any policy is not yet shipped;
+RCU-style read paths and per-thread caches with periodic merge are
+roadmap. Avoid `Arc<Mutex<…>>` in hot paths.
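+
+The `Concurrent*` shape can be sketched with the standard library
+alone. A minimal sketch under stated assumptions, not cachekit's
+`ConcurrentLruCache`: `Core` and `ConcurrentWrapper` are hypothetical,
+`std::sync::RwLock` stands in for `parking_lot::RwLock` (so lock
+poisoning has to be unwrapped), and a `VecDeque` replaces the
+intrusive recency list, which makes `touch` O(n). It shows values
+crossing the lock boundary as `Arc<V>`, and a side-effect-free `peek`
+sharing the read lock while a recency-updating `get` takes the write
+lock.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+use std::hash::Hash;
+use std::sync::{Arc, RwLock};
+
+/// Hypothetical single-threaded core. `order` runs from least to
+/// most recently used.
+struct Core<K, V> {
+    map: HashMap<K, Arc<V>>,
+    order: VecDeque<K>,
+    capacity: usize,
+}
+
+impl<K: Hash + Eq + Clone, V> Core<K, V> {
+    /// Move `key` to the MRU end (O(n) here; O(1) with an intrusive list).
+    fn touch(&mut self, key: &K) {
+        if let Some(pos) = self.order.iter().position(|k| k == key) {
+            let k = self.order.remove(pos).expect("position is in bounds");
+            self.order.push_back(k);
+        }
+    }
+}
+
+/// Hypothetical wrapper: one lock around the single-threaded core.
+pub struct ConcurrentWrapper<K, V> {
+    inner: RwLock<Core<K, V>>,
+}
+
+impl<K: Hash + Eq + Clone, V> ConcurrentWrapper<K, V> {
+    pub fn new(capacity: usize) -> Self {
+        assert!(capacity > 0, "capacity must be non-zero");
+        let core = Core {
+            map: HashMap::with_capacity(capacity),
+            order: VecDeque::with_capacity(capacity),
+            capacity,
+        };
+        Self { inner: RwLock::new(core) }
+    }
+
+    /// Shared read lock: no recency update, so readers run in parallel.
+    /// The `Arc` clone outlives the guard; a `&V` could not.
+    pub fn peek(&self, key: &K) -> Option<Arc<V>> {
+        self.inner.read().unwrap().map.get(key).cloned()
+    }
+
+    /// Exclusive write lock: a hit mutates the recency order.
+    pub fn get(&self, key: &K) -> Option<Arc<V>> {
+        let mut core = self.inner.write().unwrap();
+        let value = core.map.get(key).cloned()?;
+        core.touch(key);
+        Some(value)
+    }
+
+    /// Insert or replace; evicts the LRU key when a new key arrives at capacity.
+    pub fn insert(&self, key: K, value: V) {
+        let mut core = self.inner.write().unwrap();
+        if core.map.contains_key(&key) {
+            core.touch(&key);
+        } else {
+            if core.map.len() == core.capacity {
+                if let Some(victim) = core.order.pop_front() {
+                    core.map.remove(&victim);
+                }
+            }
+            core.order.push_back(key.clone());
+        }
+        core.map.insert(key, Arc::new(value));
+    }
+}
+```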
+
+*Realised by:* D4 (`&V` vs `Arc<V>` trait families), D5
+(`parking_lot::RwLock`).
+
+### 4. Avoid Per-Operation Allocation
+
+Allocations kill throughput. Pre-allocate entry pools (see
+[`SlotArena`](../../src/ds/slot_arena.rs) and the free-list
+discipline in [`src/store/slab.rs`](../../src/store/slab.rs)) and
+node arrays (intrusive lists thread through arena slots rather
+than allocating per-node). Reuse free lists; size slabs once at
+construction. Use `Vec` with explicit capacity and `rustc-hash`'s
+`FxHashMap` for cheap key hashing in hot-path lookups (see
+[hashing.md](hashing.md) for the threat model). Avoid creating
+new `Arc` / `String` / `Vec` per lookup and hidden clones of `K`
+on the eviction path. If `malloc` shows up in your flamegraph,
+your cache is already slow.
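+
+The pre-allocate-and-reuse discipline can be sketched as a
+fixed-capacity slab that recycles freed slots through a free list.
+A minimal sketch, assuming the hypothetical names `FixedSlab` and
+`SlotId` (not cachekit's `SlotArena` / `Handle` API): after
+construction, `insert` and `remove` never call the allocator, and a
+full slab refuses rather than grows. It has no generation check, so
+a stale `SlotId` can observe a reused slot.
+
+```rust
+/// Hypothetical index handle into `FixedSlab`.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub struct SlotId(usize);
+
+/// Hypothetical fixed-capacity slab: every slot is reserved up front.
+pub struct FixedSlab<T> {
+    slots: Vec<Option<T>>,
+    free: Vec<usize>, // indices of empty slots, reused LIFO
+}
+
+impl<T> FixedSlab<T> {
+    pub fn with_capacity(capacity: usize) -> Self {
+        let mut slots = Vec::with_capacity(capacity);
+        slots.resize_with(capacity, || None);
+        // Reversed so the first insert takes slot 0.
+        let free = (0..capacity).rev().collect();
+        Self { slots, free }
+    }
+
+    /// Returns `None` when full instead of growing.
+    pub fn insert(&mut self, value: T) -> Option<SlotId> {
+        let idx = self.free.pop()?;
+        self.slots[idx] = Some(value);
+        Some(SlotId(idx))
+    }
+
+    pub fn get(&self, id: SlotId) -> Option<&T> {
+        self.slots.get(id.0)?.as_ref()
+    }
+
+    /// Frees the slot. `free` never holds more than `capacity` indices
+    /// and was allocated for that many, so this push does not reallocate.
+    pub fn remove(&mut self, id: SlotId) -> Option<T> {
+        let value = self.slots.get_mut(id.0)?.take()?;
+        self.free.push(id.0);
+        Some(value)
+    }
+
+    pub fn len(&self) -> usize {
+        self.slots.len() - self.free.len()
+    }
+}
+```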
+
+*Realised by:* D7 (no per-operation allocation).
+
+### 5. Eviction Must Be Predictable and Cheap
+
+Eviction is the critical slow path; O(1) is the goal for hot-path
+policies. Maintain direct indices / `Handle`s to eviction candidates
+([`src/store/handle.rs`](../../src/store/handle.rs); the
+[`EvictingCache`](../../src/traits.rs) capability trait carries
+`evict_one`), eviction lists or clock hands (intrusive list head,
+`ClockRing` hand), and lazy heaps where amortised O(log n) is
+acceptable ([`LazyMinHeap`](../../src/ds/lazy_heap.rs); used by
+Heap-LFU and TTL). `NruCache` deliberately trades that guarantee for
+simplicity and has O(n) worst-case eviction. Be careful with
+background eviction threads (synchronisation overhead) and with lazy
+cleanup that grows without bound; cap the latter with rebuild
+thresholds (e.g. `LazyMinHeap::with_auto_rebuild`). For
+latency-sensitive caches, eviction cost must be comparable to lookup
+cost, not orders of magnitude higher.
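+
+A lazy heap keeps victim selection amortised O(log n) by leaving
+stale entries in place, skipping them on pop, and rebuilding once
+garbage dominates. A minimal sketch, assuming a hypothetical
+`LazyHeap` over `std::collections::BinaryHeap`; the rebuild rule
+(heap length above twice the live count plus a small slack) is an
+illustrative choice, not the threshold `LazyMinHeap::with_auto_rebuild`
+uses.
+
+```rust
+use std::cmp::Reverse;
+use std::collections::{BinaryHeap, HashMap};
+
+/// Hypothetical lazy min-heap over `(score, id)` pairs.
+pub struct LazyHeap {
+    heap: BinaryHeap<Reverse<(u64, u64)>>, // may contain stale pairs
+    live: HashMap<u64, u64>,               // id -> current score
+}
+
+impl LazyHeap {
+    pub fn new() -> Self {
+        Self { heap: BinaryHeap::new(), live: HashMap::new() }
+    }
+
+    /// Insert or re-score in O(log n); any older pair for `id` goes stale.
+    pub fn update(&mut self, id: u64, score: u64) {
+        self.live.insert(id, score);
+        self.heap.push(Reverse((score, id)));
+        self.maybe_rebuild();
+    }
+
+    /// Expected O(1) removal (plus an occasional rebuild): the heap
+    /// pair is left behind as garbage.
+    pub fn remove(&mut self, id: u64) {
+        self.live.remove(&id);
+        self.maybe_rebuild();
+    }
+
+    /// Pop the live entry with the lowest score, discarding stale pairs.
+    pub fn pop_min(&mut self) -> Option<(u64, u64)> {
+        while let Some(Reverse((score, id))) = self.heap.pop() {
+            if self.live.get(&id) == Some(&score) {
+                self.live.remove(&id);
+                return Some((id, score));
+            }
+        }
+        None
+    }
+
+    /// Bound the garbage: rebuild from live entries once stale pairs dominate.
+    fn maybe_rebuild(&mut self) {
+        if self.heap.len() > 2 * self.live.len() + 16 {
+            self.heap = self
+                .live
+                .iter()
+                .map(|(&id, &score)| Reverse((score, id)))
+                .collect();
+        }
+    }
+
+    pub fn heap_len(&self) -> usize {
+        self.heap.len()
+    }
+}
+```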
+
+*Realised by:* D8 (O(1) eviction via direct handles).
+
+### 6. Metrics Are Not Optional
+
+You cannot tune what you do not measure. Track at least hit /
+miss rate, eviction count and reason (capacity vs expiration), and
+insert / update rate. Cachekit exposes these in two layers: an
+unconditional store-layer baseline
+([`StoreMetrics`](../../src/store/traits.rs), seven counters that
+ship in every build) and a feature-gated policy-layer hierarchy
+(per-policy `*Metrics` recorders, `*Snapshot` types, Prometheus
+exporter) behind the `metrics` Cargo feature. That keeps lightweight
+counters on the hot path and detailed metrics behind a feature flag;
+see [Metrics design](metrics.md) for the recorder / snapshot /
+exporter split. Metrics should guide design decisions, not justify
+them afterward.
+
+*Realised by:* D11 (two-layer metrics).
+
+### 7. Separate Policy From Storage
+
+Design in layers:
+
+- **Storage layer**: how entries live in memory (allocation, layout,
+  indexing), in [`src/store/`](../../src/store); design rationale in
+  [storage.md](storage.md).
+- **Policy layer**: LRU, FIFO, LFU, LRU-K, 2Q, ARC, CAR, Clock,
+  Clock-PRO, S3-FIFO, …; manipulates metadata and ordering only
+  ([`src/policy/`](../../src/policy)).
+- **Capability layer**: opt-in extension traits (`RecencyTracking`,
+  `FrequencyTracking`, `HistoryTracking`, `ExpiringCache`),
+  implemented when the signal exists. This is how `Expiring<C>`
+  composes over any policy without touching policy code.
+- **Integration layer**: ties application objects to cache entries
+  via [`CacheBuilder`](../../src/builder.rs) and the `DynCache`
+  runtime dispatcher.
+
+*Realised by:* D1 (policy / storage separation), D2 (object-safe
+kernel + capability traits), D9 (`DynCache` enum dispatch).
+
+### 8. Beware of "Nice" Rust APIs in Hot Paths
+
+Ergonomics often cost performance. Avoid in critical loops: heavy
+generics causing code bloat across many monomorphisations, trait
+objects for hot dispatch, closures capturing state, and iterator
+chains where a plain `for` loop would do. Prefer explicit loops,
+concrete types and monomorphised fast paths, and enum dispatch
+over `Box<dyn Trait>` when polymorphism is needed at the edges —
+this is exactly the trade `DynCache` makes (D9). You can wrap
+fast internals in nice APIs at the edges.
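+
+The enum-dispatch trade can be seen in a few lines. Each `match` arm
+calls a concrete type, so the compiler can inline it, whereas a call
+through `Box<dyn Trait>` goes through a vtable and usually cannot be
+inlined. A minimal sketch; `SimpleCache`, `Fifo`, `Reject`, and
+`AnyCache` are hypothetical stand-ins, not cachekit's `Cache<K, V>`
+or `DynCache`.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+
+/// Hypothetical minimal cache interface (not cachekit's `Cache<K, V>`).
+pub trait SimpleCache {
+    fn insert(&mut self, key: u32, value: u32);
+    fn get(&self, key: &u32) -> Option<&u32>;
+}
+
+/// Evicts the oldest insertion when a new key arrives at capacity.
+pub struct Fifo {
+    map: HashMap<u32, u32>,
+    order: VecDeque<u32>,
+    cap: usize,
+}
+
+/// Refuses new keys at capacity; updates still succeed.
+pub struct Reject {
+    map: HashMap<u32, u32>,
+    cap: usize,
+}
+
+impl SimpleCache for Fifo {
+    fn insert(&mut self, key: u32, value: u32) {
+        if !self.map.contains_key(&key) {
+            if self.map.len() == self.cap {
+                if let Some(oldest) = self.order.pop_front() {
+                    self.map.remove(&oldest);
+                }
+            }
+            self.order.push_back(key);
+        }
+        self.map.insert(key, value);
+    }
+    fn get(&self, key: &u32) -> Option<&u32> {
+        self.map.get(key)
+    }
+}
+
+impl SimpleCache for Reject {
+    fn insert(&mut self, key: u32, value: u32) {
+        if self.map.len() < self.cap || self.map.contains_key(&key) {
+            self.map.insert(key, value);
+        }
+    }
+    fn get(&self, key: &u32) -> Option<&u32> {
+        self.map.get(key)
+    }
+}
+
+/// Closed set of variants: each arm is a direct, inlinable call.
+pub enum AnyCache {
+    Fifo(Fifo),
+    Reject(Reject),
+}
+
+impl AnyCache {
+    pub fn fifo(cap: usize) -> Self {
+        AnyCache::Fifo(Fifo { map: HashMap::new(), order: VecDeque::new(), cap })
+    }
+    pub fn reject(cap: usize) -> Self {
+        AnyCache::Reject(Reject { map: HashMap::new(), cap })
+    }
+}
+
+impl SimpleCache for AnyCache {
+    fn insert(&mut self, key: u32, value: u32) {
+        match self {
+            AnyCache::Fifo(c) => c.insert(key, value),
+            AnyCache::Reject(c) => c.insert(key, value),
+        }
+    }
+    fn get(&self, key: &u32) -> Option<&u32> {
+        match self {
+            AnyCache::Fifo(c) => c.get(key),
+            AnyCache::Reject(c) => c.get(key),
+        }
+    }
+}
+```
+
+The cost of this shape is a closed set of variants: adding one means
+editing every `match`.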
+
+*Realised by:* D9 (enum dispatch over `Box<dyn Cache>`).
+
+### 9. Scans Are the Enemy of Caches
+
+Large sequential reads destroy LRU-style caches. Solutions:
+scan-resistant policies (`LRU-K`, `2Q`, `SLRU`, `ARC`, `CAR`,
+`Clock-PRO`, `S3-FIFO`, `Heap-LFU` — all implemented today),
+explicit "scan mode" hints from the caller or workload layer, and
+bypassing the cache for known one-shot reads. If you ignore scans, your
+cache will look great in microbenchmarks and terrible in
+production.
+
+*Realised by:* the scan-resistant policy catalog itself;
+benchmarked under the workloads in
+[benchmarks/workloads.md](../benchmarks/workloads.md).
+
+### 10. Benchmark Like a System, Not a Library
+
+Do not rely on uniform-random key benchmarks. Use Zipfian
+distributions, mixed read / write workloads, scan + point lookup
+mixtures, and time-varying hot sets. Measure throughput, tail
+latency, memory overhead, and eviction cost. Cachekit's benchmark
+harness covers these dimensions — see
+[benchmarking.md](benchmarking.md) for the design (benchmark
+layers, monomorphic policy registry, JSON artifact schema),
+[`docs/benchmarks/workloads.md`](../benchmarks/workloads.md) for
+the workload catalog, and the runners under
+[`benches/`](../../benches). A cache that is 5 % faster on
+uniform-random keys but 50 % worse under scans is a bad cache.
+
+*Realised by:* the benchmark harness; design captured in
+[benchmarking.md](benchmarking.md).
+
+### 11. Rust Hot-Path Hazards Beyond Allocation
+
+`Arc` is expensive in hot paths; minimise it and lift `Arc::clone`
+out of inner loops. Fight borrow-checker-induced indirection with
+index-based access (`Handle`s, slot indices) instead of `&mut`
+chains; use interior mutability only where unavoidable, preferring
+`Cell<T>` over `RefCell<T>` when `T: Copy` and atomics when the
+value lives behind a shared reference. Beware hidden clones
+(particularly of keys on the eviction path), trait-object dispatch
+on read / insert (a specific instance of §8), and over-generic
+designs whose monomorphisation cost dwarfs their benefit. Rust can
+match C on hot paths, but only when systems-level discipline
 survives contact with the type system.
 
-## 12. Design for Failure Modes
+*Realised by:* D6 (slab / arena / intrusive), D9 (enum dispatch).
 
-Ask:
-- What happens under memory pressure?
-- What happens when eviction cannot keep up?
-- What happens under pathological access patterns?
+### 12. Design for Failure Modes
 
-Add:
-- Backpressure or rejection when full.
-- Bypass modes.
-- Emergency eviction strategies.
+Ask: what happens under memory pressure, when eviction cannot keep
+up, and under pathological access patterns? Add backpressure or
+rejection when full (`StoreFull`), bypass modes, and emergency
+eviction strategies. Cross-thread backpressure semantics across a
+layered cache stack remain a roadmap topic; today the answer is
+`StoreFull` propagation up to the caller. A cache that collapses
+under stress is worse than no cache.
 
-A cache that collapses under stress is worse than no cache.
+*Realised by:* D12 (three-tier error model); see also
+[error-model.md](error-model.md).
 
-## 13. Compile-Time and Runtime Composition
+### 13. Compile-Time and Runtime Composition
 
-cachekit's externally visible surface is shaped by two composition
+Cachekit's externally visible surface is shaped by two composition
 mechanisms that together let users pay only for what they use.
-
-**Per-policy feature flags.** Every policy is behind a Cargo feature
-(`policy-lru`, `policy-s3-fifo`, …), with `policy-all` for "everything"
-and a small default of `policy-s3-fifo`, `policy-lru`, `policy-fast-lru`,
-`policy-lru-k`, `policy-clock`. Optional capabilities are gated the
-same way: `metrics`, `concurrency`, `serde`, and `ttl`. Downstream
-crates can disable defaults and select the minimum surface they need;
-see [`Cargo.toml`](../../Cargo.toml).
-
-**Capability traits + runtime dispatch.** Extension traits
-([`RecencyTracking`](../../src/traits.rs), `FrequencyTracking`,
-`HistoryTracking`, `ExpiringCache`) keep optional behavior off the
-core `Cache<K, V>` trait. For ergonomic builder construction without
-forcing trait objects on the user, [`CacheBuilder`](../../src/builder.rs)
-returns a [`DynCache<K, V>`](../../src/builder.rs) that dispatches via
-an internal enum match rather than `Box<dyn Cache>`. When TTL is
-enabled, the builder returns a sibling `DynExpiringCache<K, V>` that
-threads the expiry check around each variant's `Cache` call — a worked
-example of capability composition. See [`docs/design/ttl.md`](ttl.md)
-for the full design and [`src/policy/expiring.rs`](../../src/policy/expiring.rs)
-for the decorator itself.
-
-## Bottom Line
-
-High-performance caches are not about clever algorithms — they are about:
-- Memory layout.
-- Allocation discipline.
-- Contention control.
-- Eviction predictability.
-- Workload realism.
-
-In Rust, your main enemy is not safety — it is abstraction overhead and
-accidental allocation. Design from the metal upward, then wrap it in
-something pleasant to use.
+
+**Per-policy feature flags**: every policy is behind a Cargo
+feature (`policy-lru`, `policy-s3-fifo`, …), with `policy-all` for
+"everything" and a small default of `policy-s3-fifo`, `policy-lru`,
+`policy-fast-lru`, `policy-lru-k`, `policy-clock`; optional
+capabilities are gated the same way (`metrics`, `concurrency`,
+`serde`, `ttl`). One sharp edge worth naming inline: the default
+feature set includes `policy-fast-lru`, which is `!Send + !Sync`,
+so default-feature `DynCache` is also `!Send + !Sync`.
+
+**Capability traits + runtime dispatch**: extension traits keep
+optional behaviour off the core `Cache<K, V>` trait; the builder
+returns a [`DynCache<K, V>`](../../src/builder.rs) that dispatches
+via an internal enum match rather than `Box<dyn Cache>`. When the
+builder is configured with `.with_default_ttl(...)`, it returns a
+sibling `DynExpiringCache<K, V>` that threads the expiry check
+around each variant's `Cache` call, a worked example of capability
+composition.
+
+*Realised by:* D2 (capability traits), D9 (enum dispatch), D10
+(per-policy feature flags), D13 (TTL as decorator).
 
 ## See Also
 
@@ -329,34 +843,39 @@ Design docs:
 - [Concurrency](concurrency.md) — `Concurrent*` wrappers, `RwLock`
   discipline, sharded primitives, `ConcurrentCache` marker
 - [Cache trait hierarchy](trait-hierarchy.md) — `Cache<K, V>` kernel,
-  capability traits, read/mutate split, object safety
+  capability traits, read / mutate split, object safety
 - [Builder and runtime dispatch](builder-and-dyn-dispatch.md) —
-  `CachePolicy`, `DynCache`, enum-vs-`Box<dyn>` trade-off, adding new
-  policies
+  `CachePolicy`, `DynCache`, enum-vs-`Box<dyn>` trade-off, adding
+  new policies
 - [Weighted eviction](weighted-eviction.md) — `WeightStore` dual
-  limits, weight function contract, GDS/GDSF pre-staging
+  limits, weight-function contract, GDS / GDSF pre-staging
 - [Metrics](metrics.md) — recorder / snapshot / exporter split,
   `MetricsCell`, Prometheus exporter, feature gating
 - [Error model](error-model.md) — panic vs `Result` discipline,
   four error types, debug-only invariant checks
-- [Benchmarking](benchmarking.md) — benchmark layers, monomorphic policy
-  registry, JSON artifact schema, reproducibility rules
-- [Hashing and key identity](hashing.md) — hasher choices, `KeyInterner`,
-  `ShardSelector`, HashDoS trade-offs
+- [Benchmarking](benchmarking.md) — benchmark layers, monomorphic
+  policy registry, JSON artifact schema, reproducibility rules
+- [Hashing and key identity](hashing.md) — hasher choices,
+  `KeyInterner`, `ShardSelector`, HashDoS trade-offs
 - [Sharding](sharding.md) — current sharded primitives, routing,
   capacity semantics, roadmap for sharded caches
-- [Serialization](serialization.md) — current `serde` surface, cache-state
-  persistence boundaries, TTL and hash-seed rules
-- [Non-goals](non-goals.md) — explicit boundaries for what cachekit does
-  not try to be
-- [TTL](ttl.md) — applied example of every principle above
+- [Storage layer](storage.md) — store trait family
+  (`StoreCore` / `StoreMut` and their concurrent peers), concrete
+  stores, `StoreMetrics` baseline, `WeightStore`'s divergence
+- [Serialization](serialization.md) — current `serde` surface,
+  cache-state persistence boundaries, TTL and hash-seed rules
+- [Non-goals](non-goals.md) — explicit boundaries for what cachekit
+  does not try to be
+- [TTL](ttl.md) — applied example of every layer above
 - [Doc style guide](style-guide.md)
 
 Reference docs:
-- [Policy overview](../policies/README.md) and [roadmap](../policies/roadmap/README.md)
+- [Policy overview](../policies/README.md) and
+  [roadmap](../policies/roadmap/README.md)
 - [Policy data structures](../policy-ds/README.md)
 - [Stores](../stores/README.md)
 - [Read-only traits](../guides/read-only-traits.md)
 - [Choosing a policy](../guides/choosing-a-policy.md)
-- [Benchmarks overview](../benchmarks/overview.md) and [workloads](../benchmarks/workloads.md)
+- [Benchmarks overview](../benchmarks/overview.md) and
+  [workloads](../benchmarks/workloads.md)
 - [Rust API Guidelines checklist](https://rust-lang.github.io/api-guidelines/checklist.html)
diff --git a/docs/design/metrics.md b/docs/design/metrics.md
index 88ace2b..ca7aa1f 100644
--- a/docs/design/metrics.md
+++ b/docs/design/metrics.md
@@ -378,6 +378,7 @@ is a **separate**, simpler structure that ships unconditionally
 store-layer implementation tracks:
 
 ```rust,ignore
+#[non_exhaustive]
 pub struct StoreMetrics {
     pub hits: u64,
     pub misses: u64,
@@ -385,13 +386,15 @@ pub struct StoreMetrics {
     pub updates: u64,
     pub removes: u64,
     pub evictions: u64,
+    pub expirations: u64,
 }
 ```
 
 The two systems coexist:
 
 - `StoreMetrics` is the store-layer baseline. Always present, always
-  cheap, six counters.
+  cheap, seven counters. `expirations` stays at `0` on stores that
+  do not own a TTL surface.
 - `src/metrics/` (feature-gated) is the policy-layer detailed
   metrics — recorder traits, snapshots, exporter, per-policy signals.
 
diff --git a/docs/design/non-goals.md b/docs/design/non-goals.md
index 369e28c..f57b2ed 100644
--- a/docs/design/non-goals.md
+++ b/docs/design/non-goals.md
@@ -106,6 +106,17 @@ text exporter. It does not provide:
 Use your monitoring stack for those. cachekit exposes enough counters to make
 policy tuning possible without making the cache own observability.
 
+`MetricsCell` (the crate-private interior-mutability wrapper used by
+`&self` recorder paths) is **not** a substitute for `AtomicU64`. It is
+sound only when increments happen under exclusive external
+synchronization — single-threaded, `&mut self`, or behind a write
+lock / `Mutex`. A shared `RwLock::read` guard does not serialize
+readers and so is not sufficient protection. Counters reachable from a
+read-locked `&self` path must use `AtomicU64` (or escalate to a write
+lock before recording). The contract is restated on the `unsafe impl
+Sync` block in [`src/metrics/cell.rs`](../../src/metrics/cell.rs) and
+in the [Metrics design](metrics.md#metricscell-interior-mutability-under-external-lock).
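+
+The safe pattern this paragraph prescribes can be sketched as a store
+whose `get` runs under a shared read guard and records hits and
+misses in `AtomicU64` counters. A minimal sketch: `ReadCountingStore`
+is hypothetical, it uses `std::sync::RwLock` rather than
+`parking_lot`, and `Ordering::Relaxed` is enough because each counter
+is independent and only its total is read.
+
+```rust
+use std::collections::HashMap;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::RwLock;
+
+/// Hypothetical store: lookups share the read lock; counters are atomic.
+pub struct ReadCountingStore {
+    map: RwLock<HashMap<u64, u64>>,
+    hits: AtomicU64,
+    misses: AtomicU64,
+}
+
+impl ReadCountingStore {
+    pub fn new() -> Self {
+        Self {
+            map: RwLock::new(HashMap::new()),
+            hits: AtomicU64::new(0),
+            misses: AtomicU64::new(0),
+        }
+    }
+
+    pub fn insert(&self, key: u64, value: u64) {
+        self.map.write().unwrap().insert(key, value);
+    }
+
+    /// Many threads can hold this read guard at once, so an
+    /// unsynchronized counter (such as a `Cell` behind `unsafe impl Sync`)
+    /// bumped here would be a data race. `fetch_add` is one atomic RMW.
+    pub fn get(&self, key: u64) -> Option<u64> {
+        let guard = self.map.read().unwrap();
+        let found = guard.get(&key).copied();
+        let counter = if found.is_some() { &self.hits } else { &self.misses };
+        counter.fetch_add(1, Ordering::Relaxed);
+        found
+    }
+
+    pub fn hits(&self) -> u64 {
+        self.hits.load(Ordering::Relaxed)
+    }
+
+    pub fn misses(&self) -> u64 {
+        self.misses.load(Ordering::Relaxed)
+    }
+}
+```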
+
 ## Not a Policy Research Playground at the Cost of Hot Paths
 
 New policies are welcome, but they must fit the crate's constraints:
diff --git a/docs/design/storage.md b/docs/design/storage.md
new file mode 100644
index 0000000..b063dd3
--- /dev/null
+++ b/docs/design/storage.md
@@ -0,0 +1,447 @@
+# Storage Layer
+
+> Status: design rationale for the store trait family in
+> [`src/store/traits.rs`](../../src/store/traits.rs) and the concrete
+> stores under [`src/store/`](../../src/store). Companion to
+> [`design.md`](design.md) §7 (policy/storage separation),
+> [`trait-hierarchy.md`](trait-hierarchy.md) (the parallel cache-trait
+> family), [`concurrency.md`](concurrency.md), and
+> [`weighted-eviction.md`](weighted-eviction.md).
+
+cachekit splits caches into two layers: a *policy* that decides what
+to evict and a *store* that owns the keys and values. This doc
+covers the store side. It explains why the store traits look the way
+they do, what each shipped concrete store is for, how the
+sequential/concurrent split mirrors the cache-trait family, and why
+[`WeightStore`](../../src/store/weight.rs) deliberately sits outside
+the rest of the family.
+
+## Goals
+
+The store layer is shaped around four things:
+
+1. **Decouple ownership from policy.** A policy that doesn't know how
+   entries are laid out in memory can be swapped without rewriting
+   storage code, and a store can be swapped without rewriting policy
+   code. This is the policy/storage separation rule in
+   [`design.md`](design.md) §7.
+2. **Make capacity refusal explicit, not implicit.** When a store is
+   full, it returns [`StoreFull`](../../src/store/traits.rs) rather
+   than silently evicting. The caller (policy or user) decides what to
+   evict. This is the core of the layering — without it, every store
+   would have to ship an eviction strategy.
+3. **Mirror the sequential / concurrent split that already exists
+   for caches.** The sequential traits return owned `V` and borrowed
+   `&V`; the concurrent traits return `Arc<V>`. The reasoning is the
+   same as for the cache trait family in
+   [`trait-hierarchy.md`](trait-hierarchy.md) — references cannot
+   safely outlive lock guards.
+4. **Keep the always-on observability minimal but useful.**
+   [`StoreMetrics`](../../src/store/traits.rs) ships unconditionally
+   with seven counters: `hits`, `misses`, `inserts`, `updates`,
+   `removes`, `evictions`, and `expirations`. None of them is
+   feature-gated; only the richer per-policy metrics hierarchy sits
+   behind the `metrics` feature. `expirations` stays at `0` on
+   stores that do not own a TTL surface — the TTL count for an
+   `Expiring<…>` decorator is exposed separately via
+   [`Expiring::expirations()`](../../src/policy/expiring.rs).
+
+## Map of the hierarchy
+
+```text
+Single-threaded (direct ownership)        Concurrent (shared ownership)
+────────────────────────────────          ───────────────────────────────
+
+  ┌────────────────────┐                    ┌──────────────────────────┐
+  │     StoreCore      │                    │   ConcurrentStoreRead    │
+  │  get(&K) -> &V     │                    │  get(&K) -> Arc<V>       │
+  │  contains, len,    │                    │  contains, len,          │
+  │  capacity, metrics │                    │  capacity, metrics       │
+  └─────────┬──────────┘                    └────────────┬─────────────┘
+            │ extends                                    │ extends
+            ▼                                            ▼
+  ┌────────────────────┐                    ┌──────────────────────────┐
+  │      StoreMut      │                    │     ConcurrentStore      │
+  │  try_insert,       │                    │  try_insert (Arc<V>),    │
+  │  remove, clear     │                    │  remove, clear           │
+  └────────────────────┘                    └──────────────────────────┘
+
+  ┌────────────────────┐                    ┌──────────────────────────┐
+  │   StoreFactory     │                    │ ConcurrentStoreFactory   │
+  │ type Store: ...    │                    │ type Store: ...          │
+  │ new(capacity)      │                    │ new(capacity)            │
+  └────────────────────┘                    └──────────────────────────┘
+
+  StoreMetrics (unconditional, 7 counters)
+  StoreFull   (zero-sized error type)
+```
+
+Every concrete store implements exactly one column. The
+single-threaded stores use direct ownership; the concurrent stores
+use `Arc<V>` because borrowed references cannot outlive
+`RwLockReadGuard`. See
+[`concurrency.md`](concurrency.md#why-concurrent-does-not-implement-cachek-v)
+for the equivalent argument at the cache-trait level.
+
+## Layer 1 — `StoreCore` / `StoreMut`
+
+The sequential surface. Three design choices worth naming:
+
+### `&V` return position
+
+[`StoreCore::get`](../../src/store/traits.rs) returns `Option<&V>`.
+The borrow is tied to `&self`, so callers can read without cloning.
+This is the right shape for a sequential trait because the alternative
+(`V` by value) forces either a `V: Clone` bound everywhere or an
+`Arc<V>` handed out on every call.
+
+The concurrent counterpart cannot do this — see
+[Layer 2](#layer-2--concurrentstoreread--concurrentstore) below.
+
+### `try_insert` returns `Result<Option<V>, StoreFull>`
+
+Three distinct outcomes hide in this one signature:
+
+| Outcome | Return value | Meaning |
+|---------|--------------|---------|
+| New key fits | `Ok(None)` | Inserted; no previous value |
+| Existing key updated | `Ok(Some(old))` | Replaced; old value handed back |
+| Store full and key is new | `Err(StoreFull)` | Caller must evict and retry |
+
+Updates to an existing key **always succeed** — they cannot push the
+store past capacity by entry count, and the previous value is handed
+back as `Ok(Some(old))`. Capacity refusal is a `StoreFull` *only*
+when the key is new and inserting it would exceed the entry count.
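+
+Here is a minimal sketch of that contract. It assumes a hypothetical
+`ToyStore` backed by `std::collections::HashMap`, which is not
+cachekit's `HashMapStore`. It shows how the three outcomes fall out
+of a single capacity check that runs only for new keys, and how
+`get` hands out a `&V` tied to `&self`.
+
+```rust
+use std::collections::HashMap;
+use std::hash::Hash;
+
+/// Zero-sized capacity error, shaped like `StoreFull`.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct StoreFull;
+
+/// Hypothetical store illustrating the `try_insert` contract.
+pub struct ToyStore<K, V> {
+    map: HashMap<K, V>,
+    capacity: usize,
+}
+
+impl<K: Eq + Hash, V> ToyStore<K, V> {
+    pub fn new(capacity: usize) -> Self {
+        Self { map: HashMap::with_capacity(capacity), capacity }
+    }
+
+    /// The borrow is tied to `&self`: no clone, no `Arc`.
+    pub fn get(&self, key: &K) -> Option<&V> {
+        self.map.get(key)
+    }
+
+    pub fn try_insert(&mut self, key: K, value: V) -> Result<Option<V>, StoreFull> {
+        // Updating an existing key never grows the entry count, so it always succeeds.
+        if let Some(slot) = self.map.get_mut(&key) {
+            return Ok(Some(std::mem::replace(slot, value)));
+        }
+        // Only a new key can be refused, and only at capacity.
+        if self.map.len() >= self.capacity {
+            return Err(StoreFull);
+        }
+        self.map.insert(key, value);
+        Ok(None)
+    }
+}
+```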
+
+[`WeightStore`](../../src/store/weight.rs) extends this to a
+dual-limit model where updates *can* fail (a larger value replacing a
+smaller one can exceed the weight budget). See
+[`weighted-eviction.md`](weighted-eviction.md#dual-limit-model) for
+the full table.
+
+### No automatic eviction
+
+The store never evicts on its own. Returning `StoreFull` is the
+signal to the caller that *they* must decide who to remove. This is
+the layering rule from [`design.md`](design.md) §7 made concrete:
+
+- A policy layered over `StoreMut` evicts its chosen victim, then
+  retries `try_insert`.
+- Code that uses a `StoreMut` directly evicts a key it picks (random,
+  oldest-by-some-criterion, etc.), then retries.
+
+A store that evicted on its own would lock a single eviction
+strategy into every consumer, and would prevent layering a *better*
+eviction policy on top.
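+
+The evict-then-retry loop can be sketched as follows. `BoundedMap`
+and `insert_evicting_oldest` are hypothetical names, and oldest-first
+is just one caller-side strategy: the store only refuses, and the
+caller decides who goes. Values are `u64`, so the retry can pass the
+same value again without cloning.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct StoreFull;
+
+/// Hypothetical bounded store: refuses new keys at capacity, never evicts.
+pub struct BoundedMap {
+    map: HashMap<u64, u64>,
+    capacity: usize,
+}
+
+impl BoundedMap {
+    pub fn new(capacity: usize) -> Self {
+        Self { map: HashMap::new(), capacity }
+    }
+
+    pub fn contains(&self, key: &u64) -> bool {
+        self.map.contains_key(key)
+    }
+
+    pub fn try_insert(&mut self, key: u64, value: u64) -> Result<Option<u64>, StoreFull> {
+        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
+            return Err(StoreFull);
+        }
+        Ok(self.map.insert(key, value))
+    }
+
+    pub fn remove(&mut self, key: &u64) -> Option<u64> {
+        self.map.remove(key)
+    }
+}
+
+/// Caller-owned eviction: the caller tracks insertion order and picks the
+/// victim. Returns the evicted key, if any.
+pub fn insert_evicting_oldest(
+    store: &mut BoundedMap,
+    order: &mut VecDeque<u64>,
+    key: u64,
+    value: u64,
+) -> Result<Option<u64>, StoreFull> {
+    let mut evicted = None;
+    loop {
+        match store.try_insert(key, value) {
+            // New key: remember it as a future eviction candidate.
+            Ok(None) => {
+                order.push_back(key);
+                return Ok(evicted);
+            }
+            // Update of a resident key: nothing to evict.
+            Ok(Some(_old)) => return Ok(evicted),
+            // Full: evict the oldest tracked key, then retry. If nothing
+            // is left to evict (e.g. capacity 0), `StoreFull` propagates.
+            Err(StoreFull) => {
+                let victim = order.pop_front().ok_or(StoreFull)?;
+                store.remove(&victim);
+                evicted = Some(victim);
+            }
+        }
+    }
+}
+```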
+
+## Layer 2 — `ConcurrentStoreRead` / `ConcurrentStore`
+
+The concurrent surface mirrors the sequential one with two
+substitutions:
+
+- Returns are `Arc<V>` rather than `&V` / `V`. References cannot
+  outlive the lock guard they were extracted from; `Arc<V>` carries
+  ownership safely past lock release.
+- All methods take `&self`. Internal synchronization (almost always
+  `parking_lot::RwLock`) is the store's responsibility, not the
+  caller's, so callers never need `&mut self` to mutate.
+
+Implementors must be `Send + Sync`. The trait bound is on the trait
+declaration itself
+(`pub trait ConcurrentStoreRead<K, V>: Send + Sync`), so any code
+generic over `ConcurrentStoreRead` automatically requires thread
+safety from the implementor.
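+
+Here is a sketch of that shape. It assumes a pared-down read trait
+named `ConcurrentRead`, a hypothetical `LockedStore`, and
+`std::sync::RwLock` in place of `parking_lot::RwLock` so the example
+has no dependencies (std's lock can poison, hence the `unwrap`s). The
+parts that matter are the `Arc<V>` clone taken while the guard is
+alive and the `Send + Sync` supertrait.
+
+```rust
+use std::collections::HashMap;
+use std::hash::Hash;
+use std::sync::{Arc, RwLock};
+
+/// Pared-down concurrent read trait. `Send + Sync` is a supertrait, so
+/// any code generic over it may share the store across threads.
+pub trait ConcurrentRead<K, V>: Send + Sync {
+    fn get(&self, key: &K) -> Option<Arc<V>>;
+}
+
+/// Hypothetical store; `std::sync::RwLock` stands in for `parking_lot::RwLock`.
+pub struct LockedStore<K, V> {
+    inner: RwLock<HashMap<K, Arc<V>>>,
+}
+
+impl<K: Eq + Hash, V> LockedStore<K, V> {
+    pub fn new() -> Self {
+        Self { inner: RwLock::new(HashMap::new()) }
+    }
+
+    /// Mutation through `&self`: the store owns its synchronisation.
+    pub fn insert(&self, key: K, value: Arc<V>) -> Option<Arc<V>> {
+        self.inner.write().unwrap().insert(key, value)
+    }
+}
+
+impl<K, V> ConcurrentRead<K, V> for LockedStore<K, V>
+where
+    K: Eq + Hash + Send + Sync,
+    V: Send + Sync,
+{
+    fn get(&self, key: &K) -> Option<Arc<V>> {
+        // Clone the `Arc` while the read guard is alive; the clone outlives it.
+        self.inner.read().unwrap().get(key).cloned()
+    }
+}
+```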
+
+This is the same shape `Concurrent*` cache wrappers use — see
+[`concurrency.md`](concurrency.md#the-dominant-pattern-sequential-core-concurrent-wrapper)
+for the parallel reasoning.
+
+## Layer 3 — Factories
+
+```rust,ignore
+pub trait StoreFactory<K, V> {
+    type Store: StoreMut<K, V>;
+    fn new(capacity: usize) -> Self::Store;
+}
+
+pub trait ConcurrentStoreFactory<K, V> {
+    type Store: ConcurrentStore<K, V>;
+    fn new(capacity: usize) -> Self::Store;
+}
+```
+
+Factory traits exist so generic code can construct a store without
+naming the concrete type. They mirror `CacheFactory` in
+[`trait-hierarchy.md`](trait-hierarchy.md#cachefactory-and-cacheconfig).
+In practice most code constructs stores directly; the factories are
+used by test harnesses and benchmark runners that want to
+parameterise across store implementations.
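+
+The sketch below shows how a harness might lean on a factory. The
+traits are cut down to two methods, and `MapStore` /
+`MapStoreFactory` are hypothetical types. The generic function never
+names `MapStore`; it only sees `F::Store`.
+
+```rust
+use std::collections::HashMap;
+
+/// Cut-down stand-ins for the store and factory traits.
+pub trait StoreMut<K, V> {
+    fn len(&self) -> usize;
+    fn capacity(&self) -> usize;
+}
+
+pub trait StoreFactory<K, V> {
+    type Store: StoreMut<K, V>;
+    fn new(capacity: usize) -> Self::Store;
+}
+
+/// Hypothetical concrete store.
+pub struct MapStore<K, V> {
+    map: HashMap<K, V>,
+    capacity: usize,
+}
+
+impl<K, V> StoreMut<K, V> for MapStore<K, V> {
+    fn len(&self) -> usize {
+        self.map.len()
+    }
+    fn capacity(&self) -> usize {
+        self.capacity
+    }
+}
+
+/// Zero-sized marker that knows how to build a `MapStore`.
+pub struct MapStoreFactory;
+
+impl<K, V> StoreFactory<K, V> for MapStoreFactory {
+    type Store = MapStore<K, V>;
+    fn new(capacity: usize) -> Self::Store {
+        MapStore { map: HashMap::with_capacity(capacity), capacity }
+    }
+}
+
+/// Generic harness code: builds a store without naming its concrete type.
+pub fn starts_empty<F: StoreFactory<u64, u64>>(capacity: usize) -> bool {
+    let store = F::new(capacity);
+    store.len() == 0 && store.capacity() == capacity
+}
+```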
+
+## `StoreMetrics`: the always-on baseline
+
+```rust,ignore
+#[non_exhaustive]
+pub struct StoreMetrics {
+    pub hits: u64,
+    pub misses: u64,
+    pub inserts: u64,
+    pub updates: u64,
+    pub removes: u64,
+    pub evictions: u64,
+    pub expirations: u64,
+}
+```
+
+Two things distinguish this from the policy-layer metrics in
+[`metrics.md`](metrics.md):
+
+- **It ships in every build.** No `#[cfg(feature = "metrics")]`
+  gate. The seven counters here are universal enough to be a
+  baseline contract every store satisfies.
+- **It is read-only at the trait surface.** `StoreCore::metrics()`
+  returns a snapshot `StoreMetrics` by value. How a store records
+  the increments (plain `u64`, `AtomicU64`, `MetricsCell`,
+  `StoreCounters` in
+  [`src/store/weight.rs`](../../src/store/weight.rs)) is an
+  implementation detail. Concurrent stores typically use
+  `AtomicU64`; single-threaded stores use plain `u64` or `Cell<u64>`.
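+
+Here is a sketch of the recorder/snapshot split for a concurrent
+store. It assumes hypothetical `AtomicCounters` and
+`MetricsSnapshot` types, and carries only three of the seven counters
+for brevity. Each counter is read independently, so a snapshot taken
+during concurrent updates is not an atomic cut across counters.
+`Relaxed` ordering is enough because the counters guard no other
+data.
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+/// By-value snapshot (three of the seven counters, for brevity).
+#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
+#[non_exhaustive]
+pub struct MetricsSnapshot {
+    pub hits: u64,
+    pub misses: u64,
+    pub inserts: u64,
+}
+
+/// Hypothetical internal recorder for a concurrent store.
+#[derive(Debug, Default)]
+pub struct AtomicCounters {
+    hits: AtomicU64,
+    misses: AtomicU64,
+    inserts: AtomicU64,
+}
+
+impl AtomicCounters {
+    // `&self` increments: callable under a read lock, or with no lock at all.
+    pub fn record_hit(&self) {
+        self.hits.fetch_add(1, Ordering::Relaxed);
+    }
+    pub fn record_miss(&self) {
+        self.misses.fetch_add(1, Ordering::Relaxed);
+    }
+    pub fn record_insert(&self) {
+        self.inserts.fetch_add(1, Ordering::Relaxed);
+    }
+
+    /// The only read path: a plain-`u64` copy, like `StoreCore::metrics()`.
+    pub fn snapshot(&self) -> MetricsSnapshot {
+        MetricsSnapshot {
+            hits: self.hits.load(Ordering::Relaxed),
+            misses: self.misses.load(Ordering::Relaxed),
+            inserts: self.inserts.load(Ordering::Relaxed),
+        }
+    }
+}
+```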
+
+`StoreMetrics` is `#[non_exhaustive]`, so adding a new universal
+counter is a minor version bump. The `expirations` field landed
+this way (added when the TTL surface needed time-driven removals
+distinguished from capacity-driven evictions).
+
+For per-policy detail (recency rank reads, LFU bucket promotions,
+S3-FIFO ghost hits) see the policy-layer metrics behind the
+`metrics` feature.
+
+## `StoreFull` error semantics
+
+[`StoreFull`](../../src/store/traits.rs) is a zero-sized type that
+carries no data. The caller already knows what they tried to insert;
+attaching the key/value to the error would force `K: Clone` /
+`V: Clone` on the error path for no information gain.
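+
+Here is a sketch of such an error type. The `Display` message and the
+`Error` impl are assumptions for illustration. The property that
+matters is that the type is zero-sized and `Copy`, so returning it
+costs nothing, and it can still be boxed as `dyn Error`.
+
+```rust
+use std::error::Error;
+use std::fmt;
+
+/// Carries no data, so the error path needs no `K: Clone` / `V: Clone`.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct StoreFull;
+
+impl fmt::Display for StoreFull {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.write_str("store is at capacity")
+    }
+}
+
+impl Error for StoreFull {}
+```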
+
+The error is co-located with the trait that returns it
+(`src/store/traits.rs`) rather than in `src/error.rs`. The reasoning
+matches the broader error model
+([`error-model.md`](error-model.md#why-four-error-types-not-one)):
+each error type lives near the surface that produces it.
+
+## Concrete stores
+
+Four sequential store types ship today, plus their concurrent
+counterparts and a sharded variant of the hash-map store. Each picks a
+different point in the memory-layout space.
+
+| Store | Backing | Key shape | Threading | When to use |
+|---|---|---|---|---|
+| [`HashMapStore`](../../src/store/hashmap.rs) | `HashMap<K, V, S>` | `K: Eq + Hash` | sequential | Default; any cache where the key drives layout |
+| [`ConcurrentHashMapStore`](../../src/store/hashmap.rs) | `RwLock<HashMap<…>>` | same | concurrent | Default concurrent shape |
+| [`ShardedHashMapStore`](../../src/store/hashmap.rs) | N `RwLock<HashMap<…>>` shards | same | concurrent, contention-aware | When one `RwLock` is the bottleneck |
+| [`SlabStore`](../../src/store/slab.rs) | slab arena with `EntryId` handles | `K: Eq + Hash` | sequential | Policies that need stable `EntryId`s for intrusive metadata |
+| [`ConcurrentSlabStore`](../../src/store/slab.rs) | `RwLock<SlabStore>` | same | concurrent | Concurrent slab access |
+| [`HandleStore`](../../src/store/handle.rs) | `HashMap<H, Arc<V>>` | opaque handle `H` | sequential | When keys are pre-interned and only the handle is in the hot path |
+| [`ConcurrentHandleStore`](../../src/store/handle.rs) | `RwLock<HashMap<H, Arc<V>>>` | same | concurrent | Concurrent variant of the above |
+| [`WeightStore`](../../src/store/weight.rs) | `FxHashMap` + per-entry weight | `K: Eq + Hash` | sequential | Variable-size values; byte-budgeted caches |
+| [`ConcurrentWeightStore`](../../src/store/weight.rs) | `RwLock<WeightStore>` | same | concurrent | Concurrent variant of the above |
+
+### `HashMapStore`
+
+The default public store. Its default hasher is
+`std::collections::hash_map::RandomState`, which is HashDoS-resistant on the public surface; users
+who control the key source can opt into a faster hasher via
+`with_hasher`. See [`hashing.md`](hashing.md) for the threat model.
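+
+Here is a sketch of the default-plus-override pattern on a
+hypothetical `MapStore<K, V, S = RandomState>`. The toy `Fnv1a`
+hasher stands in for a fast non-keyed hasher such as FxHash, which
+would come from a separate crate. Like FxHash, it is deterministic
+and therefore not HashDoS-resistant.
+
+```rust
+use std::collections::hash_map::RandomState;
+use std::collections::HashMap;
+use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};
+
+/// Hypothetical store, generic over its hasher and defaulting to `RandomState`.
+pub struct MapStore<K, V, S = RandomState> {
+    map: HashMap<K, V, S>,
+}
+
+impl<K: Eq + Hash, V> MapStore<K, V> {
+    /// Default: a randomly keyed hasher per instance (HashDoS-resistant).
+    pub fn new() -> Self {
+        Self { map: HashMap::new() }
+    }
+}
+
+impl<K: Eq + Hash, V, S: BuildHasher> MapStore<K, V, S> {
+    /// Opt-in: any `BuildHasher`, e.g. a faster one for trusted keys.
+    pub fn with_hasher(hasher: S) -> Self {
+        Self { map: HashMap::with_hasher(hasher) }
+    }
+
+    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
+        self.map.insert(key, value)
+    }
+
+    pub fn get(&self, key: &K) -> Option<&V> {
+        self.map.get(key)
+    }
+}
+
+/// Toy 64-bit FNV-1a, standing in for a fast non-keyed hasher.
+/// Deterministic, so NOT HashDoS-resistant.
+pub struct Fnv1a(u64);
+
+impl Default for Fnv1a {
+    fn default() -> Self {
+        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
+    }
+}
+
+impl Hasher for Fnv1a {
+    fn write(&mut self, bytes: &[u8]) {
+        for &b in bytes {
+            self.0 ^= u64::from(b);
+            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
+        }
+    }
+
+    fn finish(&self) -> u64 {
+        self.0
+    }
+}
+
+/// A store that opts into the fast hasher.
+pub type FastStore<K, V> = MapStore<K, V, BuildHasherDefault<Fnv1a>>;
+```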
+
+This is the right choice when keys are typed
+(`String`, `u64`, `(TenantId, ResourceId)`, …) and the policy does
+not need stable per-entry handles. Most caches built through
+`CacheBuilder` end up here either directly or indirectly.
+
+### `SlabStore`
+
+Stores entries in a slab arena. Each entry has a stable `EntryId`
+handle that survives mutations to other entries. This is essential
+for policies that thread intrusive metadata through entries (LRU's
+recency list, S3-FIFO's small/main queues, NRU's reference bit
+ring): linking entries by pointer instead of by stable index fights
+the borrow checker and scatters nodes across the heap, which is
+hostile to the CPU cache hierarchy.
+
+Use `SlabStore` directly when building a policy that wants slot
+handles. Most users reach it indirectly through the policy types
+that consume it.
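+
+Here is a minimal slab sketch with hypothetical `Slab` / `EntryId`
+types. Removed slots go on a free list and are reused, so removing
+one entry never moves another, and every other handle stays valid.
+The sketch omits the generation counter a production slab would
+typically pair with each index to detect a stale handle whose slot
+has since been reused.
+
+```rust
+/// Stable handle into the slab (hypothetical; not cachekit's `EntryId`).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct EntryId(usize);
+
+/// Minimal slab: freed slots are recycled, live slots never move.
+pub struct Slab<T> {
+    slots: Vec<Option<T>>,
+    free: Vec<usize>,
+}
+
+impl<T> Slab<T> {
+    pub fn new() -> Self {
+        Self { slots: Vec::new(), free: Vec::new() }
+    }
+
+    pub fn insert(&mut self, value: T) -> EntryId {
+        match self.free.pop() {
+            // Reuse a freed slot; no other entry is touched.
+            Some(idx) => {
+                self.slots[idx] = Some(value);
+                EntryId(idx)
+            }
+            None => {
+                self.slots.push(Some(value));
+                EntryId(self.slots.len() - 1)
+            }
+        }
+    }
+
+    pub fn get(&self, id: EntryId) -> Option<&T> {
+        self.slots.get(id.0).and_then(|slot| slot.as_ref())
+    }
+
+    pub fn remove(&mut self, id: EntryId) -> Option<T> {
+        let taken = self.slots.get_mut(id.0)?.take();
+        if taken.is_some() {
+            self.free.push(id.0);
+        }
+        taken
+    }
+}
+```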
+
+### `HandleStore`
+
+A specialised shape: keys are stored elsewhere (typically a
+[`KeyInterner`](../../src/ds/interner.rs)) and the store maps
+`Handle -> Arc<V>`. The motivation is to avoid cloning large keys
+on every operation when many policies (LFU bucket maps,
+frequency-bucket arrays, ARC ghost lists) need a compact key proxy
+anyway.
+
+`HandleStore` returns `Arc<V>` even in the single-threaded variant.
+This is the same divergence
+[`WeightStore`](#weightstores-deliberate-divergence) takes, and for
+the same reason: the values targeted by this shape (interned blobs,
+deduplicated payloads) benefit from cheap shared ownership.
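+
+Here is a sketch of the interner/handle split. It assumes
+hypothetical `Interner`, `Handle`, and `HandleMap` types, not
+cachekit's `KeyInterner` or `HandleStore` APIs. The long key is
+hashed once at intern time. After that, the store hashes only a
+`u32`, and `get` hands out a cloned `Arc` rather than a copy of `V`.
+
+```rust
+use std::collections::HashMap;
+use std::sync::Arc;
+
+/// Compact, copyable key proxy (hypothetical).
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+pub struct Handle(u32);
+
+/// Interns each distinct key once and hands back its handle.
+#[derive(Default)]
+pub struct Interner {
+    ids: HashMap<String, Handle>,
+}
+
+impl Interner {
+    pub fn intern(&mut self, key: &str) -> Handle {
+        if let Some(&h) = self.ids.get(key) {
+            return h;
+        }
+        let h = Handle(self.ids.len() as u32);
+        self.ids.insert(key.to_owned(), h);
+        h
+    }
+}
+
+/// Handle-keyed store: the hot path hashes a `u32`, never the full key.
+pub struct HandleMap<V> {
+    map: HashMap<Handle, Arc<V>>,
+}
+
+impl<V> HandleMap<V> {
+    pub fn new() -> Self {
+        Self { map: HashMap::new() }
+    }
+
+    pub fn insert(&mut self, h: Handle, value: V) -> Option<Arc<V>> {
+        self.map.insert(h, Arc::new(value))
+    }
+
+    /// Cloning the `Arc` bumps a reference count; `V` itself is not copied.
+    pub fn get(&self, h: Handle) -> Option<Arc<V>> {
+        self.map.get(&h).cloned()
+    }
+}
+```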
+
+### `WeightStore`'s deliberate divergence
+
+`WeightStore` does **not** implement `StoreCore` / `StoreMut`. It is
+a sibling of the trait family, not a subtype. The reasons live in
+[`weighted-eviction.md`](weighted-eviction.md) but are worth recapping
+here:
+
+- It returns `Arc<V>` (not `&V`) even in the single-threaded
+  variant. The concurrent variant needs `Arc<V>`, and the
+  single-threaded variant adopts the same shape so users can swap
+  between them by changing one type.
+- Its `try_insert` enforces a *dual* limit (entry count and weight
+  budget). Updates can fail when the weight delta would exceed
+  budget. `StoreMut::try_insert`'s contract is "updates always
+  succeed," which `WeightStore` cannot honour.
+- It takes an `F: Fn(&V) -> usize` weight function. Carrying that
+  third type parameter would propagate through every layer of
+  `StoreMut`-generic code unnecessarily.
+
+The concurrent variant *does* implement `ConcurrentStoreRead` /
+`ConcurrentStore` because those return `Arc<V>` and accept `Arc<V>`
+on insert. The asymmetry is awkward but honest — the concurrent
+trait family already has the shape `WeightStore` needs.
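+
+Here is a sketch of the dual-limit check on a hypothetical `Weighted`
+store, with keys fixed to `u64` for brevity. The weight check also
+runs for updates, which is exactly the case that
+`StoreMut::try_insert`'s "updates always succeed" contract rules out.
+
+```rust
+use std::collections::HashMap;
+use std::sync::Arc;
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct StoreFull;
+
+/// Hypothetical weight-aware store with an entry limit and a weight budget.
+pub struct Weighted<V, F> {
+    map: HashMap<u64, (Arc<V>, usize)>, // value plus its cached weight
+    max_entries: usize,
+    max_weight: usize,
+    total_weight: usize,
+    weigh: F,
+}
+
+impl<V, F: Fn(&V) -> usize> Weighted<V, F> {
+    pub fn new(max_entries: usize, max_weight: usize, weigh: F) -> Self {
+        Self { map: HashMap::new(), max_entries, max_weight, total_weight: 0, weigh }
+    }
+
+    pub fn total_weight(&self) -> usize {
+        self.total_weight
+    }
+
+    pub fn try_insert(&mut self, key: u64, value: V) -> Result<Option<Arc<V>>, StoreFull> {
+        let new_w = (self.weigh)(&value);
+        let old_w = self.map.get(&key).map(|(_, w)| *w);
+        // Limit 1: entry count. Only a new key can exceed it.
+        if old_w.is_none() && self.map.len() >= self.max_entries {
+            return Err(StoreFull);
+        }
+        // Limit 2: weight budget. Applies to updates too: a larger value
+        // replacing a smaller one raises the total.
+        let next_total = self.total_weight - old_w.unwrap_or(0) + new_w;
+        if next_total > self.max_weight {
+            return Err(StoreFull);
+        }
+        self.total_weight = next_total;
+        Ok(self.map.insert(key, (Arc::new(value), new_w)).map(|(old, _)| old))
+    }
+}
+```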
+
+## Sharded stores
+
+`ShardedHashMapStore<K, V, S>` is the only sharded store that ships
+today. It owns N independent `RwLock<HashMap<…>>` shards, each
+addressed by hashing the key through a
+[`ShardSelector`](../../src/ds/shard.rs).
+
+| Property | Single concurrent | Sharded |
+|---|---|---|
+| Lock acquisition | One global `RwLock` per op | One shard `RwLock` per op |
+| Lock contention | All readers/writers compete for one lock | Only operations whose keys map to the same shard compete; a single hot key still contends on its shard |
+| Capacity model | Single global cap | Per-shard caps that sum to global cap |
+| Eviction quality | Global victim picking | Per-shard victim picking |
+| Implementation complexity | Low | Medium |
+
+See [`sharding.md`](sharding.md) for the full discussion. Note that
+the sharded primitive lives at the *data-structure* / *store* layer;
+a sharded *cache policy* (e.g. `ShardedLruCache`) is roadmap.
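+
+Here is a sketch of per-key shard selection on a hypothetical
+`Sharded` map. It hashes the key once with a shared `RandomState` and
+reduces the hash modulo the shard count. cachekit's `ShardSelector`
+may select shards differently, and the sketch omits the per-shard
+capacity caps listed in the table.
+
+```rust
+use std::collections::hash_map::RandomState;
+use std::collections::HashMap;
+use std::hash::{BuildHasher, Hash};
+use std::sync::RwLock;
+
+/// Hypothetical sharded map: N independent locks, one picked per key.
+pub struct Sharded<K, V> {
+    shards: Vec<RwLock<HashMap<K, V>>>,
+    hasher: RandomState,
+}
+
+impl<K: Eq + Hash, V: Clone> Sharded<K, V> {
+    pub fn new(num_shards: usize) -> Self {
+        assert!(num_shards > 0, "need at least one shard");
+        let shards = (0..num_shards).map(|_| RwLock::new(HashMap::new())).collect();
+        Self { shards, hasher: RandomState::new() }
+    }
+
+    /// Hash once, reduce modulo the shard count.
+    fn shard_for(&self, key: &K) -> &RwLock<HashMap<K, V>> {
+        let h = self.hasher.hash_one(key);
+        &self.shards[(h % self.shards.len() as u64) as usize]
+    }
+
+    /// Write-locks only this key's shard; the others stay available.
+    pub fn insert(&self, key: K, value: V) -> Option<V> {
+        self.shard_for(&key).write().unwrap().insert(key, value)
+    }
+
+    pub fn get(&self, key: &K) -> Option<V> {
+        self.shard_for(key).read().unwrap().get(key).cloned()
+    }
+
+    /// Total entries: the sum of the per-shard lengths.
+    pub fn len(&self) -> usize {
+        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
+    }
+}
+```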
+
+## Why not a single unified `Store` trait?
+
+`StoreCore` could in principle be merged with `StoreMut` into one
+trait that mixes `&self` and `&mut self` methods. It isn't, for the
+same reason `Cache<K, V>` separates `peek` from `get`: a read-only
+surface lets concurrent wrappers acquire only the read lock.
+
+`StoreCore` + `StoreMut` could in principle merge with
+`ConcurrentStoreRead` + `ConcurrentStore` via an `Arc<V>`-returning
+universal variant. That collapses the sequential `&V` fast path into
+an unnecessary `Arc<V>` round-trip, which is exactly what the
+sequential `Cache::get -> Option<&V>` shape is trying to avoid.
+
+Two parallel families is the cost of letting both shapes pay only
+for what they use.
+
+## Adding a new store
+
+Checklist for landing a new store implementation:
+
+1. **Pick the layer.** Sequential (`StoreCore` / `StoreMut`) or
+   concurrent (`ConcurrentStoreRead` / `ConcurrentStore`). Usually
+   both, with the concurrent variant wrapping the sequential one in
+   `RwLock`.
+2. **Implement the read trait first.** `get`, `contains`, `len`,
+   `capacity`, `metrics`. Override `metrics()` to expose your
+   counters rather than the default-zero implementation.
+3. **Implement the mut trait.** `try_insert`, `remove`, `clear`.
+   `try_insert` must return `Err(StoreFull)` for new keys at
+   capacity; updates to existing keys must not fail (unless the
+   store has additional invariants like `WeightStore`'s weight
+   budget — document the divergence at the module level).
+4. **Add a `StoreFactory` impl** if the store has a stable
+   `new(capacity)` constructor and is likely to be parameterised
+   over in generic code.
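+5. **Make the concurrent variant `Send + Sync`.** Both are auto
+   traits, so keeping shared state behind `RwLock` and atomics
+   usually derives them without any manual `impl`. The sequential
+   variant typically is not `Sync` (because it holds `Cell<u64>` for
+   `MetricsCell` counters or `RefCell` for any interior state).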
+6. **Document the threat model.** Which hasher does the store
+   default to? Is it HashDoS-resistant? Are there public surfaces
+   that expose internal counters that could leak entry-size
+   information? Match the
+   [`hashing.md`](hashing.md) discipline.
+7. **Add `docs/stores/<name>.md`** following the
+   [doc style guide](style-guide.md#design-doc-style). Link the new
+   doc from [`docs/stores/README.md`](../stores/README.md).
+8. **Write proptest or fuzz coverage** for invariants:
+   `len()` equals the number of stored entries, metric counters are
+   monotonic,
+   `try_insert(k, v)` followed by `remove(k)` round-trips. See
+   [`docs/testing/testing.md`](../testing/testing.md) for the
+   conventions.
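+
+For step 5, a compile-time guard keeps the auto-trait claim honest.
+The `ConcurrentToy` / `SequentialToy` types are hypothetical, and the
+commented-out call is the one that would fail to build.
+
+```rust
+use std::cell::Cell;
+use std::collections::HashMap;
+use std::sync::atomic::AtomicU64;
+use std::sync::RwLock;
+
+/// Compile-time guard: a call only builds if `T: Send + Sync`.
+pub fn assert_send_sync<T: Send + Sync>() {}
+
+/// Concurrent shape (hypothetical): lock plus atomics, so the auto
+/// traits are derived with no manual `impl`.
+pub struct ConcurrentToy {
+    pub map: RwLock<HashMap<u64, u64>>,
+    pub hits: AtomicU64,
+}
+
+/// Sequential shape (hypothetical): `Cell` counters make it `Send` but not `Sync`.
+pub struct SequentialToy {
+    pub map: HashMap<u64, u64>,
+    pub hits: Cell<u64>,
+}
+
+pub fn static_checks() {
+    assert_send_sync::<ConcurrentToy>();
+    // Fails to compile: `Cell<u64>` is not `Sync`.
+    // assert_send_sync::<SequentialToy>();
+}
+```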
+
+## When not to add a new store
+
+The store layer is small on purpose. Before adding a new store,
+check:
+
+- **Is the difference a *policy* difference or a *layout*
+  difference?** Different eviction strategies belong above the
+  store, not at the store layer.
+- **Is the shape already covered by a hasher swap or sharding?**
+  `HashMapStore::with_hasher(FxBuildHasher)` and
+  `ShardedHashMapStore` cover most of the obvious knobs.
+- **Does it justify its own trait-family divergence?**
+  `WeightStore` is the precedent for diverging — variable weights
+  forced a dual-limit model that `StoreMut` cannot express. New
+  stores that fit `StoreMut`'s contract should implement it rather
+  than introduce a sibling.
+
+## Failure modes worth naming
+
+- **`StoreFull` from `try_insert` on a `WeightStore`-style store
+  that still has weight budget remaining.** Either limit can refuse
+  an insert, and both report the same error type. Callers should
+  consult both `len()` and (for weight-aware stores)
+  `total_weight()` to tell which limit was hit; the resolution
+  differs.
+- **Panic during a user-supplied callback in
+  `ConcurrentWeightStore::try_insert`.** The weight function runs
+  inside the write lock. Under `panic = "unwind"` the lock is
+  released (parking_lot doesn't poison), but the inner state is
+  whatever the panicking weight function left it in. Under the
+  crate's release-default `panic = "abort"`, the process exits
+  before any observer can see partial state. See
+  [`error-model.md`](error-model.md#operational-contract-panic-profile).
+- **Hash collisions on `FxHashMap`-backed stores under adversarial
+  keys.** `WeightStore` and policy-internal maps are the targets;
+  see [`hashing.md`](hashing.md#fxhash-hot-internal-default) for
+  the trade-off and the user-facing escape hatches.
+
+## See also
+
+- [Design overview](design.md) — §7 frames policy/storage
+  separation at the principles level
+- [Cache trait hierarchy](trait-hierarchy.md) — parallel trait
+  family at the policy layer; the `&V` vs `Arc<V>` reasoning is
+  shared
+- [Concurrency](concurrency.md) — `Concurrent*` wrapper pattern
+  applied at the store layer
+- [Weighted eviction](weighted-eviction.md) — `WeightStore`'s
+  dual-limit model and deliberate divergence from `StoreMut`
+- [Hashing and key identity](hashing.md) — store-level hasher
+  defaults and overrides
+- [Sharding](sharding.md) — `ShardedHashMapStore` and the roadmap
+  for sharded cache policies
+- [Metrics](metrics.md) — relationship between the always-on
+  `StoreMetrics` baseline and the feature-gated policy-layer
+  metrics
+- [Error model](error-model.md) — `StoreFull` semantics, panic
+  behaviour during user-supplied callbacks
+- [Stores reference](../stores/README.md) — runtime-behaviour
+  documentation for each concrete store
+- [`src/store/traits.rs`](../../src/store/traits.rs) — canonical
+  trait definitions
diff --git a/docs/design/style-guide.md b/docs/design/style-guide.md
index 70beb05..3abfe66 100644
--- a/docs/design/style-guide.md
+++ b/docs/design/style-guide.md
@@ -1,12 +1,29 @@
 # Documentation Style Guide
 
-## Goals
+This style guide covers two related but distinct concerns:
+
+- **Rustdoc style** for module- and item-level documentation inside
+  `src/`. The audience is a Rust developer reading the API.
+- **Design-doc style** for the prose docs in `docs/design/`. The
+  audience is a contributor (or future maintainer) trying to
+  understand why a piece of cachekit looks the way it does.
+
+Both styles share the same goals: make behaviour, invariants, and
+trade-offs clear without verbosity, and keep examples compile-ready
+and focused.
+
+## Rustdoc style
+
+### Goals
+
 - Keep module docs consistent across the codebase.
 - Make behavior, invariants, and trade-offs clear without verbosity.
 - Ensure examples compile and demonstrate a single, focused use case.
 
-## Module Doc Layout
+### Module doc layout
+
 Use `//!` and follow this order:
+
 - Architecture
 - Key Components
 - Core Operations
@@ -17,11 +34,14 @@ Use `//!` and follow this order:
 - Thread Safety
 - Implementation Notes
 
-## Item Docstrings
-Use `///` with a one-sentence summary. Mention invariants or complexity only when
-they matter. Avoid Args/Returns sections unless behavior is non-obvious.
+### Item docstrings
+
+Use `///` with a one-sentence summary. Mention invariants or complexity
+only when they matter. Avoid Args/Returns sections unless behavior is
+non-obvious.
+
+### Template
 
-## Template
 ```rust
 //! ## Architecture
 //! ...
@@ -51,6 +71,139 @@ they matter. Avoid Args/Returns sections unless behavior is non-obvious.
 //!
 //! ## Implementation Notes
 //! ...
-///
+
 /// Brief summary of behavior.
 ```
+
+## Design-doc style
+
+Files in `docs/design/` follow a shared shape so a reader who has
+finished one knows what to expect from the next. The shape is not a
+strict template — sections are added or omitted as the topic
+warrants — but the meta-conventions below are uniform.
+
+### Status preamble
+
+Every design doc opens with a blockquote that names what the doc
+covers and links its immediate siblings. The convention is
+`> Status: <one-sentence framing>. Companion to <links>.`
+
+```markdown
+> Status: design rationale for the concurrent surface that ships today
+> behind the `concurrency` feature flag. Companion to the cross-cutting
+> principles in [`docs/design/design.md`](design.md) §3 and the trait
+> rationale in [`docs/design/trait-hierarchy.md`](trait-hierarchy.md).
+```
+
+The preamble does three things in one paragraph:
+
+- Names the doc's scope.
+- States the implementation status (shipped, partially shipped, deferred).
+- Anchors the doc in the wider design corpus by linking siblings.
+
+### Section structure
+
+- Numbered top-level sections (`§1`, `§2`, …) are encouraged when other
+  docs may want to cross-reference specific sections. `design.md`,
+  `ttl.md`, and `trait-hierarchy.md` all do this.
+- Closer sections, in order, when relevant:
+    - **Trade-offs** — explicit tables or side-by-side prose comparing
+      alternatives.
+    - **Failure modes** — what breaks under stress, panic, contention.
+    - **Future directions** / **Roadmap** — what is deferred, in rough
+      priority order.
+    - **Adding a new X** — checklist for the most common contribution
+      pattern (new policy, new capability trait, new metric, etc.).
+    - **When not to use X** — explicit boundaries for users.
+    - **See also** — links to sibling design docs, source files, and
+      external references.
+
+### Tables for trade-offs
+
+When two or more options have different trade-offs, use a table rather
+than a bulleted list. Tables make it easy to scan one column for one
+property and force the writer to give every option the same set of
+properties.
+
+```markdown
+| Property | Option A | Option B |
+|----------|----------|----------|
+| Cost     | …        | …        |
+| Memory   | …        | …        |
+```
+
+### Diagrams
+
+Use fenced code blocks tagged `text` for ASCII diagrams. Avoid Mermaid
+or other rich diagram formats — plain text renders in every tool
+(rustdoc, GitHub, terminal `less`) without configuration. See the
+hierarchy diagram in `trait-hierarchy.md` for the conventional shape.
+
+```text
+            ┌──────────────────┐
+            │  Cache<K, V>     │
+            └────────┬─────────┘
+                     │ extends
+        ┌────────────┼────────────┐
+        ▼            ▼            ▼
+   Capability1   Capability2   Capability3
+```
+
+### Source citations
+
+Every concrete claim that names a type, trait, or method should link
+the source file. Use relative paths
+(``[`src/policy/lru.rs`](../../src/policy/lru.rs)``) so the docs work
+both on GitHub and in local clones. When citing a specific feature
+gate, name the feature inline (for example, "gated by
+`#[cfg(feature = "ttl")]`").
+
+### Cross-references
+
+- Refer to sibling design docs by filename, not display title:
+  `[concurrency](concurrency.md)` rather than `[Concurrency design](...)`.
+  This survives renames better and matches the rest of the corpus.
+- When citing a specific section, append the section number or anchor:
+  `[design.md §3](design.md)`, `[concurrency.md §"Failure modes"](concurrency.md#failure-modes)`.
+
+### Tone
+
+- Direct, declarative prose. "The wrapper takes a write lock", not
+  "The wrapper will take a write lock".
+- Trade-offs are stated explicitly, not buried in passive voice.
+- Marketing language is out of place. "Excellent", "powerful",
+  "blazing fast" — replace with the property that motivated the
+  adjective.
+- It is acceptable, and often correct, to say "this is a known sharp
+  edge" or "this is the wrong trait for that surface" when it is.
+
+### `See Also` closer
+
+Every design doc ends with a `## See Also` section. The conventional
+order is:
+
+1. **Sibling design docs** with a one-sentence framing of each link.
+2. **Source files** that contain the canonical implementation.
+3. **External references** (Rust API Guidelines, research papers,
+   Wikipedia entries) when relevant.
+
+The framing matters: a bare list of links is less useful than a list
+where each entry says why the reader might follow it.
+
+### Adding a new design doc
+
+Checklist:
+
+1. **Pick a clear single topic.** If you are documenting two concerns,
+   split into two docs and link them.
+2. **Write the status preamble first.** Naming the scope up front
+   keeps the rest of the doc honest.
+3. **Number top-level sections** if they're likely to be
+   cross-referenced from elsewhere.
+4. **Add a `See Also` entry** to each sibling doc that should know
+   about the new one, and add a corresponding bullet to
+   [`docs/index.md`](../index.md).
+5. **Link from `design.md`'s See Also block** so the new doc is
+   reachable from the top-level design overview.
+6. **Mirror trade-offs as tables** when there are alternatives.
+7. **Close with `When not to use X`** (or the equivalent) — explicit
+   boundaries are part of the contract.
diff --git a/docs/design/trait-hierarchy.md b/docs/design/trait-hierarchy.md
index 2eee6ea..27b773c 100644
--- a/docs/design/trait-hierarchy.md
+++ b/docs/design/trait-hierarchy.md
@@ -411,5 +411,8 @@ GDS lands keeps the surface honest.
 - [Read-only traits](../guides/read-only-traits.md) — user-facing
   guidance on the `peek` / `get` split
 - [`src/traits.rs`](../../src/traits.rs) — the canonical definitions
-- [`src/store/traits.rs`](../../src/store/traits.rs) — parallel
-  trait family at the store layer (sequential + concurrent)
+- [Storage layer](storage.md) — parallel trait family at the store
+  layer (sequential + concurrent), with the same `&V` vs. `Arc<V>`
+  split reasoning
+- [`src/store/traits.rs`](../../src/store/traits.rs) — canonical
+  store-trait definitions
diff --git a/docs/design/ttl.md b/docs/design/ttl.md
index 5a8d5b4..8cde90a 100644
--- a/docs/design/ttl.md
+++ b/docs/design/ttl.md
@@ -1,11 +1,13 @@
-# TTL / Time-Based Expiration — Design Exploration
+# TTL / Time-Based Expiration — Design Notes
 
-> Status: design exploration. Companion to the high-level stub at
+> Status: **Phase 1 implemented and shipped behind the `ttl` feature flag.**
+> Phase 2 (per-policy embedded `expires_at`, `ConcurrentExpiring`, timer-wheel
+> swap-in, `serde`) is deferred. Companion to the implementation tracker at
 > [`docs/policies/roadmap/ttl.md`](../policies/roadmap/ttl.md).
 
 TTL is **not** a replacement policy; it is an expiration rule that coexists
-with an eviction policy. This document explores how TTL can be introduced into
-`cachekit` while preserving the project's invariants:
+with an eviction policy. This document captures the rationale for how TTL
+is wired into `cachekit` while preserving the project's invariants:
 
 - policy ↔ storage separation (see [`src/store/traits.rs`](../../src/store/traits.rs))
 - allocation-free hot paths
@@ -14,21 +16,47 @@ with an eviction policy. This document explores how TTL can be introduced into
 
 ## Current State
 
-- No TTL exists in source today (`rg ttl|expir|Instant` finds only docs and
-  benchmark labels).
-- A high-level stub already exists at
-  [`docs/policies/roadmap/ttl.md`](../policies/roadmap/ttl.md).
-- The `ds::LazyMinHeap` primitive at [`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs)
-  explicitly lists "TTL expiry heaps" as a use case.
-- The capability-trait pattern at [`src/traits.rs`](../../src/traits.rs)
-  (`RecencyTracking`, `FrequencyTracking`, `HistoryTracking`) gives a clean
-  injection point for an `ExpiringCache` trait.
-- The runtime-policy enum at [`src/builder.rs`](../../src/builder.rs)
-  (`DynCache` / `CacheInner`) makes a TTL wrapper composable in one variant
-  rather than 18 per-policy edits. Note: `CacheInner` currently wires
-  17 of the 18 policy modules under `src/policy/`; `policy::car` is not
-  yet a variant. Closing that gap is a prerequisite for "TTL works for
-  every policy via `DynCache`".
+Phase 1 is fully landed behind the `ttl` feature flag
+([`Cargo.toml`](../../Cargo.toml)):
+
+- [`src/time.rs`](../../src/time.rs) — `Clock` trait, `StdClock`, `MockClock`
+  (ms ticks; `&self`; `Send + Sync`; saturating arithmetic).
+- [`src/ds/expiration_index.rs`](../../src/ds/expiration_index.rs) —
+  `ExpirationIndex<K>` wrapping `LazyMinHeap<K, u64>` with auto-rebuild and
+  TTL-specialised operations (`set_deadline`, `next_deadline`, `pop_expired`,
+  `drain_expired`).
+- [`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs) — added
+  `LazyMinHeap::peek_best` so `ExpirationIndex` reads stale-tombstoned
+  roots in place without touching heap internals.
+- [`src/traits.rs`](../../src/traits.rs) — `Tick`, `TtlStatus`, and the
+  `ExpiringCache<K, V>` capability trait alongside `RecencyTracking` etc.
+- [`src/policy/expiring.rs`](../../src/policy/expiring.rs) —
+  `Expiring<C, K, V, T = StdClock>` decorator with the ordering invariant
+  and logical-read/physical-purge split.
+- [`src/builder.rs`](../../src/builder.rs) — `impl Cache<K, V>` for
+  `DynCache<K, V>`, `CacheBuilder::with_default_ttl(Duration)` returning
+  `ExpiringBuilder`, and `DynExpiringCache<K, V>` (private-constructor
+  wrapper around `Expiring<DynCache<K, V>, _, _, StdClock>`).
+- [`src/prelude.rs`](../../src/prelude.rs) — re-exports `Clock`,
+  `StdClock`, `MockClock`, `Tick`, `TtlStatus`, `ExpiringCache`,
+  `DynExpiringCache`, and `ExpiringBuilder` under the `ttl` feature.
+- [`tests/ttl_integration_test.rs`](../../tests/ttl_integration_test.rs) —
+  integration tests plus proptest invariants over expiry order.
+- [`benches/ttl_overhead.rs`](../../benches/ttl_overhead.rs) — Zipfian /
+  scan / mixed workloads comparing `LruCore<u64, u64>` against
+  `Expiring<LruCore<u64, u64>>` with a 60-second default TTL.
+
+Phase 2 has **not** landed:
+
+- Per-policy embedded `expires_at: u64` in `LruCore::Node` /
+  `S3FifoCache::Node` (gated on bench results from Phase 1).
+- `ConcurrentExpiring<C>` returning owned/`Arc<V>` values.
+- Timer-wheel swap-in for `ExpirationIndex`.
+- `serde` support for `Tick` / `TtlStatus`.
+
+`CacheInner` still wires only 17 of the 18 policy modules under
+`src/policy/`; `policy::car` is gated by `policy-car` but absent from
+`DynCache`. TTL-via-builder for CAR remains gated on closing that gap.
 
 ---
 
@@ -36,21 +64,25 @@ with an eviction policy. This document explores how TTL can be introduced into
 
 Five viable patterns, ordered roughly by invasiveness.
 
-### a) Decorator / wrapper cache
+### a) Decorator / wrapper cache (chosen)
 
-A new struct `Expiring<C>` owns an inner `C: Cache<K, V>` plus a per-key
-expiry index, intercepting `get` / `peek` / `insert` / `remove` to consult
-the index.
+A new struct `Expiring<C, K, V, T>` owns an inner `C: Cache<K, V>` plus a
+per-key expiry index, intercepting `get` / `peek` / `insert` / `remove` to
+consult the index. This is what shipped — see
+[`src/policy/expiring.rs`](../../src/policy/expiring.rs).
 
 ```rust
-pub struct Expiring<C, K, T = StdClock> {
+pub struct Expiring<C, K, V, T = StdClock> {
     inner: C,
     index: ExpirationIndex<K>,
     clock: T,
-    default_ttl: Option<Duration>,
+    default_ttl_ticks: Option<u64>,
+    #[cfg(feature = "metrics")]
+    expirations: u64,
+    _value: PhantomData<fn() -> V>,
 }
 
-impl<C, K, V, T> Cache<K, V> for Expiring<C, K, T>
+impl<C, K, V, T> Cache<K, V> for Expiring<C, K, V, T>
 where
     C: Cache<K, V>,
     K: Eq + Hash + Clone,
@@ -60,7 +92,9 @@ where
 
 `K` must appear as a type parameter on the wrapper itself because the index
 is keyed by `K`; threading it only through the `Cache<K, V>` impl is not
-enough.
+enough. `V` is recorded as `PhantomData<fn() -> V>` so the inner cache's
+value type is fixed at construction without dragging `V` into auto-trait
+bounds.
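+
+Here is a small illustration of the auto-trait point. It uses
+hypothetical marker structs rather than `Expiring` itself. `Rc<u8>`
+is neither `Send` nor `Sync`. `PhantomData<fn() -> V>` does not pass
+that on to the containing type, whereas `PhantomData<V>` would.
+
+```rust
+use std::marker::PhantomData;
+use std::rc::Rc;
+
+/// Records `V` via a function-pointer marker: `Send + Sync` for every `V`.
+pub struct ViaFnPointer<V> {
+    _value: PhantomData<fn() -> V>,
+}
+
+/// Records `V` as if it owned one: inherits `V`'s auto traits.
+pub struct ViaOwned<V> {
+    _value: PhantomData<V>,
+}
+
+pub fn assert_send_sync<T: Send + Sync>() {}
+
+pub fn check() {
+    // `Rc<u8>` is neither `Send` nor `Sync`, but this marker does not pass that on.
+    assert_send_sync::<ViaFnPointer<Rc<u8>>>();
+    // Fails to compile: `PhantomData<Rc<u8>>` is neither `Send` nor `Sync`.
+    // assert_send_sync::<ViaOwned<Rc<u8>>>();
+}
+```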
 
 - **Pros:** zero churn on the 18 existing policies; opt-in; composes with
   `DynCache`; matches the policy/storage separation rule in `.cursorrules`.
@@ -80,15 +114,15 @@ enough.
 - **Important semantic constraint:** today's [`Cache`](../../src/traits.rs)
   trait has `peek`, `contains`, and `len` as `&self` methods. A decorator
   cannot physically remove expired entries from those methods unless it adds
-  interior mutability. The first slice should define them as *logical* reads:
-  expired entries are invisible to `peek` / `contains`, while physical cleanup
-  happens on `get`, `insert`, `remove`, `clear`, or explicit
-  `purge_expired`. **Decision:** `Cache::len` returns physical occupancy
-  — it is cheaper, matches the underlying cache trait, and is the only
-  thing implementable through `&self`. Surprise after time advances is
-  mitigated by exposing `Expiring::live_len(&mut self) -> usize` as an
-  inherent method on the wrapper, which can amortize an internal sweep.
-  Document the distinction in both rustdocs.
+  interior mutability. Shipped behaviour: `peek` and `contains` are
+  *logical* reads — expired entries are invisible to them, while physical
+  cleanup happens on `get`, `insert`, `remove`, `clear`, or explicit
+  `purge_expired`. **Decision (shipped):** `Cache::len` returns physical
+  occupancy. `Expiring::live_len(&self) -> usize` is exposed as an
+  inherent method that walks the expiration index once (O(n), no
+  allocation, `&self` — no internal mutation needed because
+  `ExpirationIndex::iter` is borrow-only). The distinction is documented
+  on both `Cache::len` and `Expiring::live_len`.
 - **Mutation semantics:** expired-but-resident entries should behave as
   logically missing. `get` / `remove` should purge and return `None`;
   `insert` / `insert_with_ttl` should purge the stale value before inserting
@@ -157,23 +191,32 @@ A `Box<dyn ExpirationIndex>` injected into any policy. Conflicts with the
 .cursorrules guidance to "minimize Arc usage in hot paths" and "avoid heavy
 Rust ergonomics in hot loops (trait objects, …)".
 
-### Recommendation
+### Recommendation (shipped as Phase 1)
+
+Option (a) shipped as the `ttl` feature, and the wrapper deliberately offers
+*logical* expiration over the current `Cache` trait rather than a
+zero-overhead embedded TTL.
+
+For builder integration, the shipped approach is a **hybrid of options
+(1) and (2)** from §4(c):
 
-Ship (a) first as a `ttl` feature, but be explicit that the wrapper gives
-logical expiration over the current `Cache` trait rather than a zero-overhead
-embedded TTL.
+- `DynCache<K, V>` got an `impl Cache<K, V>` so `Expiring<DynCache, …>`
+  type-checks (option 1's mechanism — single delegation, no per-policy
+  match-arm duplication outside the existing `DynCache` boilerplate).
+- A separate public type `DynExpiringCache<K, V>` wraps
+  `Expiring<DynCache<K, V>, K, V, StdClock>` with the inner field
+  **private**; its only constructor is `ExpiringBuilder::build`
+  (option 2's surface — no public way to feed a `DynExpiringCache` back
+  into an `Expiring`).
 
-For builder integration, prefer a **separate `DynExpiringCache<K, V>`
-type** returned by a TTL-specific builder path over implementing
-`Cache<K, V>` for `DynCache<K, V>` and wrapping it. The decisive reason
-is that the first option permits `Expiring<Expiring<DynCache>>` to type-
-check — two clocks, two indexes, surprising semantics — and we have no
-clean way to disallow it at the type level once `DynCache: Cache`. A
-distinct expiring type makes double-wrapping impossible by construction.
-The cost is one extra public type and minor delegation boilerplate; the
-benefit is that the only TTL surface is the one the builder hands out.
+`DynExpiringCache` does **not** implement `Cache<K, V>`. Because
+`Expiring<C>` requires `C: Cache<K, V>`,
+`Expiring<DynExpiringCache>` is structurally unrepresentable. That
+restores, for the builder-vended surface, the "one TTL layer" guarantee
+that pure option (1) couldn't give without a runtime check. The cost is
+one extra public type plus a thin mirror of `DynCache`'s inherent methods.
 
-Then, where profiling justifies it, embed `expires_at` into specific
+Phase 2: where profiling justifies it, embed `expires_at` into specific
 policies (b) — LRU, FastLRU and S3-FIFO are the high-value targets. The
 embed must be opt-in per-node so non-TTL users do not pay 8 bytes per
 entry (see §6, step 7).
@@ -206,53 +249,53 @@ The codebase already owns most of the building blocks: `SlotArena`,
 `LazyMinHeap`, `IntrusiveList`, `GhostList`, `ClockRing`. Concrete options
 for the expiration index follow.
 
-### A) Lazy min-heap of `(expires_at, key)`
+### A) Lazy min-heap of `(expires_at, key)` (shipped)
 
-`ds::LazyMinHeap<K, S>` already exists at
-[`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs) and explicitly lists TTL
+`ds::LazyMinHeap<K, S>` already existed at
+[`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs) and explicitly listed TTL
 in its use cases. Insertion is O(log n); `pop_best` is amortized O(log n);
 `update` is O(log n) with `maybe_rebuild` to bound staleness.
 
-Used as a TTL index, this needs a thin `ExpirationIndex` wrapper over
-`LazyMinHeap` rather than using the heap directly. The wrapper should expose:
+The shipped `ExpirationIndex` wrapper lives at
+[`src/ds/expiration_index.rs`](../../src/ds/expiration_index.rs):
 
 ```rust
-pub struct ExpirationIndex<K> { /* LazyMinHeap<K, u64> */ }
-
-impl<K> ExpirationIndex<K> {
-    pub fn set_deadline(&mut self, key: K, expires_at: u64) -> Option<u64> {
-        /* ... */
-    }
-
-    pub fn remove<Q>(&mut self, key: &Q) -> Option<u64>
-    where
-        K: Borrow<Q>,
-        Q: Hash + Eq + ?Sized,
-    {
-        /* ... */
-    }
-
-    pub fn peek_deadline(&mut self) -> Option<(&K, u64)> {
-        /* ... */
-    }
-
-    pub fn pop_expired(&mut self, now: u64) -> Option<(K, u64)> {
-        /* ... */
-    }
+// Signature summary only: method bodies are elided.
+pub type Deadline = u64;
+
+pub struct ExpirationIndex<K> { /* LazyMinHeap<K, Deadline> */ }
+
+impl<K: Eq + Hash + Clone> ExpirationIndex<K> {
+    pub fn new() -> Self;
+    pub fn with_capacity(capacity: usize) -> Self;
+
+    pub fn set_deadline(&mut self, key: K, expires_at: Deadline) -> Option<Deadline>;
+    pub fn deadline_of<Q>(&self, key: &Q) -> Option<Deadline>
+    where K: Borrow<Q>, Q: Hash + Eq + ?Sized;
+    pub fn remove<Q>(&mut self, key: &Q) -> Option<Deadline>
+    where K: Borrow<Q>, Q: Hash + Eq + ?Sized;
+
+    pub fn next_deadline(&mut self) -> Option<(&K, Deadline)>;
+    pub fn pop_expired(&mut self, now: Deadline) -> Option<(K, Deadline)>;
+    pub fn drain_expired(&mut self, now: Deadline)
+        -> impl Iterator<Item = (K, Deadline)> + '_;
+
+    pub fn iter(&self) -> Iter<'_, K>;
+    pub fn set_auto_rebuild(&mut self, factor: Option<usize>) -> &mut Self;
 }
 ```
 
-`LazyMinHeap` currently has destructive `pop_best` but no non-destructive
-live-minimum peek (verified against [`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs):
-only `update`, `pop_best`, `with_auto_rebuild`, `maybe_rebuild` are public).
-The first slice should **add a `peek_best` primitive to `LazyMinHeap`**
-rather than reimplementing live-minimum logic inside `ExpirationIndex`.
-The wrapper approach would have to inspect the heap's internal staleness
-state to skip popped entries, which couples `ExpirationIndex` to
-`LazyMinHeap`'s representation. A `peek_best(&mut self) -> Option<(&K, &S)>`
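+Auto-rebuild defaults to factor 2: stale heap entries are bounded at
+`2 * live_len()`, where `live_len()` counts the keys the index currently
+tracks (not `Expiring::live_len`'s unexpired entries). Callers that
+mutate every entry many times per epoch can tighten this via
+`set_auto_rebuild`.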
+
+`LazyMinHeap` had destructive `pop_best` but no non-destructive
+live-minimum peek. Phase 1 added a
+`peek_best(&mut self) -> Option<(&K, &S)>` primitive to `LazyMinHeap`
 that drains stale-tombstoned roots in place (mutating because lazy
 deletion may need to advance past tombstones, immutable observation
-otherwise) is the right primitive and is reusable outside TTL.
+otherwise). `ExpirationIndex::next_deadline` is a thin wrapper around
+it, so the wrapper does not couple to the heap's internal staleness
+representation. The primitive is reusable outside TTL.
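+
+The lazy-deletion idea behind `peek_best`, plus the periodic purge loop
+(pop while the earliest live deadline is `<= now`), can be sketched in a
+self-contained way. This is a hypothetical stand-in, not `LazyMinHeap`'s
+real representation: it assumes a `BinaryHeap` of
+`Reverse((deadline, seq, key))` entries plus a `HashMap` holding each
+key's current `(deadline, seq)`, and it adds a `K: Ord` bound only so the
+heap tuple is orderable. The names `LazyDeadlines` and `purge_expired`
+(as a free function) are illustrative.
+
+```rust
+use std::cmp::Reverse;
+use std::collections::{BinaryHeap, HashMap};
+use std::hash::Hash;
+
+/// Hypothetical lazy min-heap of deadlines (not cachekit's `LazyMinHeap`).
+/// Re-setting or removing a key leaves its old heap entry behind as a
+/// stale tombstone that is skipped once it reaches the root.
+struct LazyDeadlines<K> {
+    heap: BinaryHeap<Reverse<(u64, u64, K)>>, // (deadline, seq, key)
+    live: HashMap<K, (u64, u64)>,             // key -> current (deadline, seq)
+    next_seq: u64,
+}
+
+impl<K: Eq + Hash + Clone + Ord> LazyDeadlines<K> {
+    fn new() -> Self {
+        Self { heap: BinaryHeap::new(), live: HashMap::new(), next_seq: 0 }
+    }
+
+    /// O(log n); returns the previous deadline, if any.
+    fn set_deadline(&mut self, key: K, expires_at: u64) -> Option<u64> {
+        let seq = self.next_seq;
+        self.next_seq += 1;
+        self.heap.push(Reverse((expires_at, seq, key.clone())));
+        self.live.insert(key, (expires_at, seq)).map(|(d, _)| d)
+    }
+
+    /// O(1); the heap entry is not touched, it just becomes stale.
+    fn remove(&mut self, key: &K) -> Option<u64> {
+        self.live.remove(key).map(|(d, _)| d)
+    }
+
+    /// Mutating peek: pop stale roots in place, then expose the live minimum.
+    fn peek_best(&mut self) -> Option<(&K, u64)> {
+        while let Some(Reverse((d, seq, key))) = self.heap.peek() {
+            if self.live.get(key) == Some(&(*d, *seq)) {
+                break; // the root is this key's current deadline
+            }
+            self.heap.pop(); // tombstone: drop it
+        }
+        self.heap.peek().map(|Reverse((d, _, k))| (k, *d))
+    }
+
+    /// Pop the earliest live entry only if its deadline is `<= now`.
+    fn pop_expired(&mut self, now: u64) -> Option<(K, u64)> {
+        let (key, d) = match self.peek_best() {
+            Some((k, d)) if d <= now => (k.clone(), d),
+            _ => return None,
+        };
+        self.heap.pop(); // peek_best left this live entry at the root
+        self.live.remove(&key);
+        Some((key, d))
+    }
+}
+
+/// Purge loop: keep popping expired keys and remove each from the cache.
+fn purge_expired<K: Eq + Hash + Clone + Ord, V>(
+    index: &mut LazyDeadlines<K>,
+    cache: &mut HashMap<K, V>,
+    now: u64,
+) -> usize {
+    let mut purged = 0;
+    while let Some((key, _)) = index.pop_expired(now) {
+        cache.remove(&key);
+        purged += 1;
+    }
+    purged
+}
+```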
 
 - **Pros:** smallest delta — reuse an existing primitive, single allocation
   pool, no clock-tick budget.
@@ -366,8 +409,9 @@ pub trait Clock {
   replaced value. Replacing a live entry returns `Some(old_value)`; replacing
   an expired resident entry purges it first and returns `None`.
 - **Periodic (or on insert when full):**
-  while `peek_deadline()` returns a deadline `<= now`, call
-  `pop_expired(now)` and remove that key from the wrapped cache.
+  while `ExpirationIndex::next_deadline()` returns a deadline `<= now`,
+  call `pop_expired(now)` and remove that key from the wrapped cache.
+  This is exactly what `Expiring::purge_expired` does today.
 - **Eviction precedence:** "evict expired first, then policy victim" — the
   rule already documented in
   [`docs/policies/roadmap/ttl.md`](../policies/roadmap/ttl.md).
@@ -427,69 +471,89 @@ override" pattern is ambiguous in user code.
 
 ### b) `Clock` abstraction for testability
 
-A `Clock` parameter on `Expiring<C, K, T>` (default `StdClock`) and on any
-policy that embeds expiry. Mirrors how `RandomCore` keeps `rng_state`
-rather than calling `rand::thread_rng()` directly.
+A `Clock` parameter on `Expiring<C, K, V, T>` (default `StdClock`) and on
+any policy that embeds expiry. Mirrors how `RandomCore` keeps `rng_state`
+rather than calling `rand::thread_rng()` directly. The shipped
+[`Clock`](../../src/time.rs) trait is object-safe and requires
+`Send + Sync + Debug`, with `StdClock(Instant)` and `MockClock(AtomicU64)`
+covering production and test cases respectively.
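+
+A sketch of that shape, assuming a single `now(&self) -> u64` method that
+returns raw milliseconds and a hypothetical `MockClock::advance` helper;
+the method names and return type are assumptions, not the signatures in
+`src/time.rs`. It shows why `&self` plus `Send + Sync` matters: a test
+can share one `MockClock` with the cache and move time forward without
+`&mut` access.
+
+```rust
+use std::fmt::Debug;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::time::Instant;
+
+/// Hypothetical clock trait: object-safe, `&self`, shareable across threads.
+pub trait Clock: Send + Sync + Debug {
+    /// Milliseconds since a clock-specific origin.
+    fn now(&self) -> u64;
+}
+
+/// Production clock: milliseconds elapsed since construction.
+#[derive(Debug)]
+pub struct StdClock(Instant);
+
+impl StdClock {
+    pub fn new() -> Self {
+        StdClock(Instant::now())
+    }
+}
+
+impl Clock for StdClock {
+    fn now(&self) -> u64 {
+        self.0.elapsed().as_millis() as u64
+    }
+}
+
+/// Test clock: time only moves when the test says so.
+#[derive(Debug, Default)]
+pub struct MockClock(AtomicU64);
+
+impl MockClock {
+    pub fn advance(&self, ms: u64) {
+        self.0.fetch_add(ms, Ordering::Relaxed);
+    }
+}
+
+impl Clock for MockClock {
+    fn now(&self) -> u64 {
+        self.0.load(Ordering::Relaxed)
+    }
+}
+
+/// Object safety lets callers hold `&dyn Clock` / `Box<dyn Clock>`.
+/// An entry is expired once `now >= expires_at`.
+fn is_expired(clock: &dyn Clock, expires_at: u64) -> bool {
+    clock.now() >= expires_at
+}
+```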
 
-### c) Builder integration
+### c) Builder integration (shipped)
 
-Two complementary surfaces:
+The shipped surface:
 
 ```rust
+use cachekit::builder::{CacheBuilder, CachePolicy};
+use std::time::Duration;
+
 let mut cache = CacheBuilder::new(1000)
     .with_default_ttl(Duration::from_secs(60))
     .build::<u64, String>(CachePolicy::Lru);
 
-cache.insert_with_ttl(1, v, Duration::from_secs(5));
+cache.insert(1, "v".to_string());                              // uses default TTL
+cache.insert_with_ttl(2, "fast".into(), Duration::from_secs(5)); // overrides
 ```
 
-Internally, `with_default_ttl(Some(_))` switches the builder to produce
-a `DynExpiringCache<K, V>` (separate public type) rather than a
-`DynCache<K, V>`. The two paths considered were:
-
-1. Add `impl Cache<K, V> for DynCache<K, V>` and store
-   `CacheInner::Ttl(Expiring<BoxedOrDynCache<K, V>>)` / an equivalent wrapper
-   around the already-built `DynCache`.
-2. Introduce a separate `DynExpiringCache<K, V>` returned by a TTL-specific
-   builder path, avoiding a recursive enum at the cost of another public type.
-
-Option (2) is recommended (see §1 Recommendation). The deciding factor is
-that option (1) lets `Expiring<Expiring<DynCache>>` type-check, which is
-silently wrong (two clocks, two indexes). Option (2) makes double-
-wrapping unrepresentable: `Expiring<C>` is only constructed through the
-builder, and the builder never returns an inner expiring cache. The
-document's "one new variant, not 18" goal still holds — the new type
-delegates `Cache::insert` etc. via a single match arm per inner policy,
-identical to the existing `DynCache` boilerplate but with the expiry
-check threaded through. The duplication is real but bounded.
-
-### d) Feature gating
-
-A `ttl` feature; `chrono` is already a dev-dep (see [`Cargo.toml`](../../Cargo.toml)),
-but TTL itself should depend only on `std::time` and the existing
-`LazyMinHeap` / `SlotArena`. The `ExpirationIndex` lives at
-`src/ds/expiration_index.rs` but is gated behind `#[cfg(feature = "ttl")]`
-so the `ds` module does not grow a time abstraction when TTL is disabled.
-The new `Clock` trait at `src/time.rs` is similarly gated. Keep `metrics`
-integration lightweight: extend `LruMetrics` / `StoreMetrics` with
-`expirations: u64` behind `metrics`, similar to how `evictions` is tracked
-today (see [`src/store/traits.rs`](../../src/store/traits.rs) `StoreMetrics`).
-
-### e) Concurrent variants
+`CacheBuilder::with_default_ttl(Duration)` returns an `ExpiringBuilder`
+whose `build::<K, V>(policy)` returns a `DynExpiringCache<K, V>` rather
+than a `DynCache<K, V>`. The two paths considered were:
+
+1. Add `impl Cache<K, V> for DynCache<K, V>` and wrap the result in
+   `Expiring<DynCache<K, V>, …>` directly.
+2. Introduce a separate `DynExpiringCache<K, V>` returned by a
+   TTL-specific builder path, avoiding a recursive enum at the cost of
+   another public type.
+
+The shipped code uses the **mechanism of (1) plus the surface of (2)**.
+`DynCache<K, V>` now implements `Cache<K, V>` (see
+[`builder.rs`](../../src/builder.rs)), so `Expiring<DynCache, …>` is
+internally type-checkable. `DynExpiringCache<K, V>` wraps that
+`Expiring<DynCache<K, V>, K, V, StdClock>` with the inner field
+**private**, and the `Cache` trait is **not** implemented on the public
+wrapper — therefore `Expiring<DynExpiringCache>` is structurally
+unrepresentable. The "one new variant, not 18" goal holds: the new type
+mirrors `DynCache`'s inherent methods (one delegation per method, not
+per policy) and adds the TTL surface.
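+
+The mechanism-plus-surface split can be shown with stripped-down
+stand-ins. Everything below is hypothetical: a two-method `Cache` trait,
+a `HashMap`-backed `DynCache`, an `Expiring` that only records deadlines,
+and a public `new` where cachekit only exposes `ExpiringBuilder::build`.
+In this sketch the `C: Cache` bound sits on the `Expiring` struct itself,
+and the public newtype has a private field and no `Cache` impl, so
+naming `Expiring<DynExpiringCache<u64, u64>, u64, u64>` is a compile
+error.
+
+```rust
+use std::collections::HashMap;
+use std::hash::Hash;
+use std::marker::PhantomData;
+
+/// Hypothetical, minimal stand-in for cachekit's `Cache` trait.
+pub trait Cache<K, V> {
+    fn insert(&mut self, key: K, value: V) -> Option<V>;
+    fn len(&self) -> usize;
+}
+
+/// Stand-in for the runtime-selected `DynCache` (option 1's mechanism).
+pub struct DynCache<K, V>(HashMap<K, V>);
+
+impl<K: Eq + Hash, V> Cache<K, V> for DynCache<K, V> {
+    fn insert(&mut self, key: K, value: V) -> Option<V> {
+        self.0.insert(key, value)
+    }
+    fn len(&self) -> usize {
+        self.0.len()
+    }
+}
+
+/// Decorator; the bound is on the struct, so every mention must satisfy it.
+pub struct Expiring<C: Cache<K, V>, K, V> {
+    inner: C,
+    deadlines: HashMap<K, u64>,
+    _values: PhantomData<V>,
+}
+
+impl<C: Cache<K, V>, K: Eq + Hash + Clone, V> Expiring<C, K, V> {
+    pub fn new(inner: C) -> Self {
+        Self { inner, deadlines: HashMap::new(), _values: PhantomData }
+    }
+    pub fn insert_with_deadline(&mut self, key: K, value: V, expires_at: u64) -> Option<V> {
+        self.deadlines.insert(key.clone(), expires_at);
+        self.inner.insert(key, value)
+    }
+    pub fn len(&self) -> usize {
+        self.inner.len()
+    }
+}
+
+/// Option 2's surface: private inner field, deliberately no `Cache` impl.
+pub struct DynExpiringCache<K: Eq + Hash, V> {
+    inner: Expiring<DynCache<K, V>, K, V>,
+}
+
+impl<K: Eq + Hash + Clone, V> DynExpiringCache<K, V> {
+    /// Sketch-only constructor; cachekit vends this via `ExpiringBuilder::build`.
+    pub fn new() -> Self {
+        Self { inner: Expiring::new(DynCache(HashMap::new())) }
+    }
+    pub fn insert_with_deadline(&mut self, key: K, value: V, expires_at: u64) -> Option<V> {
+        self.inner.insert_with_deadline(key, value, expires_at)
+    }
+    pub fn len(&self) -> usize {
+        self.inner.len()
+    }
+}
+
+// Rejected with E0277: `DynExpiringCache<u64, u64>: Cache<u64, u64>` does not hold.
+// fn nested(_: Expiring<DynExpiringCache<u64, u64>, u64, u64>) {}
+```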
+
+### d) Feature gating (shipped)
+
+A `ttl` feature lives in [`Cargo.toml`](../../Cargo.toml). TTL depends
+only on `std::time` and the existing `LazyMinHeap` — no new runtime
+dependency. The following modules are gated behind
+`#[cfg(feature = "ttl")]`:
+
+- `src/time.rs` (`Clock`, `StdClock`, `MockClock`, `duration_to_ticks`)
+- `src/ds/expiration_index.rs` (`ExpirationIndex`, `Deadline`, iters)
+- `src/policy/expiring.rs` (`Expiring`)
+- `src/builder.rs::ttl_support` (`ExpiringBuilder`, `DynExpiringCache`)
+- `Tick` / `TtlStatus` / `ExpiringCache` items in `src/traits.rs`
+
+When `ttl` is off, the `ds` module does not grow a time abstraction and
+`DynCache` reverts to its original surface.
+
+Metrics integration is a single counter — `Expiring::expirations: u64`
+behind `#[cfg(feature = "metrics")]`, with an accessor that returns 0
+when the feature is off so call sites compile unconditionally. Per-policy
+metrics structures (`LruMetrics`, `StoreMetrics`) were intentionally
+left alone — the decorator-owned counter is enough for Phase 1 because
+every TTL purge funnels through the wrapper.
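+
+A minimal sketch of that cfg pattern, using a hypothetical
+`ExpiryCounter` in place of the field on `Expiring`: the field is gated
+on `metrics` while the accessor is not, so callers compile under either
+feature set and simply read 0 when metrics are disabled.
+
+```rust
+/// Hypothetical stand-in for the counter that lives on `Expiring`.
+#[derive(Debug, Default)]
+pub struct ExpiryCounter {
+    #[cfg(feature = "metrics")]
+    expirations: u64,
+}
+
+impl ExpiryCounter {
+    /// Called on every TTL purge; compiles to nothing without `metrics`.
+    #[inline]
+    pub fn record_expiration(&mut self) {
+        #[cfg(feature = "metrics")]
+        {
+            self.expirations += 1;
+        }
+    }
+
+    /// Always available; reports 0 when `metrics` is disabled.
+    #[inline]
+    pub fn expirations(&self) -> u64 {
+        #[cfg(feature = "metrics")]
+        {
+            self.expirations
+        }
+        #[cfg(not(feature = "metrics"))]
+        {
+            0
+        }
+    }
+}
+```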
+
+### e) Concurrent variants (Phase 2)
 
 The existing `Concurrent*` wrappers (`ConcurrentLruCache` in
 [`src/policy/lru.rs`](../../src/policy/lru.rs), `ConcurrentSlotArena` in
 [`src/ds/slot_arena.rs`](../../src/ds/slot_arena.rs),
 `ConcurrentClockRing` in [`src/ds/clock_ring.rs`](../../src/ds/clock_ring.rs))
-wrap their single-threaded core in `parking_lot::RwLock`. TTL follows that
-shape with two non-negotiable rules:
+wrap their single-threaded core in `parking_lot::RwLock`.
+`ConcurrentExpiring<C>` is **not** shipped yet; when it lands it must
+follow two non-negotiable rules:
 
 1. **Return owned/`Arc<V>`, not `&V`.** The `Cache::get(&mut self) -> Option<&V>`
    signature cannot be implemented safely on `ConcurrentExpiring<C>`
    without holding the write lock across the borrow, which serializes
    readers and defeats the point of `RwLock`. `ConcurrentExpiring<C>`
-   therefore exposes `fn get(&self, key: &K) -> Option<Arc<V>>` (and the
-   sibling mutators), and does **not** implement `Cache<K, V>`. It is a
+   should expose `fn get(&self, key: &K) -> Option<Arc<V>>` (and the
+   sibling mutators), and should **not** implement `Cache<K, V>`. It is a
    concrete type with its own API, mirroring how `ConcurrentLruCache`
    already deviates from `Cache<K, V>`.
 2. **Atomic expiry-and-removal.** The expiry check, policy removal, and
@@ -500,19 +564,33 @@ shape with two non-negotiable rules:
    read lock, escalate to write lock for the actual removal) is safe so
    long as the write-locked path re-checks the deadline before acting.
 
-### f) `DynCache` touchpoint
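+Today, callers needing thread safety wrap a `DynExpiringCache` in
+`Arc<RwLock<…>>` just as they would any other cache type (it does not
+implement `Cache`, but the locking pattern is the same). Because `get`
+purges expired entries it takes `&mut self`, so lookups go through the
+write guard.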
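+
+The two Phase 2 rules can be sketched with a hypothetical
+`ConcurrentTtlMap` (not the planned `ConcurrentExpiring<C>`). It assumes
+a `std::sync::RwLock<HashMap<K, (Arc<V>, u64)>>` in place of a wrapped
+policy, takes `now` as a parameter instead of reading a `Clock`, and uses
+the std lock rather than `parking_lot` to stay dependency-free. Hits on
+live entries stay on the read lock and return an `Arc<V>`; only an
+expired hit escalates, and the write path re-checks the deadline before
+removing because another thread may have refreshed the key between the
+two locks.
+
+```rust
+use std::collections::HashMap;
+use std::hash::Hash;
+use std::sync::{Arc, RwLock};
+
+/// Hypothetical concurrent TTL map (not the planned `ConcurrentExpiring<C>`).
+pub struct ConcurrentTtlMap<K, V> {
+    inner: RwLock<HashMap<K, (Arc<V>, u64)>>, // key -> (value, expires_at)
+}
+
+impl<K: Eq + Hash, V> ConcurrentTtlMap<K, V> {
+    pub fn new() -> Self {
+        Self { inner: RwLock::new(HashMap::new()) }
+    }
+
+    pub fn insert(&self, key: K, value: V, expires_at: u64) {
+        // `unwrap` on lock poisoning keeps the sketch short.
+        self.inner.write().unwrap().insert(key, (Arc::new(value), expires_at));
+    }
+
+    /// Rule 1: `&self` plus an owned `Arc<V>`, so readers share the read lock.
+    pub fn get(&self, key: &K, now: u64) -> Option<Arc<V>> {
+        let map = self.inner.read().unwrap();
+        match map.get(key) {
+            None => return None,
+            Some((value, expires_at)) if *expires_at > now => return Some(Arc::clone(value)),
+            Some(_) => {} // expired: fall through to the write path
+        }
+        drop(map); // release the read guard before taking the write lock
+
+        // Rule 2: re-check under the write lock before acting.
+        let mut map = self.inner.write().unwrap();
+        let fresh = match map.get(key) {
+            Some((value, expires_at)) if *expires_at > now => Some(Arc::clone(value)),
+            _ => None,
+        };
+        if fresh.is_none() {
+            map.remove(key); // still expired (or already gone): purge it
+        }
+        fresh
+    }
+
+    pub fn len(&self) -> usize {
+        self.inner.read().unwrap().len()
+    }
+}
+```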
 
-With the `DynExpiringCache<K, V>` route chosen in §4(c), `DynCache` itself
-is **untouched** by TTL. The new type lives next to `DynCache` and mirrors
-its match-arm boilerplate one level out (the expiry check happens before
-delegating to the inner policy's `Cache::insert`/`get`/etc.). The `Debug`
-impl on `DynExpiringCache` should report TTL mode (default TTL, clock
-type) without exposing keys or deadlines.
+### f) `DynCache` touchpoint (shipped)
 
-### g) `prelude.rs`
+`DynCache<K, V>` gained an `impl Cache<K, V>` so that
+`Expiring<DynCache, K, V, StdClock>` type-checks — see
+[`src/builder.rs`](../../src/builder.rs). All other policy structs
+already implemented `Cache<K, V>`; this fills the last gap so the
+decorator works through the runtime-selected enum without a per-policy
+match arm.
 
-Re-export `ExpiringCache`, `Clock`, `StdClock`, `Expiring` so users get them
-via `use cachekit::prelude::*;`.
+`DynExpiringCache<K, V>` (also in `builder.rs`) lives next to `DynCache`
+and mirrors `DynCache`'s inherent methods plus the TTL surface
+(`insert_with_ttl`, `ttl_status`, `set_ttl`, `purge_expired`,
+`live_len`, `default_ttl`, `expirations`). Its `Debug` impl reports
+`default_ttl`, `len`, and `capacity` via `finish_non_exhaustive` —
+no keys or deadlines leak.
+
+### g) `prelude.rs` (shipped)
+
+[`src/prelude.rs`](../../src/prelude.rs) re-exports `Clock`, `StdClock`,
+`MockClock`, `Tick`, `TtlStatus`, `ExpiringCache`, `DynExpiringCache`,
+and `ExpiringBuilder` under `#[cfg(feature = "ttl")]`. `Expiring` itself
+is not re-exported — callers reach it through
+`cachekit::policy::expiring::Expiring` when they need the raw decorator,
+but most code should consume the builder-vended `DynExpiringCache`.
 
 ---
 
@@ -592,92 +670,145 @@ via `use cachekit::prelude::*;`.
 
 ---
 
-## 6. Recommended First Slice
-
-A pragmatic phased roadmap:
-
-1. New module `src/policy/expiring.rs` with `Expiring<C, K, T = StdClock>`
-   decorator. Define `peek` / `contains` as logical reads; `Cache::len`
-   reports physical occupancy; add an inherent `Expiring::live_len(&mut self)`
-   for callers that need the live count (see §1(a) Decision).
-2. New ds module `src/ds/expiration_index.rs` backed by
-   `LazyMinHeap<K, u64>` (cheap reuse) with auto-rebuild enabled to bound
-   stale heap growth. Add a `peek_best` primitive to `LazyMinHeap`
-   itself (see §3.A) so `ExpirationIndex` can implement
-   `peek_deadline` / `pop_expired(now)` without coupling to heap
-   internals. Leave the door open to swap in a timer wheel later. Both
-   files are gated behind `#[cfg(feature = "ttl")]`.
-3. `Clock` trait + `StdClock` / `MockClock` in a new `src/time.rs`.
-4. `ExpiringCache<K, V>` capability trait in `src/traits.rs`, using
-   `TtlStatus` for unambiguous status reporting.
-5. `CacheBuilder::with_default_ttl(Duration)` returns a separate
-   `DynExpiringCache<K, V>` (not `DynCache<K, V>`) — see §1 Recommendation
-   and §4(c). This makes `Expiring<Expiring<…>>` structurally
-   unrepresentable.
-6. Feature flag `ttl`; metrics field `expirations` (gated on `metrics`);
-   doctests + a fuzz seed; benchmark group `ttl_overhead` that compares
-   plain LRU vs. `Expiring<LRU>` under the existing Zipfian and scan
+## 6. Phased Roadmap
+
+### Phase 1 — landed
+
+All six items below are merged behind the `ttl` feature flag. The
+shipped sequence preserves policy/storage separation and keeps TTL
+opt-in, but the decorator does not preserve every hot-path invariant:
+inserts pay heap maintenance, the index clones keys, and expired entries
+may remain physically resident until a mutable operation purges them.
+The Phase 2 benchmark gate (step 7) is therefore part of the design,
+not optional cleanup.
+
+1. `src/policy/expiring.rs` — `Expiring<C, K, V, T = StdClock>`
+   decorator. `peek` / `contains` are logical reads; `Cache::len`
+   reports physical occupancy; `Expiring::live_len(&self)` walks the
+   index once for the live count (see §1(a) Decision). The shipped
+   signature is `&self`, not `&mut self`, because
+   `ExpirationIndex::iter` is borrow-only.
+2. `src/ds/expiration_index.rs` — `ExpirationIndex<K>` backed by
+   `LazyMinHeap<K, u64>` with auto-rebuild enabled (factor 2 by
+   default) to bound stale heap growth. The door is open to swap in a
+   timer wheel later. Phase 1 also added a `peek_best` primitive to
+   `LazyMinHeap` itself (see §3.A) so `ExpirationIndex::next_deadline` /
+   `pop_expired` / `drain_expired` do not couple to heap internals.
+3. `src/time.rs` — `Clock` trait + `StdClock` / `MockClock`. `Clock`
+   takes `&self`, is object-safe, and requires `Send + Sync + Debug`.
+4. `src/traits.rs` — `Tick`, `TtlStatus`, and the `ExpiringCache<K, V>`
+   capability trait (object-safe; sits alongside `RecencyTracking`,
+   `FrequencyTracking`, `HistoryTracking`).
+5. `CacheBuilder::with_default_ttl(Duration)` returns an
+   `ExpiringBuilder` whose `build` produces a `DynExpiringCache<K, V>`
+   (separate public type with a private inner field; cannot be wrapped
+   in another `Expiring`). To make this work without per-policy
+   plumbing, `DynCache<K, V>` gained an `impl Cache<K, V>` — see §1
+   Recommendation and §4(c).
+6. Feature flag `ttl`; counter `Expiring::expirations` (gated on
+   `metrics`; the accessor always exists and returns 0 when `metrics`
+   is off, so call sites compile either way); doctests on every public
+   item;
+   [`tests/ttl_integration_test.rs`](../../tests/ttl_integration_test.rs)
+   exercises the decorator and the builder path under proptest;
+   [`benches/ttl_overhead.rs`](../../benches/ttl_overhead.rs) compares
+   plain LRU vs. `Expiring<LRU>` under Zipfian / scan / mixed
    workloads.
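+
+Item 1's `live_len(&self)` amounts to one borrow-only pass over the
+index. A sketch, assuming a hypothetical `Index` stand-in whose `iter`
+yields `(&K, deadline)` pairs (the shipped `ExpirationIndex::iter` item
+type is not spelled out here) and the "deadline `<= now` means expired"
+rule that the purge loop uses:
+
+```rust
+use std::collections::HashMap;
+
+/// Hypothetical stand-in for the expiration index: key -> deadline.
+struct Index<K> {
+    deadlines: HashMap<K, u64>,
+}
+
+impl<K> Index<K> {
+    /// Borrow-only iteration over `(key, deadline)` pairs.
+    fn iter(&self) -> impl Iterator<Item = (&K, u64)> + '_ {
+        self.deadlines.iter().map(|(k, d)| (k, *d))
+    }
+
+    /// Physical occupancy (every resident entry has a deadline here).
+    fn len(&self) -> usize {
+        self.deadlines.len()
+    }
+}
+
+/// Live count: O(n), no allocation, `&self` only.
+fn live_len<K>(index: &Index<K>, now: u64) -> usize {
+    index.iter().filter(|&(_, expires_at)| expires_at > now).count()
+}
+```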
-7. **Phase 2:** profile (a) and, if the extra hash hit shows up in
+
+### Phase 2 — deferred
+
+7. **Embedded `expires_at`.** Profile Phase 1 with the existing
+   `ttl_overhead` group and, if the extra hash hit shows up in
    flamegraphs, embed `expires_at: u64` into `LruCore::Node` and
-   `S3FifoCache::Node` (the two highest-traffic policies in the existing
-   benches at [`benches/`](../../benches)) — but **opt-in per node**, not
-   unconditionally. Two viable shapes:
+   `S3FifoCache::Node` (the two highest-traffic policies in the
+   existing benches at [`benches/`](../../benches)) — but **opt-in per
+   node**, not unconditionally. Two viable shapes:
    - A const generic `Node<K, V, const TTL: bool>` so non-TTL caches
      monomorphize to the slimmer layout.
    - A separate type `LruWithTtl<K, V>` (and `S3FifoWithTtl<K, V>`)
-     that wraps the slot arena with a parallel `Vec<u64>` keyed by slot
-     handle.
+     that wraps the slot arena with a parallel `Vec<u64>` keyed by
+     slot handle.
+
    Embedding `expires_at` unconditionally would add 8 bytes per node to
    every LRU and S3-FIFO instance — a 10–25% memory regression for the
    common case of fixed-size value caches — and would regress the very
-   benchmarks step 6 is using as a gate. The `.cursorrules` "keep
-   metadata tight" rule applies here.
-
-This sequence preserves policy/storage separation and keeps TTL opt-in, but
-the decorator does not preserve every hot-path invariant: inserts pay heap
-maintenance, the index clones keys, and expired entries may remain physically
-resident until a mutable operation purges them. The benchmark gate in step 6
-is therefore part of the design, not optional cleanup.
+   benchmarks Phase 1 uses as a gate. The `.cursorrules` "keep metadata
+   tight" rule applies here.
+8. **`ConcurrentExpiring<C>`** following §4(e) (owned/`Arc<V>` return,
+   atomic expiry-and-removal).
+9. **Timer-wheel swap-in** as an alternative `ExpirationIndex` backend
+   when TTL is uniform and high-throughput (see §3.B/§3.C).
+10. **`serde`** support for `Tick` / `TtlStatus`, with the relative-
+    duration serialization rule from §5 (API trade-offs).
+11. **CAR in `DynCache`.** TTL-via-builder for CAR is blocked until
+    `policy::car` becomes a `CacheInner` variant.
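+
+Item 7's const-generic shape can be sketched on stable Rust with a
+type-level switch. `Flag`, `DeadlineSlot`, `ttl_overhead`, and the
+simplified `Node` fields are hypothetical (real LRU nodes carry
+different link fields). The point is that `Node<K, V, false>` stores a
+zero-sized `()` where `Node<K, V, true>` stores the 8-byte `expires_at`,
+so non-TTL caches keep the slimmer layout.
+
+```rust
+use std::mem::size_of;
+
+/// Hypothetical type-level switch selected by the const parameter.
+pub struct Flag<const TTL: bool>;
+
+/// Maps the switch to the storage used for `expires_at`.
+pub trait DeadlineSlot {
+    type Slot;
+}
+
+impl DeadlineSlot for Flag<true> {
+    type Slot = u64; // deadline in clock ticks
+}
+
+impl DeadlineSlot for Flag<false> {
+    type Slot = (); // zero-sized: non-TTL caches pay nothing
+}
+
+/// Simplified LRU-style node; the link fields are illustrative slot indices.
+pub struct Node<K, V, const TTL: bool>
+where
+    Flag<TTL>: DeadlineSlot,
+{
+    pub key: K,
+    pub value: V,
+    pub prev: u32,
+    pub next: u32,
+    pub expires_at: <Flag<TTL> as DeadlineSlot>::Slot,
+}
+
+/// Extra bytes per node when TTL is enabled for the given key/value types.
+pub fn ttl_overhead<K, V>() -> usize {
+    size_of::<Node<K, V, true>>() - size_of::<Node<K, V, false>>()
+}
+```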
 
 ---
 
 ## 7. Open Questions
 
-- Should `purge_expired` be exposed publicly, run on a background thread,
-  triggered on insert-when-full, or all three (configurable)?
-- Should the `Clock` trait live in a top-level `time` module or inside `ds`?
-  Step 6.3 currently picks `src/time.rs`; revisit if `no_std` support
-  becomes a constraint.
+Still open:
+
+- Should `purge_expired` only be exposed publicly (as today), or also
+  fire on a background thread or on insert-when-full? Phase 1 ships the
+  pull-only API and lets the caller drive cadence; configurable
+  background sweepers may follow once we have data on what's actually
+  needed.
 - How should serialization (under `serde` feature) handle `expires_at` —
-  the current recommendation is relative remaining duration, but restoring
-  long-lived caches may need wall-clock deadlines. Open until a
-  serialization API is proposed.
-- Is there demand for *negative* TTL (entries that become valid only after
-  a delay)? Probably no, but worth confirming before locking the API.
-- Should `purge_expired` return a `usize` count, the evicted `(K, V)`
-  pairs, or both (via separate methods)? The current trait sketch returns
-  `usize`; users who need the values can iterate `pop_expired` directly
-  through a lower-level API.
-
-Resolved during this design pass (kept here for posterity):
-- `len` reports physical occupancy (matches `Cache::len`'s `&self`
-  constraint); add `live_len(&mut self)` if/when the wrapper grows a
-  mutable counterpart — see §1(a).
-- Builder integration uses a separate `DynExpiringCache<K, V>` rather
-  than `impl Cache for DynCache` — see §1 Recommendation and §4(c).
+  the current §5 recommendation is relative remaining duration, but
+  restoring long-lived caches may need wall-clock deadlines. Open
+  until a serialization API is proposed.
+- Is there demand for *negative* TTL (entries that become valid only
+  after a delay)? Probably no, but worth confirming before locking the
+  API.
+- Should `purge_expired` continue to return only a `usize` count, or
+  also offer a variant that yields the evicted `(K, V)` pairs? The
+  Phase 1 trait method returns `usize`; users who need the values can
+  build on the lower-level `ExpirationIndex::drain_expired` iterator
+  paired with manual cache removals.
+
+Resolved during the Phase 1 design pass (kept here for posterity):
+
+- `Cache::len` reports physical occupancy. `Expiring::live_len(&self)`
+  walks the index once to give the exact live count; `&self` suffices
+  because `ExpirationIndex::iter` is borrow-only — see §1(a).
+- Builder integration uses a separate `DynExpiringCache<K, V>` with a
+  private inner field plus an `impl Cache<K, V> for DynCache<K, V>`,
+  combining option (1)'s plumbing with option (2)'s surface — see §1
+  Recommendation and §4(c). `Expiring<DynExpiringCache>` is
+  structurally unrepresentable.
+- The `Clock` trait lives at `src/time.rs` (top-level), not inside
+  `ds`. Revisit if `no_std` support becomes a constraint.
+- `Tick` is exposed as a newtype rather than a bare `u64` — keeps the
+  tick unit (ms today) private to the `Clock` implementation.
 
 ---
 
 ## References
 
-- [`docs/policies/roadmap/ttl.md`](../policies/roadmap/ttl.md) — high-level
-  stub
+### Shipped source
+
+- [`src/time.rs`](../../src/time.rs) — `Clock`, `StdClock`, `MockClock`
+- [`src/ds/expiration_index.rs`](../../src/ds/expiration_index.rs) —
+  `ExpirationIndex`
+- [`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs) — `LazyMinHeap`
+  (with the Phase 1 `peek_best` addition)
+- [`src/traits.rs`](../../src/traits.rs) — `Tick`, `TtlStatus`,
+  `ExpiringCache`
+- [`src/policy/expiring.rs`](../../src/policy/expiring.rs) — `Expiring`
+  decorator
+- [`src/builder.rs`](../../src/builder.rs) — `impl Cache for DynCache`,
+  `ExpiringBuilder`, `DynExpiringCache`
+- [`src/prelude.rs`](../../src/prelude.rs) — re-exports
+- [`tests/ttl_integration_test.rs`](../../tests/ttl_integration_test.rs)
+- [`benches/ttl_overhead.rs`](../../benches/ttl_overhead.rs)
+
+### Supporting docs
+
+- [`docs/policies/roadmap/ttl.md`](../policies/roadmap/ttl.md) —
+  implementation tracker and Quick Start
 - [`docs/policy-ds/lazy-heap.md`](../policy-ds/lazy-heap.md) — lazy heap
   primitive used as the index
-- [`src/ds/lazy_heap.rs`](../../src/ds/lazy_heap.rs) — implementation that
-  already lists TTL as a use case
-- [`src/traits.rs`](../../src/traits.rs) — capability-trait pattern this
-  design extends
-- [`src/builder.rs`](../../src/builder.rs) — `DynCache` integration point
+
+### External
+
 - [Wikipedia: Cache replacement policies](https://en.wikipedia.org/wiki/Cache_replacement_policies)
diff --git a/docs/design/weighted-eviction.md b/docs/design/weighted-eviction.md
index 7a8dbac..d7f3c9a 100644
--- a/docs/design/weighted-eviction.md
+++ b/docs/design/weighted-eviction.md
@@ -379,6 +379,9 @@ store already follows.
 - [Cache trait hierarchy](trait-hierarchy.md) — future
   `WeightTracking` capability trait sketched in
   "Future capability traits"
+- [Storage layer](storage.md) — store trait family and the
+  rationale for `WeightStore`'s deliberate divergence from
+  `StoreCore` / `StoreMut`
 - [Stores](../stores/README.md) and [`weight.md`](../stores/weight.md)
   — reference docs for the runtime behaviour
 - [Error model](error-model.md) — `StoreFull` semantics
diff --git a/docs/index.md b/docs/index.md
index cbc9c14..07faa26 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -21,6 +21,7 @@ Key features:
 - [Benchmarking design](design/benchmarking.md) — Benchmark layers, policy registry, JSON artifacts
 - [Hashing and key identity](design/hashing.md) — Hasher choices, key interning, shard routing
 - [Sharding](design/sharding.md) — Sharded primitives, routing, capacity semantics
+- [Storage layer](design/storage.md) — Store trait family, concrete stores, `StoreMetrics` baseline
 - [Serialization](design/serialization.md) — `serde` surface and cache-state persistence boundaries
 - [Non-goals](design/non-goals.md) — Explicit boundaries and out-of-scope features
 - [TTL design](design/ttl.md) — Worked example of every principle in one feature
diff --git a/src/metrics/cell.rs b/src/metrics/cell.rs
index d72dac5..65012e6 100644
--- a/src/metrics/cell.rs
+++ b/src/metrics/cell.rs
@@ -2,8 +2,27 @@ use std::cell::Cell;
 
 /// A metrics-only cell backed by [`Cell<u64>`].
 ///
-/// All accesses must be externally synchronized (e.g. by an `RwLock`).
-/// This type is **not** safe for unsynchronized concurrent use.
+/// `MetricsCell` exists so a policy's `&self` read paths can record
+/// counters without forcing every embedding type to be `!Sync`. The
+/// `unsafe impl Sync` below is sound **only** under the contract that
+/// follows; callers that violate it produce a data race.
+///
+/// Soundness contract (mirrored in `docs/design/metrics.md`):
+///
+/// - Increments must happen under **exclusive** external synchronization
+///   (single-threaded, `&mut self`, behind a write lock, or behind a
+///   `Mutex`). A shared `RwLock::read` guard does **not** serialize
+///   readers and is **not** sufficient protection: concurrent `incr()`
+///   calls behind a read lock are a data race even though every
+///   individual increment uses a `Cell::set`.
+/// - For counters incremented from a path that is reachable through a
+///   shared read lock, use `AtomicU64` (or escalate to a write lock
+///   before recording) instead. `MetricsCell` is the wrong primitive
+///   for that path.
+/// - Approximation is acceptable for metrics; data races are not.
+///   "Best-effort observability" never justifies unsynchronized
+///   `Cell` mutation.
 #[repr(transparent)]
 #[derive(Debug, Default, Clone, PartialEq, Eq)]
 pub(crate) struct MetricsCell(Cell<u64>);
@@ -20,8 +39,14 @@ impl MetricsCell {
     }
 }
 
-// SAFETY:
-// All access to MetricsCell is externally synchronized by an RwLock.
-// Metrics are observational and do not affect correctness.
+// SAFETY: see the type-level "Soundness contract" doc comment above.
+// Callers must ensure that every `incr` / `get` happens under
+// exclusive external synchronization (single-threaded, `&mut self`,
+// or behind a write lock / `Mutex`). A shared `RwLock::read` guard is
+// not sufficient: multiple readers can race on the underlying `Cell`.
+// Counters reachable through a read-locked path must use `AtomicU64`
+// instead.
 unsafe impl Sync for MetricsCell {}
+// SAFETY: `Cell<u64>` is `Send` whenever `u64` is, and `MetricsCell`
+// adds no extra non-`Send` state.
 unsafe impl Send for MetricsCell {}