Implement JEPA-based memory compression framework #785

Open
Pshyam17 wants to merge 22 commits into Factory-AI:main from Pshyam17:feat/jeval-memory-compression

Conversation

@Pshyam17

Introduce a comprehensive framework for JEPA-based semantic fidelity evaluation, including an abstract encoder interface, a frozen SentenceTransformer, and an EPE computation engine. The framework supports adaptive compression, classification, and budget allocation, enhancing memory management for Droid integration. Documentation and necessary files have been added to facilitate usage and development. Fixes include resolving circular imports and ensuring all necessary components are included.

Pshyam17 added 22 commits March 12, 2026 04:01
Defines the interface every encoder in jeval must satisfy.
EPEComputer depends only on this abstraction, never on a
concrete model class — makes encoder swapping free for ablations.
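The encoder abstraction this commit describes could look roughly like the sketch below; the class and method names here are illustrative guesses, not the PR's actual API:

```python
from abc import ABC, abstractmethod
from typing import List

class Encoder(ABC):
    """Hypothetical interface every jeval encoder would satisfy.
    EPE computation depends only on this abstraction, so swapping
    encoders for ablations requires no changes downstream."""

    @abstractmethod
    def encode(self, texts: List[str]) -> List[List[float]]:
        """Map texts to unit-norm embedding vectors."""

class ToyEncoder(Encoder):
    """Trivial stand-in for testing: bag-of-characters, L2-normalized."""
    def encode(self, texts):
        out = []
        for t in texts:
            v = [float(t.count(c)) for c in "abcde"]
            norm = sum(x * x for x in v) ** 0.5 or 1.0
            out.append([x / norm for x in v])
        return out
```

Any concrete model class (e.g. a SentenceTransformer wrapper) would subclass the same interface.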
Wraps all-mpnet-base-v2 as the JEPA target encoder.
All parameters frozen unconditionally — a moving target
makes EPE uncalibrated. encode_chunked handles long
memory files without OOM.
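The chunked-encoding idea is simple batching; a minimal sketch, assuming `encode_chunked` takes an encode callable and a fixed chunk size (the signature is a guess):

```python
def encode_chunked(encode, texts, chunk_size=32):
    """Encode a long list of texts in fixed-size chunks so that
    large memory files never hit the model in one oversized batch."""
    vecs = []
    for i in range(0, len(texts), chunk_size):
        vecs.extend(encode(texts[i:i + chunk_size]))
    return vecs
```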
Implements EPE = MSE(predictor(enc(C)), enc(T)) / 4; dividing
by 4 normalizes the score to [0,1] on the unit sphere.
Separate training_loss() and compute() paths make the
inference-time oracle pattern explicit.
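The normalization can be sketched as follows, assuming the "MSE" in the formula denotes squared Euclidean distance between L2-normalized vectors (whose maximum, for antipodal unit vectors, is 4):

```python
def epe_score(pred_vec, target_vec):
    """Embedding prediction error scaled to [0, 1].

    Assumes both vectors are unit-norm: the squared Euclidean
    distance between two unit vectors peaks at 4, so dividing
    by 4 bounds the score in [0, 1]."""
    sq_dist = sum((p - t) ** 2 for p, t in zip(pred_vec, target_vec))
    return sq_dist / 4.0
```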
Buckets segment-level EPE by Strata content type.
weighted_risk = sum(weight_t * mean_epe_t) is the scalar
the budget allocator acts on. align_abstractive() handles
non-1:1 mappings via nearest-neighbor cosine matching.
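The weighted-risk scalar named above is a weighted sum of per-type mean EPE; a minimal sketch (dict shapes are illustrative):

```python
def weighted_risk(epe_by_type, weights):
    """weighted_risk = sum over content types t of
    weight_t * mean(EPE of segments bucketed into type t)."""
    risk = 0.0
    for t, epes in epe_by_type.items():
        if epes:  # skip empty buckets
            risk += weights.get(t, 0.0) * (sum(epes) / len(epes))
    return risk
```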
Routes memory segments to 6 semantic classes using
DeBERTa-v3-large. No labeled training data required.
NLI hypotheses are explicitly documented as a tunable
hyperparameter — classification accuracy is measurable
and improvable independently of EPE.
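The zero-shot routing could be structured as below. The NLI scorer (backed by DeBERTa-v3-large in the PR) is injected here so the routing logic itself is testable; the hypothesis strings and class names are illustrative, and the commit notes they are a tunable hyperparameter:

```python
# Hypothetical hypotheses; the PR routes to 6 classes, shown as 2 here.
HYPOTHESES = {
    "code": "This text contains source code.",
    "decision": "This text records a decision.",
}

def classify(segment, nli_score, hypotheses=HYPOTHESES):
    """Return the class whose NLI hypothesis the segment entails most.
    `nli_score(premise, hypothesis)` returns an entailment score."""
    return max(hypotheses, key=lambda c: nli_score(segment, hypotheses[c]))
```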
Maps weighted_risk → 3-tier compression budget per segment.
Thresholds calibrated conservatively as priors — empirical
calibration from probe eval results is the intended upgrade path.
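The risk-to-budget mapping is a simple step function; a sketch with illustrative thresholds (the PR's calibrated priors are not stated here):

```python
def budget_tier(risk, low=0.15, high=0.35):
    """Map weighted_risk to one of 3 retention budgets.
    Thresholds are illustrative placeholders, not the PR's values."""
    if risk >= high:
        return 1.0   # high risk of semantic loss: keep near-verbatim
    if risk >= low:
        return 0.5   # medium risk: moderate compression
    return 0.2       # low risk: aggressive compression
```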
Full pre-hoc compression pipeline: segment → classify →
EPE estimate → budget → apply. _apply_budget is a
word-truncation placeholder with a clean interface so
any LLM summarizer can be swapped in as the backend.
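The word-truncation placeholder might look like this; the point is the `(text, budget) -> text` interface, which any LLM summarizer backend can implement instead:

```python
def apply_budget(text, budget):
    """Placeholder compressor: keep the first `budget` fraction of
    words. Swappable for an LLM summarizer with the same signature."""
    words = text.split()
    keep = max(1, int(len(words) * budget))
    return " ".join(words[:keep])
```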
Intercepts Droid's PreCompact event, runs jeval adaptive
compression, writes back verified memories.md before Droid
sees it. Gracefully degrades to uncalibrated EPE if no
trained predictor checkpoint is found. Appends structured
JSONL audit log per compression event.
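The per-event JSONL audit append is a one-object-per-line write; a minimal sketch (field names are illustrative, not the PR's schema):

```python
import json
import time

def append_audit(path, event):
    """Append one structured audit record per compression event,
    one JSON object per line (JSONL), so the log is grep/stream-friendly."""
    record = {"ts": time.time(), **event}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```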
Reimplements Factory's Dec 2025 probe-based evaluation
methodology (recall, artifact, continuation, decision probes,
6-dimension LLM judge, 0-5 scale). Results are directly
comparable to their published baselines:
Factory 2.45 / Anthropic 2.33 / OpenAI 2.19 on artifact tracking.
README leads with the artifact tracking gap and shows the
full pipeline in ASCII. pyproject scoped to the example
folder so it installs independently of the main repo.
…import

decomposer imported from strata, strata/budget imported from
decomposer — circular. RISK_WEIGHTS now lives in its own module
that both can import from safely.
sed replaced only the first line of the RISK_WEIGHTS dict,
leaving the remaining lines dangling and raising an
IndentationError. Rewrote the file cleanly with correct
imports from epe.weights.
- swap word truncation for Mistral via NVIDIA NIM as compression backend
- add _llm_compress() with graceful fallback to word truncation if no API key
- replace hardcoded EPE thresholds with z-score normalization — thresholds
  are now relative to session EPE distribution, self-calibrating across
  any predictor version
- add artifact pattern detection in BudgetAllocator: entries containing
  file paths, JWT, Redis, API endpoints always get budget=1.0 regardless
  of EPE — fixes 'low EPE != low importance' for predictable content
- add trained predictor checkpoint (30 epochs, 5000 pairs, A100)
- add eval/train.py with synthetic Droid memory training data generator
- add test_data/real_session.md synthetic session for benchmark testing
- verify 10/10 critical artifacts survive 3 iterative compression rounds
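The z-score normalization mentioned in the list above makes tier thresholds relative to the session's own EPE distribution; a minimal sketch, with illustrative z cutoffs:

```python
def zscore_budgets(epes, z_low=-0.5, z_high=0.5):
    """Self-calibrating 3-tier budgets: each segment's EPE is
    standardized against the session distribution, so thresholds
    hold up across predictor versions. z cutoffs are illustrative."""
    n = len(epes)
    mean = sum(epes) / n
    var = sum((e - mean) ** 2 for e in epes) / n
    std = var ** 0.5 or 1.0  # degenerate (constant) case: avoid /0
    tiers = []
    for e in epes:
        z = (e - mean) / std
        tiers.append(1.0 if z >= z_high else 0.5 if z >= z_low else 0.2)
    return tiers
```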