Skip to content

[codex] Close coding-deepgent MVP local agent harness core#220

Draft
kun1s2 wants to merge 111 commits intoshareAI-lab:mainfrom
kun1s2:codex/stage-12-14-context-compact-foundation
Draft

[codex] Close coding-deepgent MVP local agent harness core#220
kun1s2 wants to merge 111 commits intoshareAI-lab:mainfrom
kun1s2:codex/stage-12-14-context-compact-foundation

Conversation

@kun1s2
Copy link
Copy Markdown

@kun1s2 kun1s2 commented Apr 14, 2026

Summary

Closes the Approach A MVP for coding-deepgent: a local LangChain-native Agent Harness Core with explicit MVP/non-MVP boundaries and source-backed stage checkpoints through Stage 29.

This PR now includes the earlier context/compact/task/verifier work plus the MVP closeout work from Stages 18B-29:

  • verifier result persistence and evidence integration
  • evidence provenance and verifier lineage
  • canonical H01-H22 completion dashboard
  • H01-H12 MVP closeout (with H12 minimal only)
  • H15-H20 MVP closeout (with H17/H20 minimal only)
  • deferred-boundary ADR for H13/H14/H21/H22

MVP Status

Canonical dashboard result:

  • H01-H11: implemented
  • H12: implemented-minimal
  • H13: deferred
  • H14: deferred
  • H15-H19: implemented
  • H20: implemented-minimal
  • H21: deferred
  • H22: deferred

Stage 30-36 reserve is not currently required.

Major Additions

Context / compact / session / memory

  • Stage 12-16 context, recovery, compact, and session continuity foundations.
  • Stage 18B verifier outcomes persist into session evidence.
  • Stage 19 verifier evidence provenance and lineage.
  • Stage 22 prompt layering and dynamic context contract hardening.
  • Stage 23 projection and session continuity contract closeout.
  • Stage 24 scoped cross-session memory closeout.
  • Stage 28 whitelisted runtime event evidence persistence.

Durable workflow / subagents

  • Stage 17A-D durable task graph, plan artifacts, and verifier boundary.
  • Stage 18A real bounded verifier child-agent execution.
  • Stage 25 TodoWrite / durable task / plan-verify contract closeout.
  • Stage 26 bounded agent-as-tool MVP closeout and minimal context-thread propagation.

Extension platform

  • Stage 27 local skill, MCP, plugin-manifest, and hook contract closeout.
  • Plugins remain local manifest/source validation only in MVP.
  • Marketplace/install/update, remote trust, and remote hook platform remain deferred.

Planning / Trellis

  • Added the canonical H01-H22 completion dashboard.
  • Added stage PRDs/checkpoints for Stage 18B-29.
  • Updated project handoff for low-cost resume.
  • Recorded the MVP deferred-boundary ADR and release checklist.

Validation

Full current validation on coding-deepgent:

  • ruff check coding-deepgent/src coding-deepgent/tests
  • mypy coding-deepgent/src/coding_deepgent coding-deepgent/tests
  • pytest -q coding-deepgent/tests
    • 212 passed

Explicit Deferred / Out Of MVP

  • H13 mailbox / SendMessage multi-agent communication
  • H14 coordinator synthesis runtime
  • H21 bridge / remote / IDE control plane
  • H22 daemon / cron / proactive automation
  • Rich provider-specific fork/cache parity
  • Full plugin install/enable/update lifecycle
  • Provider-specific cost/cache instrumentation beyond local counters

Notes

This branch intentionally preserves the Trellis task/checkpoint artifacts that document the source-backed implementation and MVP closeout path.

CrazyBoyM and others added 30 commits April 8, 2026 05:45
The OMX team runtime writes local state under .omx/, and worker worktrees require the leader workspace to be clean before launch. Committing the ignore rule preserves local orchestration artifacts outside source control while unblocking durable team execution.

Constraint: omx team refuses to launch with a dirty leader workspace because it provisions worker worktrees
Rejected: Stash .gitignore before launch | would make .omx/ unignored again during team execution
Confidence: high
Scope-risk: narrow
Directive: Keep .omx/ ignored; do not remove unless replacing the OMX state location
Tested: git diff showed only .omx/ ignore addition
Not-tested: team launch after commit
The first LangChain milestone needs CI evidence that the parallel s01-s06 track exists, compiles without OpenAI credentials, avoids import-time model starts, and preserves visible teaching harness primitives. This adds the guardrail tests and wires CI through requirements.txt so later LangChain dependency additions are installed consistently.

Constraint: Test lane owns tests/CI while code lane still owns agents_langchain implementation

Confidence: medium

Scope-risk: narrow

Tested: python -m py_compile tests/test_langchain_agents_smoke.py; python -m pytest tests/test_agents_smoke.py -q

Not-tested: tests/test_langchain_agents_smoke.py passes only after agents_langchain s01-s06 code lane lands
The docs lane needs a stable comparison entry point before the code and test lanes are integrated, so this records where the s01-s06 LangChain/OpenAI-interface track lives, how it should be configured, and how reviewers should keep it separate from the original agents/ baseline and web UI.

Constraint: First milestone is s01-s06 only and must preserve agents/ plus web/ boundaries

Constraint: LangChain docs currently install core langchain plus langchain-openai for OpenAI integration

Rejected: Surface the track through web/ now | user explicitly scoped web UI/app out of this milestone

Confidence: high

Scope-risk: narrow

Tested: python -m pytest tests/test_agents_smoke.py -q; python -m compileall agents tests -q; git diff --check; python -m pip install --dry-run -r requirements.txt pytest

Not-tested: full pytest suite due pre-existing tests/test_s_full_background.py failure unrelated to docs/deps changes
The docs lane needs a stable comparison entry point before the code and test lanes are integrated, so this records where the s01-s06 LangChain/OpenAI-interface track lives, how it should be configured, and how reviewers should keep it separate from the original agents/ baseline and web UI.

Constraint: First milestone is s01-s06 only and must preserve agents/ plus web/ boundaries

Constraint: LangChain docs currently install core langchain plus langchain-openai for OpenAI integration

Rejected: Surface the track through web/ now | user explicitly scoped web UI/app out of this milestone

Confidence: high

Scope-risk: narrow

Tested: python -m pytest tests/test_agents_smoke.py -q; python -m compileall agents tests -q; git diff --check; python -m pip install --dry-run -r requirements.txt pytest

Not-tested: full pytest suite due pre-existing tests/test_s_full_background.py failure unrelated to docs/deps changes
Add a parallel agents_langchain s01-s06 track so learners can compare the existing hand-written Anthropic SDK baseline against LangChain's OpenAI-interface runtime without changing the web UI or original agents.

Constraint: First milestone is s01-s06 only and must preserve agents/*.py plus web/

Rejected: Put LangChain files under agents/ | risks confusing the existing web extractor and baseline teaching boundary

Confidence: high

Scope-risk: moderate

Tested: python -m py_compile agents_langchain/*.py; python -m pytest tests/test_agents_smoke.py tests/test_langchain_agents_smoke.py -q; env -u OPENAI_API_KEY import check for agents_langchain modules
The first LangChain milestone needs to sit beside the hand-written Anthropic SDK lessons, not replace them, so this adds a separate agents_langchain package, non-live smoke tests, OpenAI-style setup docs, and CI dependency wiring while leaving the web app and original s01-s06 scripts unchanged.

Constraint: Preserve existing agents/*.py as the baseline and avoid web UI/app changes for this milestone
Constraint: Automated tests must not require OPENAI_API_KEY or network access
Rejected: Put LangChain files under agents/ | would blur the baseline boundary and risk web extractor churn
Confidence: high
Scope-risk: moderate
Tested: python -m py_compile agents_langchain/*.py tests/test_langchain_agents_smoke.py
Tested: python -m pytest tests/test_agents_smoke.py tests/test_langchain_agents_smoke.py -q
Tested: env -u OPENAI_API_KEY python -m pytest tests/test_langchain_agents_smoke.py -q
Not-tested: Full pytest suite is blocked by pre-existing tests/test_s_full_background.py failure in unmodified agents/s_full.py
Not-tested: Live LangChain/OpenAI calls intentionally not run
The integrated LangChain milestone passed its targeted checks, but full repository pytest still failed in BackgroundManagerTests because a running background task with result=None rendered as '[running] None'. Normalizing the None case to the existing running placeholder keeps the capstone behavior aligned with the test and avoids a misleading status string.

Constraint: Full post-change verification should pass before concluding the milestone
Rejected: Leave the unrelated failure unresolved | would keep full pytest red at handoff time
Confidence: high
Scope-risk: narrow
Directive: Preserve the '(running)' placeholder contract for unfinished background tasks unless tests and user-visible output are updated together
Tested: python -m py_compile agents/s_full.py agents_langchain/*.py tests/test_langchain_agents_smoke.py; python -m pytest tests -q
Not-tested: Interactive manual run of agents/s_full.py background task commands
Kun added 6 commits April 12, 2026 09:26
The s06 alignment status is the document users are actively reading while deciding what CC behavior the chapter matches or intentionally omits. Translating it keeps the evidence/inference boundary accessible without changing the underlying implementation.

Constraint: User asked specifically to convert the s06 cc_alignment progress document to Chinese
Constraint: Preserve existing s06 alignment structure and verification claims
Rejected: Translate the whole cc_alignment directory now | request only covered the s06 progress document and other files have unrelated in-flight edits
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future sNN alignment ledgers explicit about 已对齐、部分对齐/教学等价、未对齐/有意不复制、测试证据、下一步候选
Tested: PYTHON_DOTENV_DISABLED=1 pytest tests/test_s06_context_compact_baseline.py tests/test_deepagents_track_smoke.py tests/test_stage_track_capability_contract.py -q (24 passed)
Tested: git diff --check
Not-tested: Markdown rendering in generated web UI
The approved runtime-foundation plan called for a professional domain architecture over the existing LangChain product surface. This commit reconciles the six team lanes into one semantic history entry: typed settings, dependency-injector composition, runtime invocation context, TodoWrite/todo domain extraction, filesystem/tool-system policy seams, JSONL sessions, Typer/Rich/structlog local operations, and stage-3 verification gates.

Constraint: Scope is limited to coding-deepgent/ per the approved PRD and team staffing lanes.

Constraint: LangChain remains the runtime boundary; containers compose providers but domain modules do not import containers or hide business rules.

Constraint: Structured tool inputs continue to use Pydantic args_schema / LangChain-native schemas rather than ad-hoc dict parsing or alias fallback.

Rejected: Keep runtime-generated omx(team) checkpoint/merge commits | operational scaffolding violates the Lore commit protocol and obscures semantic review boundaries.

Rejected: Leave stage metadata at stage 1 | would skip the runtime-foundation contract tests after implementation.

Rejected: Accept mypy failures as an initial baseline | final integration typing issues were fixable without broadening scope.

Confidence: high

Scope-risk: moderate

Reversibility: clean via backup branch created before squash finalization.

Directive: Do not introduce new cc mirror modules or container imports in domain packages; update project_status.json, README, and runtime-foundation contract tests together when advancing later stages.

Tested: cd coding-deepgent && python -m pytest -q -> 59 passed

Tested: cd coding-deepgent && ruff check . -> all checks passed

Tested: cd coding-deepgent && ruff format --check . -> 68 files already formatted

Tested: cd coding-deepgent && python -m mypy src/coding_deepgent tests -> success

Tested: CLI smoke for python -m coding_deepgent --help, config show, sessions list, and doctor without credentials

Tested: Review grep checks for forbidden imports/modules/dependencies

Not-tested: Live model invocation; no credentials required for this foundation stage.
Stage 4 turns the runtime foundation into a safer and more extensible product base by adding deterministic permission decisions, local lifecycle hooks, and structured prompt/context assembly without replacing LangChain's create_agent loop. The implementation keeps cc-haha alignment at the behavior-contract level while preserving LangChain-first boundaries and tight regression coverage.

Constraint: Must stay LangChain/LangGraph-first and avoid introducing a custom query loop or speculative runtime wheel.

Constraint: Stage 4 ask semantics must remain deterministic and no-UI so verification stays headless and local.

Rejected: Copy cc-haha permission UI / HITL flow now | Stage 4 requires deterministic no-execution ask handling, not interactive approval.

Rejected: Build a custom tool executor around permissions/hooks | AgentMiddleware and strict tool schemas already express the needed control-plane seams.

Confidence: high

Scope-risk: moderate

Reversibility: clean

Directive: Keep future memory, compact, skills, subagents, and tasks layered on top of these control-plane seams; do not bypass PermissionManager or PromptContext with ad-hoc runtime logic.

Tested: cd coding-deepgent && python -m pytest -q -> 72 passed

Tested: cd coding-deepgent && ruff check . -> all checks passed

Tested: cd coding-deepgent && ruff format --check . -> 81 files already formatted

Tested: cd coding-deepgent && python -m mypy src/coding_deepgent tests -> success

Tested: grep guards for forbidden query loop / container imports / alias-fallback patterns -> passed

Tested: Architect verification -> APPROVE

Not-tested: Interactive approval UI/HITL; intentionally deferred beyond Stage 4.
Stage 5 absorbs the prompt/context/memory slice of the Claude Code roadmap through LangChain-native seams: a model-visible save_memory tool writes through ToolRuntime.store, recalled memories can be injected by middleware into the model request, and a deterministic tool-result budget helper bounds oversized payloads without adding message-history pruning or a custom query loop.

Constraint: LangChain/LangGraph store, ToolRuntime, middleware, and create_agent remain the integration surface.

Constraint: This stage is a foundation seam, not a durable cross-process memory guarantee beyond the configured store backend.

Rejected: Implement message-history projection/pruning now | would widen compact scope beyond the approved tool-result budget slice.

Rejected: Add LLM autocompact/session-memory side-agent behavior now | harder to verify deterministically and risks custom runtime drift.

Confidence: high

Scope-risk: moderate

Reversibility: clean

Directive: Keep memory separate from Todo and future durable tasks; future compact/subagent/task work must use the existing memory and prompt-context seams.

Tested: cd coding-deepgent && python -m pytest -q -> 80 passed

Tested: cd coding-deepgent && ruff check . -> all checks passed

Tested: cd coding-deepgent && ruff format --check . -> 93 files already formatted

Tested: cd coding-deepgent && python -m mypy src/coding_deepgent tests -> success

Tested: Stage 5 grep guards -> STAGE5_FINAL_GATES_OK

Tested: Architect verification -> APPROVE

Not-tested: Persistent cross-process memory backend; current stage intentionally ships the store-backed foundation seam.
Stage 6 adds the Option B-min slice from the cc product roadmap: local SKILL.md loading, a store-backed durable task graph, and a synchronous stateless run_subagent tool with an exact child-tool allowlist. The implementation keeps richer Claude Code agent runtime concepts deferred while wiring the new surfaces through the existing LangChain create_agent tool system.

Constraint: LangChain-first implementation; no custom query loop, background agents, mailbox, worktrees, remote/team runtime, sidechain resume, or forked skill execution.

Constraint: TodoWrite remains session-local and separate from durable task records.

Rejected: Full AgentTool parity now | would require background/resume/mailbox/worktree behavior outside the approved Option B-min slice.

Rejected: Plugin/MCP skill loading now | Stage 6 is local skills only and keeps extension platform work for a later stage.

Confidence: high

Scope-risk: moderate

Reversibility: clean

Directive: Future multi-agent work must explicitly expand the subagent contract before adding mailbox/background/worktree/resume semantics.

Tested: cd coding-deepgent && python -m pytest -q -> 87 passed

Tested: cd coding-deepgent && ruff check . -> all checks passed

Tested: cd coding-deepgent && ruff format --check . -> 107 files already formatted

Tested: cd coding-deepgent && python -m mypy src/coding_deepgent tests -> success

Tested: forbidden runtime-creep grep guards -> passed

Tested: Architect verification -> APPROVE

Not-tested: real background/remote subagent execution; intentionally deferred.
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 14, 2026

Someone is attempting to deploy a commit to the crazyboym's projects Team on Vercel.

A member of the Team first needs to authorize it.

@CrazyBoyM CrazyBoyM force-pushed the main branch 2 times, most recently from 36897b1 to d882d01 Compare April 14, 2026 16:11
@kun1s2 kun1s2 changed the title [codex] Add context and compact foundations [codex] Close coding-deepgent MVP local agent harness core Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants