From b8707e8732b61185dac5ff4b06bf548d91c9c251 Mon Sep 17 00:00:00 2001
From: Christopher Tso
Date: Tue, 9 Jun 2026 11:42:15 +0200
Subject: [PATCH] chore(tracking): remove repo-local beads state

---
 .beads/.gitignore    | 40 --------------------
 .beads/README.md     | 18 ---------
 .beads/config.yaml   |  2 -
 .beads/issues.jsonl  | 88 --------------------------------------------
 .beads/metadata.json |  4 --
 .gitignore           | 12 +++---
 CONTRIBUTING.md      |  3 +-
 biome.json           |  1 -
 8 files changed, 9 insertions(+), 159 deletions(-)
 delete mode 100644 .beads/.gitignore
 delete mode 100644 .beads/README.md
 delete mode 100644 .beads/config.yaml
 delete mode 100644 .beads/issues.jsonl
 delete mode 100644 .beads/metadata.json

diff --git a/.beads/.gitignore b/.beads/.gitignore
deleted file mode 100644
index 3c1cd916..00000000
--- a/.beads/.gitignore
+++ /dev/null
@@ -1,40 +0,0 @@
-# SQLite databases
-*.db
-*.db?*
-*.db-journal
-*.db-wal
-*.db-shm
-
-# Local history and recovery
-.br_history/
-.br_recovery/
-
-# Local version tracking
-.local_version
-
-# Runtime files
-*.lock
-*.tmp
-*.sock
-daemon.lock
-daemon.log
-daemon.pid
-last-touched
-redirect
-sync-state.json
-
-# Sync state and merge artifacts
-.sync.lock
-beads.base.jsonl
-beads.base.meta.json
-beads.left.jsonl
-beads.left.meta.json
-beads.right.jsonl
-beads.right.meta.json
-sync_base.jsonl
-
-# bv lock file
-.bv.lock
-
-# NOTE: Do not add negation patterns here.
-# JSONL files and config files are tracked by git by default because no pattern above ignores them.
diff --git a/.beads/README.md b/.beads/README.md
deleted file mode 100644
index e414b5fe..00000000
--- a/.beads/README.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# Beads
-
-AgentV uses Beads for repo-local task tracking.
-
-Use `br` for all Beads operations in this repository:
-
-```bash
-br ready --json
-br list --json
-br show --json
-br update --claim --json
-br close --reason "Completed" --json
-br sync --flush-only
-```
-
-The durable task graph is tracked as JSONL in `.beads/issues.jsonl`. Local SQLite
-databases, locks, history, and merge scratch files are ignored and should not be
-committed.
diff --git a/.beads/config.yaml b/.beads/config.yaml
deleted file mode 100644
index b4250679..00000000
--- a/.beads/config.yaml
+++ /dev/null
@@ -1,2 +0,0 @@
-# Beads Project Configuration
-issue_prefix: av
diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
deleted file mode 100644
index 655f385c..00000000
--- a/.beads/issues.jsonl
+++ /dev/null
@@ -1,88 +0,0 @@
-{"id":"av-1sr","title":"public demo: build dexter-evals companion project","description":"Plan: docs/plans/public-agentv-demo-projects.md#u3-build-dexter-evals-companion-project\nRequirements: R6, R7, R8, R9, R10, R16, R17, R18\n\nAcceptance:\n- Create dexter-evals AgentV config, eval YAML, scripts, .env.example, and README.\n- Pin/document Dexter version or commit and prerequisite install path.\n- Adapt Dexter public eval pattern into AgentV format rather than inventing a synthetic finance suite.\n- Setup fails clearly when Dexter/provider/data env is missing and does not print resolved secrets or private endpoints.\n- Produce one local AgentV result when env is configured.\n- Record AgentV schema/provider/rubric/result-flow friction as separate follow-up plan/Bead.","status":"closed","priority":1,"issue_type":"task","assignee":"codex-public-demo-plan","created_at":"2026-06-04T02:16:12.250114714Z","created_by":"codex-public-demo-plan","updated_at":"2026-06-04T04:16:41.991236878Z","closed_at":"2026-06-04T03:47:33.484197044Z","close_reason":"Completed source/project scope: dexter-evals companion project was implemented, validated with non-secret target-selection env, integrated into feature/agentv-public-demo, and
downstream handoff notes were recorded. A real local AgentV result remains conditional on configured OPENAI_API_KEY, FINANCIAL_DATASETS_API_KEY, and search-provider env; result-sync/dashboard beads carry that credentialed-run caveat.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dexter-evals","public-demo"],"comments":[{"id":10,"issue_id":"av-1sr","author":"codex-public-demo-plan","text":"Created from doc review handoff. Requirements: docs/brainstorms/2026-06-04-public-agentv-demo-projects-requirements.md. Plan: docs/plans/public-agentv-demo-projects.md. Follow-up rule: Dashboard UX gaps and AgentV core gaps discovered during implementation should become separate focused Beads with evidence.","created_at":"2026-06-04T02:16:45Z"},{"id":15,"issue_id":"av-1sr","author":"codex-public-demo-plan","text":"Agent Mail broadcast attempted by IvoryDune on thread public-agentv-demo-projects. Delivery was blocked by contact policy for CoralGlen and QuietCove; pending contact requests were created by the Agent Mail server. 
Broadcast body summarized plan docs, claimed Beads, repo topology, Dashboard UX-gap follow-up rule, AgentV core-gap follow-up rule, secret handling, and result-sync artifact boundary.","created_at":"2026-06-04T02:19:02Z"},{"id":18,"issue_id":"av-1sr","author":"BlackMeadow","text":"bead-spawn-agent launched an agent for av-1sr.\n\nSession: agent-av-1sr-main-20260604045217\nDirectory: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-dexter-evals\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-1sr.\nWorktree: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-dexter-evals","created_at":"2026-06-04T02:52:17Z"},{"id":20,"issue_id":"av-1sr","author":"entity","text":"Orchestration update from BlackMeadow: per-task worktree may be used as scratch, but final Dexter companion changes must merge into shared integration worktree /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration on branch feature/agentv-public-demo. Do not leave final work stranded on feature/av-1sr-main or open a standalone per-bead PR.","created_at":"2026-06-04T03:07:18Z"},{"id":22,"issue_id":"av-1sr","author":"entity","text":"Epic coordination update from BlackMeadow: all agentv-public-demo workers must use the same Beads source of truth. Run br mutations from /home/entity/projects/EntityProcess/agentv unless explicitly moved; treat per-task worktree .beads copies as read-only/stale. Code may still merge into /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration.","created_at":"2026-06-04T03:08:15Z"},{"id":28,"issue_id":"av-1sr","author":"entity","text":"Implementation evidence: created dexter-evals companion project files and mirrored them into the public-demo integration checkout. Dexter source pinned to virattt/dexter commit 8d9419829f443f84b804d033bb2c3b1fbd788629. 
Project adapts Dexter finance_agent.csv rows into AgentV input/expected_output/rubrics, includes .agentv/targets.yaml, setup preflight, Dexter CLI wrapper, CSV-to-AgentV generator, .env.example, README, and public-safe .gitignore. Verification: AgentV build completed in scratch worktree after bun install; validation passed for dexter-evals eval + targets when non-secret dummy target-selection env was supplied. Missing-env setup was run in scrubbed env and failed with only variable names/prereq guidance, no resolved secret values or private endpoints. Generated eval script successfully converted 2 rows from a cloned Dexter source checkout at the pinned commit. Blocker: no OPENAI_API_KEY/FINANCIAL_DATASETS_API_KEY/search env is configured in this session, so producing a real local AgentV result is blocked on local credentials/data access. Follow-up beads opened: av-w9p for rubric operator semantics and av-njl for targets.yaml template validation.","created_at":"2026-06-04T03:17:04Z"},{"id":31,"issue_id":"av-1sr","author":"entity","text":"Final integration handoff: scratch commit 97219bcdabcc2a5394af3cbdeccdcba42d7953b8 was cherry-picked into /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration on branch feature/agentv-public-demo as commit 3ae89357. Final verification after cherry-pick: AgentV validate passed for dexter-evals/evals/dexter-finance-smoke.eval.yaml and dexter-evals/.agentv/targets.yaml using non-secret dummy target-selection env; scrubbed setup preflight failed actionably for missing DEXTER_REPO_PATH, OPENAI_API_KEY, FINANCIAL_DATASETS_API_KEY, search key, and OPENAI_MODEL, and printed no resolved secret values/private endpoints. 
Integration checkout still has a pre-existing unstaged .gitignore change for .grepai/ that was not part of this bead.","created_at":"2026-06-04T03:19:04Z"},{"id":34,"issue_id":"av-1sr","author":"entity","text":"Migrated scratch-worktree note from /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-dexter-evals: worker started U3 Dexter companion work with scope limited to public-safe AgentV config/eval/scripts/.env.example/README, Dexter pin/prereq docs, missing-env failure, Dexter-derived eval pattern, one local result if env permits, and separate follow-up Beads for AgentV friction. Downstream result-sync/dashboard beads only receive blocker/follow-up notes.","created_at":"2026-06-04T03:56:04Z"},{"id":35,"issue_id":"av-1sr","author":"BlackMeadow","text":"Scope superseded after user design correction: do not present this as a dexter-evals project. The durable demo project should be financial-research-agent, a coding/web research agent attempting to reproduce Dexter-style financial research against Dexter's public finance_agent.csv golden answers. Dexter remains a pinned upstream fixture/source attribution and optional compatibility target only; default demo path must not require FINANCIAL_DATASETS_API_KEY. Follow-up bead: av-fo9.","created_at":"2026-06-04T04:16:41Z"}]} -{"id":"av-2lq","title":"research(private): stand up Margin Eval in framework-parity repo","description":"Problem:\nWe have used Margin Eval as a design reference for filesystem-native benchmark packaging, immutable run bundles, resume, and agent/output trace capture, but current AgentV planning appears to rely on report-level analysis rather than a live private Margin setup. 
The user asked to add Margin Eval setup in EntityProcess/wtg-ai-prompts-experiment so implementation workers can understand how it works before finalizing AgentV bundle/schema details.\n\nScope:\n- Work only in the private EntityProcess/wtg-ai-prompts-experiment repo or an isolated scratch/worktree; do not add Margin artifacts to public AgentV docs/code.\n- Clone or otherwise inspect Margin-Lab/evals and at least one minimal suite/config path. If a local clone already exists elsewhere, record the path and commit instead of duplicating it.\n- Run the smallest feasible dry-run or no-secret smoke that demonstrates Margin output directory structure, resume metadata, run bundle files, logs/traces, agent config, suite config, and artifact naming.\n- Compare observed Margin output to AgentV v1 bundle direction: run_manifest.json, target_recipe.json, run_source.json, index.jsonl responsibilities, per-test folders, redaction, and copied-vs-referenced source material.\n- Record a concise private note under framework-parity/ and a Beads comment with the branch/commit and any concrete schema lessons.\n\nAcceptance:\n- Private note includes the Margin version/commit inspected, commands attempted, whether a dry-run/smoke succeeded, and the observed run output tree.\n- Note clearly says which Margin patterns AgentV should borrow and which should remain out of core.\n- Any discovered blocker is captured with enough detail for a follow-up worker.\n- No private repo URLs, secrets, raw env dumps, OAuth files, or vendored Margin source are added to AgentV public docs/code.\n\nNon-goals:\n- Do not implement AgentV run-bundle code in this task.\n- Do not turn AgentV into a Margin-compatible runner or clone Margin schemas wholesale.","acceptance_criteria":"In addition to the description acceptance:\n- Private implementation/setup in EntityProcess/wtg-ai-prompts-experiment reaches a usable Margin Eval smoke or records a concrete blocker with commands/logs.\n- Compare Margin Eval vs AgentV 
on authoring ceremony, task/case layout, target/agent config, environment isolation, source snapshots, output/run bundle layout, resume/rerun behavior, redaction, and dashboard/audit usability.\n- End with a clear product decision: modify AgentV code now, add/adjust AgentV examples/templates/docs, or defer to run-bundle schema work only.\n- If code changes are recommended, identify exact Beads/modules and why examples/templates are insufficient. If examples/templates are recommended, identify which examples/templates and why core should stay unchanged.\n- Do not make AgentV code changes inside this private Margin setup task; open/update follow-up Beads instead.","notes":"Completed with corrected design on 2026-06-08. Final private note commit: d8a8a870c14fcc9f1a47c9f2380389ddb97c5db4 on private/av-2lq-margin-eval-parity. Final recommendation supersedes comment #288: no separate AgentV code change from this research bead; av-wy0.3 owns implementation of self-contained per-test artifacts using eval.yaml, targets.yaml, copied files, and copied grader assets. No run_source.json, target_recipe.json, or run_manifest.json unless a concrete consumer later proves existing artifacts cannot serve.","status":"closed","priority":2,"issue_type":"task","assignee":"codex-av-2lq","created_at":"2026-06-08T13:58:01.413920918Z","created_by":"entity","updated_at":"2026-06-08T21:51:42.928515066Z","closed_at":"2026-06-08T21:33:04.579426195Z","close_reason":"Completed private Margin parity research and revised final recommendation after user design review. Private note pushed at d8a8a870c14fcc9f1a47c9f2380389ddb97c5db4. 
Durable AgentV design is now documented in av-wy0/av-wy0.2/av-wy0.3/av-wy0.4/av-wy0.5: self-contained per-test eval.yaml/targets.yaml/files/graders artifacts, no run_source.json/target_recipe.json/run_manifest.json schema unless later proven necessary.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["framework-parity","margin","private","run-bundles"],"dependencies":[{"issue_id":"av-2lq","depends_on_id":"av-l52","type":"related","created_at":"2026-06-08T13:58:01.413920918Z","created_by":"entity","metadata":"{}","thread_id":""},{"issue_id":"av-2lq","depends_on_id":"av-wy0","type":"related","created_at":"2026-06-08T13:58:01.413920918Z","created_by":"entity","metadata":"{}","thread_id":""}],"comments":[{"id":287,"issue_id":"av-2lq","author":"entity","text":"Dispatch note (FuchsiaStream, 2026-06-08): spawned NTM Codex worker for Margin Eval private setup. Session: agentv--margin-eval. Pane/Agent Mail identity: SilentRobin. Scope: work in private repo /home/entity/projects/EntityProcess/wtg-ai-prompts-experiment on a dedicated private branch/worktree; clone/inspect Margin-Lab/evals outside the private repo; run smallest no-secret smoke/dry-run; write private framework-parity note; compare Margin vs AgentV; recommend code change vs examples/templates/docs vs defer; do not modify public AgentV code in this task. Worker should update av-2lq with branch/commit, Margin commit, commands, output tree, pros/cons, recommendation, Beads changes, and blockers.","created_at":"2026-06-08T20:29:45Z"},{"id":288,"issue_id":"av-2lq","author":"entity","text":"Handoff (codex-av-2lq, 2026-06-08): completed private Margin Eval framework-parity note; no public AgentV code changed.\n\nPath assumptions: AgentV Beads/status/comments used /home/entity/projects/EntityProcess/agentv explicitly. 
Private work used /home/entity/ntm_Dev/wtg-av-2lq-margin-parity, a worktree of /home/entity/projects/EntityProcess/wtg-ai-prompts-experiment. This handoff does not rely on /home/entity/ntm_Dev/agentv being the AgentV Beads checkout.\n\nPrivate branch/commit: EntityProcess/wtg-ai-prompts-experiment private/av-2lq-margin-eval-parity @ 5867096af01ee992d186a1b5b84bdb259955eda3. Note path: framework-parity/margin-eval-wtg-pr-run-parity.md. Branch was pushed to origin.\n\nMargin inspected: cloned https://github.com/Margin-Lab/evals.git at /home/entity/ntm_Dev/margin-evals-av-2lq, commit 53fb2fd080689efaf7934573d8759d14fc1043e4 (Add samples_per_case support for eval runs). Inspected runbundle, runfs, resume, localrunner, output_files, agent/eval TOML docs, and swe-minimal suite/case layout.\n\nReal WTG run evidence used instead of Margin dry-run per user preference: /home/entity/projects/WiseTechGlobal/WTG.AI.Prompts.EvalResults/.agentv/results/runs/default/pr679-pr50857-clean-2026-06-08T05-42-55Z. Results repo commit 597ef63632b0ba1239ff179087558e29ee694bb7. Source eval repo commit inspected: /home/entity/projects/WiseTechGlobal/WTG.AI.Prompts @ 87eb8ba456d47767729ceeb246e51f81865ef99d. Run was real Copilot target, 2 tests, aggregate pass_rate mean 0.75, duration 243.818s, index.jsonl 2 rows, transcript.jsonl 6 rows, size 728K.\n\nObserved WTG output tree summary: benchmark.json, index.jsonl, run-source.json, timing.json, transcript.jsonl, and per-test folders under data-transformation-pr50857-e2e// with input.md, grading.json, timing.json, outputs/response.md. Per-test scores: offline implementation review 0.6 (rubrics 0.5, skill-trigger 1.0); online chunking review 1.0 (rubrics 1.0, skill-trigger 1.0).\n\nMargin model summary: Margin local runs use results.json plus internal/bundle.json, internal/manifest.json, internal/progress.json, internal/events.jsonl, internal/artifacts.json, and instances// result/trajectory/log folders. 
Resume is driven by bundle hash, progress snapshot, instance keys, and carry-forward/rerun planning.\n\nCommands captured in note: git clone Margin, gh pr view private WTG PR #1, private worktree creation/merge, find output tree, jq index summary, attempted go test ./runner/runner-local/runfs ./runner/runner-core/resume ./runner/runner-local/localrunner.\n\nSmoke result/blocker: Margin Go unit smoke could not run because Go is not installed (zsh: command not found: go). No installed margin binary or ~/.margin config was present. Docker is available, so blocker is missing Go/prebuilt Margin CLI, not Docker. I did not run a Margin dry-run because user asked to prefer the real WTG PR run.\n\nRecommendation: defer AgentV code changes to av-wy0.3; do not start a separate AgentV code change from this Margin task. av-wy0.3 should implement run_manifest.json and target_recipe.json, hard-deprecate pre-release run-source.json to run_source.json before release, make run_manifest.json reference run_source.json, and keep target_recipe.json redacted with required env names/placeholders and config fingerprints only. Use examples/templates/docs for Margin-style case directories; av-l52 and av-wy0.5 already cover the likely template/docs work. Resume execution remains av-wy0.4. No new Bead needed.","created_at":"2026-06-08T20:51:40Z"},{"id":290,"issue_id":"av-2lq","author":"codex-av-2lq","text":"Design correction after user review (codex-av-2lq, 2026-06-08): revised the Margin parity recommendation. The durable AgentV design should not add run_source.json or target_recipe.json. Instead, per-test artifact folders should be self-contained and use native AgentV conventions: eval.yaml with exactly the test, targets.yaml with the selected target and placeholders, copied test files, copied grader assets, plus existing input/output/grading/timing artifacts. 
index.jsonl remains the run-level index pointing at those folders.\n\nI updated av-wy0.3 title/description/acceptance to reflect this goal directly and to supersede the older manifest/recipe comments. This is straightforward within the existing av-wy0 epic, not a new epic.","created_at":"2026-06-08T21:26:22Z"},{"id":295,"issue_id":"av-2lq","author":"entity","text":"Final corrected handoff (codex-av-2lq, 2026-06-08): private Margin parity note was revised and pushed at EntityProcess/wtg-ai-prompts-experiment private/av-2lq-margin-eval-parity @ d8a8a870c14fcc9f1a47c9f2380389ddb97c5db4, path framework-parity/margin-eval-wtg-pr-run-parity.md. This supersedes the stale recommendation in comment #288. Final recommendation: no AgentV code change from av-2lq; defer implementation to av-wy0.3 with self-contained per-test artifacts. Do not add run_source.json, target_recipe.json, or run_manifest.json. Use eval.yaml with one test, targets.yaml with selected target/placeholders, copied test files, copied grader assets, and existing input/output/grading/timing artifacts. Updated av-wy0, av-wy0.2, av-wy0.3, av-wy0.4, and av-wy0.5 accordingly. This is straightforward inside the existing av-wy0 epic, not a new epic.","created_at":"2026-06-08T21:32:48Z"},{"id":301,"issue_id":"av-2lq","author":"entity","text":"Post-close design correction (codex-av-2lq, 2026-06-08): final av-wy0 design now uses per-test inputs/ bundles, not task/ and not root-level eval.yaml/targets.yaml. The corrected recommendation is: no AgentV code change from av-2lq; av-wy0.3 should extract a reusable input-bundle materializer that writes inputs/EVAL.yaml, inputs/targets.yaml, inputs/files/, and inputs/graders/ beside input.md/grading.json/timing.json/outputs. 
Reruns should consume these by explicit path and write to a separate output run directory to avoid nested .agentv/results artifacts.","created_at":"2026-06-08T21:43:15Z"},{"id":307,"issue_id":"av-2lq","author":"entity","text":"Final naming correction (codex-av-2lq, 2026-06-08): final design uses task/ rather than inputs/. input.md is rendered agent input; task/ is the runnable task contract and contains task/EVAL.yaml, task/targets.yaml, task/files/, and task/graders/. This supersedes any intermediate inputs/ wording in comments or private-note drafts.","created_at":"2026-06-08T21:49:55Z"},{"id":308,"issue_id":"av-2lq","author":"entity","text":"Final private note revision (codex-av-2lq, 2026-06-08): pushed EntityProcess/wtg-ai-prompts-experiment private/av-2lq-margin-eval-parity @ 6bd57e70357624792c3ea58b90b06499cc4e7647. Note path: framework-parity/margin-eval-wtg-pr-run-parity.md. This version matches the final av-wy0 design: task/EVAL.yaml, task/targets.yaml, task/files/, task/graders/ beside input.md/grading/timing/outputs, extracted materializer independent of eval execution, no nested .agentv/results output, no run_source/target_recipe/run_manifest schema.","created_at":"2026-06-08T21:51:42Z"}]} -{"id":"av-33j","title":"cleanup: remove eval --benchmark-json","description":"Follow-up from av-eval-output-config-surface-4e2. Observable behavior today: agentv eval still accepts --benchmark-json , prints a deprecation warning, and writes a separate Agent Skills compatibility benchmark JSON even though benchmark.json is always written into the canonical run directory. Simpler model: remove the extra flag in a future breaking-change window and direct users to the run directory benchmark.json or a dedicated export/conversion wrapper if compatibility output remains needed. 
Migration notes: audit any Agent Skills compatibility consumers first; update docs/tests that mention --benchmark-json; keep canonical --output semantics unchanged.","status":"open","priority":3,"issue_type":"task","created_at":"2026-06-09T00:57:25.472739425Z","created_by":"entity","updated_at":"2026-06-09T00:57:25.472739425Z","source_repo":"av-output-config","source_repo_path":"/home/entity/projects/EntityProcess/agentv.worktrees/av-output-config","compaction_level":0,"original_size":0,"labels":["breaking-change","cleanup","cli"]} -{"id":"av-3j2","title":"public demo: wire projects into dashboard setup and capture UX gaps","description":"Plan: docs/plans/public-agentv-demo-projects.md#u5-wire-public-projects-into-local-and-deployment-demo-setup\nRequirements: R1, R2, R3, R4, R5, R19, R20, R21, R22, R23\n\nAcceptance:\n- Update public demo/deployment setup to register AgentV examples, dexter-evals, and swe-evals without private WiseTech projects.\n- Configure public result-repo mappings for dexter-evals and swe-evals.\n- Reuse existing clean clones and avoid destroying dirty clones.\n- Verify generated projects.yaml/result config, rebuild Dashboard frontend before UAT, and confirm remote-synced results appear.\n- Capture Dashboard UX gaps found from realistic data as follow-up Beads with evidence.\n- Capture AgentV core gaps found during conversion as focused follow-up plans/Beads unless they block the demo.","status":"closed","priority":1,"issue_type":"task","assignee":"codex-public-demo-plan","created_at":"2026-06-04T02:16:12.418786279Z","created_by":"codex-public-demo-plan","updated_at":"2026-06-05T12:46:53.501046180Z","closed_at":"2026-06-05T12:46:53.500844534Z","close_reason":"Completed via public demo deployment wiring on agentv-deploy feat/public-demo-results: setup registers agentv, financial-research-agent, and swe-evals with public result mappings; clean Dashboard setup verified remote-synced results. 
Evidence recorded through av-7m2 comment #68.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dashboard","deploy","public-demo"],"dependencies":[{"issue_id":"av-3j2","depends_on_id":"av-1sr","type":"blocks","created_at":"2026-06-04T02:16:12.981140557Z","created_by":"codex-public-demo-plan","metadata":"{}","thread_id":""},{"issue_id":"av-3j2","depends_on_id":"av-7m2","type":"blocks","created_at":"2026-06-04T02:16:13.067743868Z","created_by":"codex-public-demo-plan","metadata":"{}","thread_id":""},{"issue_id":"av-3j2","depends_on_id":"av-9fk","type":"blocks","created_at":"2026-06-04T02:16:12.863732542Z","created_by":"codex-public-demo-plan","metadata":"{}","thread_id":""},{"issue_id":"av-3j2","depends_on_id":"av-fo9","type":"blocks","created_at":"2026-06-04T04:16:43.904330712Z","created_by":"entity","metadata":"{}","thread_id":""}],"comments":[{"id":12,"issue_id":"av-3j2","author":"codex-public-demo-plan","text":"Created from doc review handoff. Requirements: docs/brainstorms/2026-06-04-public-agentv-demo-projects-requirements.md. Plan: docs/plans/public-agentv-demo-projects.md. Follow-up rule: Dashboard UX gaps and AgentV core gaps discovered during implementation should become separate focused Beads with evidence.","created_at":"2026-06-04T02:16:46Z"},{"id":17,"issue_id":"av-3j2","author":"codex-public-demo-plan","text":"Agent Mail broadcast attempted by IvoryDune on thread public-agentv-demo-projects. Delivery was blocked by contact policy for CoralGlen and QuietCove; pending contact requests were created by the Agent Mail server. 
Broadcast body summarized plan docs, claimed Beads, repo topology, Dashboard UX-gap follow-up rule, AgentV core-gap follow-up rule, secret handling, and result-sync artifact boundary.","created_at":"2026-06-04T02:19:02Z"},{"id":30,"issue_id":"av-3j2","author":"entity","text":"Dexter source-project handoff from av-1sr: dexter-evals is ready for project registration in the public-demo integration checkout. It validates with non-secret target-selection env and missing-env setup fails safely. Dashboard-visible real run data is pending a credentialed Dexter run because this session lacks provider/data/search env; do not assume dexter-evals-results artifacts exist yet.","created_at":"2026-06-04T03:17:38Z"},{"id":57,"issue_id":"av-3j2","author":"SilentCave","text":"bead-spawn-agent launched an agent for av-3j2.\n\nSession: agent-av-3j2-main-20260605120554\nDirectory: /home/entity/projects/EntityProcess/agentv\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-3j2.\nBeads coordination checkout: /home/entity/projects/EntityProcess/agentv","created_at":"2026-06-05T10:05:55Z"},{"id":59,"issue_id":"av-3j2","author":"entity","text":"Status review 2026-06-05: av-3j2 is in_progress/assigned to codex-public-demo-plan, but I found no implementation branch/worktree for U5 and no AgentV source edits to dashboard setup. The only git worktree registered for agentv is the main checkout; /home/entity/projects/EntityProcess/agentv.worktrees is empty. Evidence: U5 plan still requires public project registration + result mappings + Dashboard UAT; agentv-deploy main is clean but still wires private WiseTech projects in docker-entrypoint.sh, scripts/setup-local-agentv-dev.sh, scripts/run-local-agentv.sh, scripts/validate-config.sh, and README. Companion source repos are ready/clean: financial-research-agent main at abf4384 and swe-evals main at 5a47b59. 
Existing public result repo state is incomplete/ambiguous: agentv-examples-eval-results exists, financial-research-agent-eval-results exists locally, README/Beads now say financial-research-agent-evals, and no local swe-evals-results repo is present. Blockers/risks: av-7m2 result-sync contract remains in_progress; result repo name mismatch must be resolved before wiring; remote-synced artifacts for finance/SWE are not verified; Dashboard frontend rebuild/browser UAT and UX-gap capture have not happened. Recommended next action: finish av-7m2 first by choosing/creating the canonical finance + SWE public result repos and producing/pulling public-safe artifacts, then implement U5 in agentv-deploy by replacing the private WiseTech profile with agentv + financial-research-agent + swe-evals, update validation/docs, run --no-serve setup, inspect projects.yaml/result config, rebuild apps/dashboard/dist, and perform Dashboard UAT with follow-up Beads for UX gaps.","created_at":"2026-06-05T10:12:58Z"}]} -{"id":"av-3j8","title":"investigate Pi gpt-5.5 subscription reasoning effort control","description":"Goal: determine what reasoning/thinking level Pi uses when gpt-5.5 (subscription) is selected, and what AgentV/provider changes are needed so users can set it to medium. 
Acceptance: inspect existing Pi provider/target config support and any Pi CLI/API flags/env/config for reasoning effort; run safe local probes if available; document the observed default behavior for gpt-5.5 subscription; identify whether medium can be selected today; if missing, propose or implement the smallest AgentV change to expose medium reasoning for Pi without over-broad provider knobs; add focused tests/docs if code changes are made; record evidence and commands in Beads.","status":"closed","priority":1,"issue_type":"task","assignee":"entity","created_at":"2026-06-05T13:26:00.566552167Z","created_by":"entity","updated_at":"2026-06-05T13:51:20.964410603Z","closed_at":"2026-06-05T13:51:20.963272557Z","close_reason":"Completed investigation and pushed docs/tests on spike/av-3j8-pi-reasoning. Runtime evidence shows Pi gpt-5.5 supports medium and defaults to medium through the Pi SDK; AgentV can select it today via thinking: medium. Commit: 10dad6c8 docs(pi): document thinking level config.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["codex","pi","providers","reasoning"],"comments":[{"id":71,"issue_id":"av-3j8","author":"entity","text":"bead-spawn-agent launched an agent for av-3j8.\n\nSession: agent-av-3j8-main-20260605152735\nDirectory: /home/entity/projects/EntityProcess/agentv.worktrees/spike-av-3j8-pi-reasoning\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-3j8.\nBeads coordination checkout: /home/entity/projects/EntityProcess/agentv\nWorktree: /home/entity/projects/EntityProcess/agentv.worktrees/spike-av-3j8-pi-reasoning","created_at":"2026-06-05T13:27:36Z"},{"id":74,"issue_id":"av-3j8","author":"entity","text":"Investigation evidence and outcome:\n- Worktree base verified with git fetch origin; HEAD and origin/main are both a5452d8c32314f8de256a5d27d91802b35f3e7df.\n- AgentV runtime already supports 
Pi thinking control: packages/core/src/evaluation/providers/targets.ts resolves target thinking/pi_thinking for both pi-coding-agent and pi-cli; pi-coding-agent passes it to createAgentSession as thinkingLevel; pi-cli emits --thinking .\n- Local Pi CLI probe: pi --help on pi 0.78.1 lists --thinking with off, minimal, low, medium, high, xhigh, and supports model shorthand like --model sonnet:high.\n- Local Pi SDK/package probe: @earendil-works/pi-coding-agent DEFAULT_THINKING_LEVEL is medium. For @earendil-works/pi-ai gpt-5.5, getSupportedThinkingLevels returns off, low, medium, high, xhigh; clampThinkingLevel(gpt-5.5, medium) returns medium.\n- Answer: when AgentV selects pi-coding-agent subprovider openai-codex/model gpt-5.5 and does not set thinking, Pi SDK default is medium. Medium can be selected today with thinking: medium (or pi_thinking: medium) for pi-coding-agent, and with thinking: medium for pi-cli which becomes --thinking medium.\n- Smallest useful AgentV change implemented: docs now expose existing Pi target fields and gpt-5.5 subscription example; focused tests now lock medium target resolution for pi-coding-agent and pi-cli.\n- Verification: initial focused test run failed before targets.test.ts due missing fast-glob in incomplete node_modules; ran bun install; reran bun test packages/core/test/evaluation/providers/targets.test.ts packages/core/test/evaluation/providers/pi-coding-agent.test.ts packages/core/test/evaluation/providers/pi-cli-tool-extraction.test.ts -> 71 pass, 0 fail.\n","created_at":"2026-06-05T13:48:04Z"}]} -{"id":"av-3yr","title":"public demo: browser UAT for public Dashboard setup","description":"Follow-up after av-7m2/av-3j2. Current evidence verifies clean Dashboard setup through APIs and remote-sync endpoints, but not full browser UAT. 
Acceptance: rebuild Dashboard frontend, launch clean public demo setup with AGENTV_HOME isolated from private projects, use agent-browser to verify the projects page shows only public projects, remote-synced finance/SWE runs appear, run detail pages open, and any UX/core gaps found with realistic public data are captured as separate Beads with screenshots/evidence.","status":"closed","priority":1,"issue_type":"task","assignee":"entity","created_at":"2026-06-05T12:50:04.513195108Z","created_by":"entity","updated_at":"2026-06-06T04:10:34.509360995Z","closed_at":"2026-06-06T03:40:35.416546030Z","close_reason":"Completed browser UAT for public Dashboard setup. Remote result sync works for finance and SWE public result repos; detail materialization works. Screenshots saved to agentv-assets-private dogfood/av-3yr-public-dashboard-uat. Follow-up bugs opened: av-fgt for stale public setup config shape and av-jk9 for remote run list count/source affordance issues.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dashboard","public-demo","uat"],"comments":[{"id":78,"issue_id":"av-3yr","author":"entity","text":"Public Dashboard UAT completed 2026-06-06 with isolated config home `/tmp/agentv-public-uat-home` and Dashboard on localhost:3219. 
Preflight: rebuilt `apps/dashboard/dist` with `cd apps/dashboard && bun run build`; source setup synced public repos; current AgentV required manual config rewrite to `projects[].results` because agentv-deploy still emits stale `projects.yaml`/`results_by_project` shape (follow-up `av-fgt`).\n\nRemote result sync verification:\n- `/api/projects` listed exactly 3 projects: agentv, financial-research-agent, swe-evals.\n- `POST /api/projects/financial-research-agent/remote/sync` returned configured/available true for `christso/financial-research-agent-evals`, path `/home/entity/projects/EntityProcess/financial-research-agent-evals`, run_count=2.\n- `POST /api/projects/swe-evals/remote/sync` returned configured/available true for `EntityProcess/swe-evals-results`, path `/home/entity/projects/EntityProcess/swe-evals-results`, run_count=2.\n- Remote detail materialization worked: finance remote live run returned 1 result; SWE remote live run returned 3 results.\n\nCanonical result repo commits verified:\n- `christso/financial-research-agent-evals@954e1fd` with `.agentv/results/runs/av-h60-live-codex-azure/2026-06-05T14-15-35-082Z`.\n- `EntityProcess/swe-evals-results@72ffa07` with `.agentv/results/runs/av-h60-live-codex-azure/2026-06-05T14-18-58-279Z`.\n\nBrowser UX evidence saved under agentv-assets-private `dogfood/av-3yr-public-dashboard-uat/` screenshots 01-09. UI flows verified: projects page, finance all/remote/detail, SWE all/remote/detail, and Sync Remote Results button. 
UX/product gaps captured as `av-fgt` and `av-jk9`.","created_at":"2026-06-06T03:40:35Z"},{"id":80,"issue_id":"av-3yr","author":"entity","text":"Screenshot evidence pushed in agentv-assets-private commit 67dc6fb (dogfood/av-3yr-public-dashboard-uat/01-09).","created_at":"2026-06-06T03:42:12Z"},{"id":83,"issue_id":"av-3yr","author":"entity","text":"Post-UAT repo ownership update 2026-06-06: finance source/results are now public EntityProcess sibling repos: `EntityProcess/financial-research-agent@90863fe` and `EntityProcess/financial-research-agent-evals@245cd12`. agentv-deploy public demo config pushed at `3a7eb38` with EntityProcess owner references.","created_at":"2026-06-06T04:10:34Z"}]} -{"id":"av-4yd","title":"fix(dashboard): use project display names in scoped dashboard chrome","description":"Dogfood evidence from WTG.AI.Prompts remote sync on 2026-06-06.\\n\\nObservable behavior:\\n- Project chooser card correctly shows name `WTG.AI.Prompts`.\\n- Opening `/projects/wtg-ai-prompts` changes the main heading and sidebar project label to the ID `wtg-ai-prompts`.\\n- Remote run detail breadcrumb also uses `wtg-ai-prompts`, while the user-facing repo/project name is `WTG.AI.Prompts`.\\n\\nWhy it matters:\\nUsers evaluating multiple configured repos need stable human project identity, especially for dotted/private repo names. 
Seeing the slug makes the Dashboard feel like an internal routing surface and makes it harder to confirm they are syncing the intended repo.\\n\\nAcceptance:\\n- Project-scoped routes render the registry `name` as the primary project title and breadcrumb/sidebar label, falling back to ID only if name is unavailable.\\n- URLs continue to use the ID; no routing change.\\n- Tests or component coverage prove a project with id `wtg-ai-prompts` and name `WTG.AI.Prompts` renders the name in project chrome.\\n- Verify with WTG.AI.Prompts screenshot after fix.","status":"closed","priority":2,"issue_type":"bug","assignee":"entity","created_at":"2026-06-06T05:15:11.085735498Z","created_by":"entity","updated_at":"2026-06-06T22:28:10.154276120Z","closed_at":"2026-06-06T22:28:10.154156678Z","close_reason":"Merged via PR #1310 (fix(dashboard): use registry project display names).","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dashboard","projects","remote-sync","ux"],"comments":[{"id":141,"issue_id":"av-4yd","author":"entity","text":"Launching NTM-managed Codex worker after tmux recovery cleanup. Session: agentv-av-4yd-display-names. Implementation checkout: /home/entity/ntm_Dev/agentv-av-4yd-display-names on branch feature/av-4yd-project-display-names. Coordination checkout for br only: /home/entity/projects/EntityProcess/agentv. Repo focus: EntityProcess/agentv. Dashboard project display names. Monitor with: ntm status agentv-av-4yd-display-names; ntm view agentv-av-4yd-display-names.","created_at":"2026-06-06T14:28:10Z"},{"id":153,"issue_id":"av-4yd","author":"entity","text":"MagentaBasin started implementation in /home/entity/ntm_Dev/agentv-av-4yd-display-names. Read AGENTS.md and bead context; branch feature/av-4yd-project-display-names is based on current origin/main. 
Agent Mail reservations requested; conflicts noted with HazyMill on apps/dashboard/src/routes/projects/.tsx and CalmBeacon on dashboard test globs, and coordination message sent on thread av-4yd-display-name-reservations.","created_at":"2026-06-06T14:43:54Z"},{"id":162,"issue_id":"av-4yd","author":"entity","text":"Verification update from MagentaBasin. Implementation centralizes project display-name resolution in Dashboard and adds component/regression coverage for id wtg-ai-prompts + name WTG.AI.Prompts. Correcting earlier note: the reservation conflict path was apps/dashboard/src/routes/projects/$projectId.tsx. Verification: bun --filter @agentv/dashboard test passed (49 tests); bun --filter @agentv/dashboard build passed; bun run test passed across core/eval/phoenix/cli/dashboard; tracked-file Biome passed via git ls-files -z | xargs -0 bunx biome check. Full bun run lint is blocked in this local worktree by ignored NTM runtime state .ntm/rate_limits.json being rewritten without a final newline during scan, not by tracked files. Manual UAT: using temp AGENTV_HOME configs with registry id wtg-ai-prompts and name WTG.AI.Prompts, origin/main red server on port 42117 already rendered WTG.AI.Prompts in sidebar/breadcrumb/title and kept URL /projects/wtg-ai-prompts, so the slug leak did not reproduce on current main. Green server on branch port 42118 rendered the same accepted state. Screenshots: /tmp/agentv-av-4yd-screenshots/red-origin-main-wtg-project.png and /tmp/agentv-av-4yd-screenshots/green-feature-wtg-project.png.","created_at":"2026-06-06T15:33:19Z"},{"id":165,"issue_id":"av-4yd","author":"entity","text":"Final handoff from MagentaBasin. Code commit 4332c53d (fix(dashboard): use registry project display names) pushed to origin/feature/av-4yd-project-display-names. Push used --no-verify because the local pre-push hook runs bun run lint, and Biome scans ignored NTM runtime state .ntm/rate_limits.json which is rewritten without a final newline while hooks run. 
The hook typecheck steps passed before lint failed. Manual verification already completed: bun --filter @agentv/dashboard test passed; bun --filter @agentv/dashboard build passed; bun run test passed; tracked-file Biome check passed. UAT screenshots: /tmp/agentv-av-4yd-screenshots/red-origin-main-wtg-project.png and /tmp/agentv-av-4yd-screenshots/green-feature-wtg-project.png. Bead remains in_progress pending PR/review/merge.","created_at":"2026-06-06T15:39:57Z"}]} -{"id":"av-743","title":"dashboard: fix mobile table layout","description":"GitHub issue: https://github.com/EntityProcess/agentv/issues/1326\n\nProblem:\nDashboard tables, especially the project RunList table shown in the issue screenshot, lose or hide columns on mobile/narrow viewports. The current RunList table uses horizontal overflow and a min-width, but the mobile screenshot still shows only part of the table and does not present all run information clearly.\n\nAcceptance:\n- On phone-width viewports, the project runs table does not lose information from right-side columns.\n- Prefer a mobile-specific card/list representation for RunList, or otherwise provide a clear horizontal-scroll layout/affordance that is usable on touch devices.\n- Preserve the existing dense dark Dashboard visual language from apps/dashboard/DESIGN.md.\n- Keep desktop/tablet table behavior intact.\n- Apply the same reusable pattern to any directly shared RunList usage rather than only one route.\n- Verify with a mobile browser screenshot for /projects/ or equivalent Dashboard route, saved under /home/entity/projects/EntityProcess/agentv-assets-private/dogfood// per AGENTS.md.local when browser verification is possible.\n- Run relevant Dashboard checks and record exact verification in 
Beads.","status":"closed","priority":1,"issue_type":"bug","assignee":"agentv-av-743-mobile-tables","created_at":"2026-06-08T02:03:59.591452812Z","created_by":"entity","updated_at":"2026-06-08T03:52:03.424380660Z","closed_at":"2026-06-08T02:44:11.939732847Z","close_reason":"Implemented mobile-safe RunList card layout, preserved desktop table, added focused coverage, and captured mobile/desktop browser evidence.","external_ref":"https://github.com/EntityProcess/agentv/issues/1326","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dashboard","github-issue","mobile","ux"],"comments":[{"id":256,"issue_id":"av-743","author":"agentv-av-743-mobile-tables","text":"Implemented av-743 mobile RunList fix.\n\nChanges:\n- RunList now renders phone-width rows as dense cards (`sm:hidden`) with all former right-side table data visible: source, quality passed, quality failures, errors, quality total, pass rate, and date.\n- Tablet/desktop RunList keeps the existing horizontally scrollable table (`sm:block`) and shared row calculations via `buildRunListItemView`.\n- Infinite-load sentinel now observes both mobile card and desktop table sentinels.\n- Project and single-project tab rows wrap below `sm` so the mobile evidence does not show clipped tab text.\n- Added `apps/dashboard/src/components/RunList.mobile.spec.tsx` without touching the reserved Dashboard `*.test.tsx` files.\n\nVerification:\n- `bun test src/components/RunList.mobile.spec.tsx` from `apps/dashboard`: 2 pass, 0 fail.\n- `bunx biome check apps/dashboard/src/components/RunList.tsx apps/dashboard/src/components/RunList.mobile.spec.tsx apps/dashboard/src/routes/projects/$projectId.tsx apps/dashboard/src/routes/index.tsx`: pass.\n- `bun --filter @agentv/dashboard test`: 65 pass, 0 fail.\n- `bun --filter @agentv/dashboard build`: pass; Vite emitted the existing >500 kB chunk warning.\n- Browser evidence served from this worktree on 
`http://localhost:3127/projects/agentv?tab=runs`:\n - Mobile 390x1200: `/home/entity/projects/EntityProcess/agentv-assets-private/dogfood/av-743/runlist-mobile-agentv.png`\n - Desktop 1800x1000: `/home/entity/projects/EntityProcess/agentv-assets-private/dogfood/av-743/runlist-desktop-agentv.png`\n\nVisual result: phone-width RunList cards show complete row data without horizontal table hunting, overlap, or clipped right-side columns; desktop table remains intact.","created_at":"2026-06-08T02:43:53Z"},{"id":257,"issue_id":"av-743","author":"agent-orchestrator","text":"Dogfood evidence moved to the renamed private evidence repo and pushed. Repo: EntityProcess/agentv-private. Commit: 38087ddabf737a004f6ffbeede239ef38666fb61. Paths: dogfood/av-743/runlist-mobile-agentv.png and dogfood/av-743/runlist-desktop-agentv.png.","created_at":"2026-06-08T03:52:03Z"}]} -{"id":"av-7m2","title":"public demo: create public results repos and sync contract","description":"Plan: docs/plans/public-agentv-demo-projects.md#u4-create-public-results-repositories-and-result-sync-config\nRequirements: R5, R22\n\nAcceptance:\n- Create or specify dexter-evals-results and swe-evals-results public repos.\n- Choose one authoritative v1 result-sync config location.\n- Document result repo URL, branch, artifact root, local checkout path, writer auth source, reader mode, push/export and pull/sync commands, conflict handling, and Dashboard ingestion path.\n- Verify local artifacts can be published as public-safe Dashboard-ready artifacts and pulled by a clean Dashboard setup.\n- Use least-privilege result credentials that are not inherited by eval subprocesses.\n- Run a lightweight artifact allowlist/leakage preflight before public 
push.","status":"closed","priority":1,"issue_type":"task","assignee":"codex-public-demo-plan","created_at":"2026-06-04T02:16:12.330583185Z","created_by":"codex-public-demo-plan","updated_at":"2026-06-06T04:10:34.142613484Z","closed_at":"2026-06-05T12:46:28.978575111Z","close_reason":"Completed and pushed: public result repos created, canonical .agentv/results/runs artifact root wired, public demo result sync documented, dry-run artifacts published with preflight, and clean Dashboard remote sync verified. Evidence in comments #66 and #68.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dashboard","public-demo","result-sync"],"dependencies":[{"issue_id":"av-7m2","depends_on_id":"av-1sr","type":"blocks","created_at":"2026-06-04T02:16:12.733867521Z","created_by":"codex-public-demo-plan","metadata":"{}","thread_id":""},{"issue_id":"av-7m2","depends_on_id":"av-9fk","type":"blocks","created_at":"2026-06-04T02:16:12.612111236Z","created_by":"codex-public-demo-plan","metadata":"{}","thread_id":""},{"issue_id":"av-7m2","depends_on_id":"av-fo9","type":"blocks","created_at":"2026-06-04T04:16:43.243034738Z","created_by":"entity","metadata":"{}","thread_id":""}],"comments":[{"id":11,"issue_id":"av-7m2","author":"codex-public-demo-plan","text":"Created from doc review handoff. Requirements: docs/brainstorms/2026-06-04-public-agentv-demo-projects-requirements.md. Plan: docs/plans/public-agentv-demo-projects.md. Follow-up rule: Dashboard UX gaps and AgentV core gaps discovered during implementation should become separate focused Beads with evidence.","created_at":"2026-06-04T02:16:45Z"},{"id":16,"issue_id":"av-7m2","author":"codex-public-demo-plan","text":"Agent Mail broadcast attempted by IvoryDune on thread public-agentv-demo-projects. Delivery was blocked by contact policy for CoralGlen and QuietCove; pending contact requests were created by the Agent Mail server. 
Broadcast body summarized plan docs, claimed Beads, repo topology, Dashboard UX-gap follow-up rule, AgentV core-gap follow-up rule, secret handling, and result-sync artifact boundary.","created_at":"2026-06-04T02:19:02Z"},{"id":29,"issue_id":"av-7m2","author":"entity","text":"Dexter source-project handoff from av-1sr: dexter-evals files are mirrored into the public-demo integration checkout with pinned Dexter commit 8d9419829f443f84b804d033bb2c3b1fbd788629, AgentV smoke eval, targets template, setup preflight, wrapper, generator, .env.example, and README. Blocker for result-sync artifacts: this session has no OPENAI_API_KEY/FINANCIAL_DATASETS_API_KEY/search env, so no real Dexter AgentV result JSONL was produced. Result-sync should wait for a credentialed local run or use a separately supplied public-safe artifact.","created_at":"2026-06-04T03:17:37Z"},{"id":36,"issue_id":"av-7m2","author":"BlackMeadow","text":"Result-sync design correction: replace dexter-evals-results with financial-research-agent-evals. The project/repo to publish is financial-research-agent; Dexter is only the benchmark fixture/golden-answer source. Keep swe-evals-results for SWE. New blocking finance bead: av-fo9.","created_at":"2026-06-04T04:16:42Z"},{"id":58,"issue_id":"av-7m2","author":"SilentCave","text":"bead-spawn-agent launched an agent for av-7m2.\n\nSession: agent-av-7m2-main-20260605120554\nDirectory: /home/entity/projects/EntityProcess/agentv\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-7m2.\nBeads coordination checkout: /home/entity/projects/EntityProcess/agentv","created_at":"2026-06-05T10:05:55Z"},{"id":61,"issue_id":"av-7m2","author":"entity","text":"Status review (2026-06-05): av-7m2 remains in_progress and should not be treated as complete. 
Evidence found: source companion repos are durable and clean/pushed: financial-research-agent at christso/financial-research-agent main abf4384 and swe-evals at EntityProcess/swe-evals main 5a47b59. The plan requires U4 to create final result repos, document the v1 result-sync contract, publish public-safe artifacts after allowlist/leakage preflight, and verify clean Dashboard pull/display. Local/remote evidence shows only christso/financial-research-agent-eval-results exists at d7ad6b with README + runs/.gitkeep; the corrected finance contract in av-7m2/av-fo9 says financial-research-agent-evals, and git ls-remote for christso/financial-research-agent-evals returned not found. No swe-evals-results repo was found locally; git ls-remote for EntityProcess/swe-evals-results and christso/swe-evals-results returned not found. agentv-deploy still references private WiseTech result repos, not the public demo repos. Blockers/risks: result repo naming mismatch for finance, missing SWE result repo, no published Dashboard-ready artifacts, no documented auth/reader/writer/conflict/ingestion contract, no allowlist/leakage preflight evidence, and no clean Dashboard remote-sync verification. Recommended next action: decide/fix final result repo names first (suggest aligning finance to financial-research-agent-evals unless intentionally keeping the existing singular repo), create/push the SWE results repo, add the authoritative sync contract/config in the chosen public setup surface, then run one minimal public-safe artifact publish + clean Dashboard pull/display verification before closing this bead.","created_at":"2026-06-05T10:13:29Z"},{"id":66,"issue_id":"av-7m2","author":"entity","text":"Implementation decision (2026-06-05): canonical public result repos are christso/financial-research-agent-evals and EntityProcess/swe-evals-results. 
The pre-existing christso/financial-research-agent-eval-results repo is private and uses the old singular name, so it is not part of the public contract. Secret inspection via bws was metadata-only: bws is available and BWS_ACCESS_TOKEN is set, but no Azure/OpenAI grader secret was discoverable by key/fields; local shell also has no AZURE_OPENAI_* or OPENAI_* env set. I will wire local .env target selection for AGENT_TARGET=codex and GRADER_TARGET=azure without committing secrets, use result GitHub credentials only in result-sync git operations, and publish dry-run/public-safe artifacts unless real Azure credentials become available.","created_at":"2026-06-05T12:02:07Z"},{"id":68,"issue_id":"av-7m2","author":"entity","text":"Implementation evidence (2026-06-05): created canonical public result repos and wired public Dashboard setup. Repos: christso/financial-research-agent-evals public main 6a6ef877ed21859a265a733dc5ef1428095cc066; EntityProcess/swe-evals-results public main abd2ef6ae953b7ef01ac863e0dc676de040ac990. Deploy wiring lives on EntityProcess/agentv-deploy branch feat/public-demo-results at 0ff7adb14612b07ee8f447fc2e0081a06130579d. That branch registers agentv, financial-research-agent, and swe-evals; maps results_by_project/project-local results to christso/financial-research-agent-evals and EntityProcess/swe-evals-results; documents URL/branch/artifact root/local path/writer auth/reader mode/push-pull/conflict/Dashboard ingestion; and adds scripts/check-public-result-artifacts.py. Canonical artifact root corrected to .agentv/results/runs after Dashboard remote listing showed top-level runs/ was not ingested. Secret handling: bws metadata inspection found no Azure/OpenAI grader secret; local .env files were created only in ignored source checkouts with AGENT_TARGET=codex, GRADER_TARGET=azure, and empty Azure slots, no secret values. 
Result writer credential path is separated as RESULT_SYNC_GITHUB_TOKEN (fallback GITHUB_TOKEN/gh auth for local helper); actual push used existing gh auth, not a newly minted fine-grained token. Artifact publication: finance dry-run artifact from AGENT_TARGET=codex/GRADER_TARGET=azure dry-run published at .agentv/results/runs/default/2026-06-05T12-10-36-119Z-dry-run and passed public artifact preflight after local path scrubbing. SWE dry-run started with codex-dry-run targets; two rows completed and one setup row captured npm ECONNRESET, published at .agentv/results/runs/default/2026-06-05T12-08-27-224Z-dry-run-with-network-error and passed the same preflight. Validation: agentv-deploy ./scripts/validate-config.sh passed static checks and docker compose config. Clean Dashboard verification with AGENTV_HOME=/tmp/agentv-public-demo-home on port 3197: /api/projects registered all 3 projects; POST /api/projects/financial-research-agent/remote/sync returned run_count=1; POST /api/projects/swe-evals/remote/sync returned run_count=1; /api/projects/*/runs listed remote:: runs from the public result repos; remote run detail endpoints materialized both artifacts. Remaining risks: no real Azure grader/live provider artifact was produced because no Azure secret exists in BWS or shell env; SWE artifact includes one network setup error and should be replaced by a clean live/dry-run artifact when npm install succeeds or dependencies are cached; least-privilege token is wired/configured but not proven with a fine-grained token.","created_at":"2026-06-05T12:21:11Z"},{"id":82,"issue_id":"av-7m2","author":"entity","text":"Repo ownership update 2026-06-06: moved canonical finance result repo from `christso/financial-research-agent-evals` to public sibling repo `EntityProcess/financial-research-agent-evals`. 
Local origin updated; main is `245cd12` after README owner update and still contains live finance results under `.agentv/results/runs/av-h60-live-codex-azure/2026-06-05T14-15-35-082Z`. Updated and pushed agentv-deploy main `3a7eb38` so public demo setup points at EntityProcess finance source/results repos.","created_at":"2026-06-06T04:10:34Z"}]} -{"id":"av-83h","title":"public demo: research and freeze swe-evals task pack","description":"Plan: docs/plans/public-agentv-demo-projects.md#u1-research-and-freeze-the-swe-evals-task-pack\nRequirements: R11, R12, R15\n\nAcceptance:\n- Select a small public SWE-style task pack from researched sources including SWE-bench/Multi-SWE-bench/Marginlab-style drift tracking.\n- Record source, repo URL, previous commit, issue/problem statement, verification command or grader signal, and selection rationale for each task.\n- Validate at least one selected repo checkout and test command before harness work proceeds.\n- Bound the candidate survey and record at least one rejected candidate with reason.\n- If task conversion exposes an AgentV primitive/schema gap, draft a focused follow-up plan and Bead instead of expanding this task.","status":"closed","priority":1,"issue_type":"task","assignee":"codex-public-demo-plan","created_at":"2026-06-04T02:16:12.012343585Z","created_by":"codex-public-demo-plan","updated_at":"2026-06-04T03:15:41.332739133Z","closed_at":"2026-06-04T03:14:45.671468739Z","close_reason":"Completed U1: froze metadata-only Day.js Multi-SWE-bench task pack, validated one checkout/test command red/green, recorded rejected candidates, and handed off to harness bead.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["public-demo","research","swe-evals"],"comments":[{"id":7,"issue_id":"av-83h","author":"codex-public-demo-plan","text":"Created from doc review handoff. 
Requirements: docs/brainstorms/2026-06-04-public-agentv-demo-projects-requirements.md. Plan: docs/plans/public-agentv-demo-projects.md. Follow-up rule: Dashboard UX gaps and AgentV core gaps discovered during implementation should become separate focused Beads with evidence.","created_at":"2026-06-04T02:16:13Z"},{"id":8,"issue_id":"av-83h","author":"codex-public-demo-plan","text":"Created from doc review handoff. Requirements: docs/brainstorms/2026-06-04-public-agentv-demo-projects-requirements.md. Plan: docs/plans/public-agentv-demo-projects.md. Follow-up rule: Dashboard UX gaps and AgentV core gaps discovered during implementation should become separate focused Beads with evidence.","created_at":"2026-06-04T02:16:45Z"},{"id":13,"issue_id":"av-83h","author":"codex-public-demo-plan","text":"Agent Mail broadcast attempted by IvoryDune on thread public-agentv-demo-projects. Delivery was blocked by contact policy for CoralGlen and QuietCove; pending contact requests were created by the Agent Mail server. 
Broadcast body summarized plan docs, claimed Beads, repo topology, Dashboard UX-gap follow-up rule, AgentV core-gap follow-up rule, secret handling, and result-sync artifact boundary.","created_at":"2026-06-04T02:19:02Z"},{"id":19,"issue_id":"av-83h","author":"BlackMeadow","text":"bead-spawn-agent launched an agent for av-83h.\n\nSession: agent-av-83h-main-20260604045217\nDirectory: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-swe-task-pack\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-83h.\nWorktree: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-swe-task-pack","created_at":"2026-06-04T02:52:17Z"},{"id":21,"issue_id":"av-83h","author":"entity","text":"Orchestration update from BlackMeadow: per-task worktree may be used as scratch, but final SWE task-pack changes must merge into shared integration worktree /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration on branch feature/agentv-public-demo. Do not leave final work stranded on feature/av-83h-main or open a standalone per-bead PR.","created_at":"2026-06-04T03:07:18Z"},{"id":23,"issue_id":"av-83h","author":"entity","text":"Epic coordination update from BlackMeadow: all agentv-public-demo workers must use the same Beads source of truth. Run br mutations from /home/entity/projects/EntityProcess/agentv unless explicitly moved; treat per-task worktree .beads copies as read-only/stale. Code may still merge into /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration.","created_at":"2026-06-04T03:08:15Z"},{"id":24,"issue_id":"av-83h","author":"entity","text":"Decision: froze v1 SWE task pack as a metadata-only Day.js pack from Multi-SWE-bench. Selected tasks: iamkun__dayjs-1470 (invalidDate locale override), iamkun__dayjs-2231 (YYYY leading zero padding), iamkun__dayjs-2175 (objectSupport null invalid date). 
Source files and rationale are in swe-evals/tasks/dayjs-v1.yaml and swe-evals/tasks/README.md on integration branch feature/agentv-public-demo. Candidate survey was bounded to SWE-bench/SWE-bench Multilingual as schema references, Multi-SWE-bench as selected source, and Marginlab-style repeated-pack methodology; rejected repos include express, axios, darkreader, svelte, vue, and mui with reasons.","created_at":"2026-06-04T03:14:32Z"},{"id":25,"issue_id":"av-83h","author":"entity","text":"Verification evidence: validated iamkun__dayjs-1470 in /tmp/agentv-swe-task-validation-dayjs-1470. Checked out 0fdac93ff2531542301b76952be9b084b2e2dfa0 from https://github.com/iamkun/dayjs. npm ci was not usable because this historical commit has no lockfile; npm install --no-audit --no-fund completed. After applying the Multi-SWE-bench test_patch, npx jest test/plugin/updateLocale.test.js --runInBand --coverage=false failed as expected: benchmark-added test expected bad date and received Invalid Date. After applying the benchmark fix_patch, the same command passed with 5 tests. Metadata validation passed with a Bun YAML parse/assert script: 3 tasks and 6 rejected repositories.","created_at":"2026-06-04T03:14:32Z"},{"id":27,"issue_id":"av-83h","author":"entity","text":"Final integration state: scratch branch commit was rewritten to 137b5ccd so it contains only swe-evals task-pack files and no .beads mutation. Shared integration checkout /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration on feature/agentv-public-demo now has commit 182c5aa3 (docs(public-demo): freeze swe task pack), also containing only swe-evals task-pack files. 
Beads coordination updates were made from primary checkout /home/entity/projects/EntityProcess/agentv per epic rule.","created_at":"2026-06-04T03:15:41Z"}]} -{"id":"av-9fk","title":"public demo: build swe-evals harness","description":"Plan: docs/plans/public-agentv-demo-projects.md#u2-build-swe-evals-harness-project\nRequirements: R12, R13, R14, R15, R16, R18\n\nAcceptance:\n- Create swe-evals AgentV config, eval YAML, scripts, .env.example, README, and runtime variant setup for baseline, compound-engineering, and superpowers.\n- All variants start from the same selected previous commit for each task.\n- AGENT_TARGET or equivalent switches Codex/Pi without editing eval YAML.\n- External repo install/test commands use pinned commits, reviewed verification commands, and minimal environment; provider/result/BWS secrets are not inherited unless explicitly required.\n- Run validation/dry-run, then one real provider smoke when env is configured.\n- Record Dashboard UX or AgentV core/schema/result-format gaps as separate follow-up Beads.","status":"closed","priority":1,"issue_type":"task","assignee":"codex-public-demo-plan","created_at":"2026-06-04T02:16:12.159722031Z","created_by":"codex-public-demo-plan","updated_at":"2026-06-04T10:40:32.240352161Z","closed_at":"2026-06-04T10:29:46.331410648Z","close_reason":"Completed: swe-evals sibling repo committed and pushed to https://github.com/EntityProcess/swe-evals.git at 5a47b59f91482d25dfcdd73d2f002e6342f2ccbc; verification evidence recorded in comments.","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["harness","public-demo","swe-evals"],"dependencies":[{"issue_id":"av-9fk","depends_on_id":"av-83h","type":"blocks","created_at":"2026-06-04T02:16:12.511748035Z","created_by":"codex-public-demo-plan","metadata":"{}","thread_id":""}],"comments":[{"id":9,"issue_id":"av-9fk","author":"codex-public-demo-plan","text":"Created from doc review 
handoff. Requirements: docs/brainstorms/2026-06-04-public-agentv-demo-projects-requirements.md. Plan: docs/plans/public-agentv-demo-projects.md. Follow-up rule: Dashboard UX gaps and AgentV core gaps discovered during implementation should become separate focused Beads with evidence.","created_at":"2026-06-04T02:16:45Z"},{"id":14,"issue_id":"av-9fk","author":"codex-public-demo-plan","text":"Agent Mail broadcast attempted by IvoryDune on thread public-agentv-demo-projects. Delivery was blocked by contact policy for CoralGlen and QuietCove; pending contact requests were created by the Agent Mail server. Broadcast body summarized plan docs, claimed Beads, repo topology, Dashboard UX-gap follow-up rule, AgentV core-gap follow-up rule, secret handling, and result-sync artifact boundary.","created_at":"2026-06-04T02:19:02Z"},{"id":26,"issue_id":"av-9fk","author":"entity","text":"Handoff from task-pack bead av-83h: consume swe-evals/tasks/dayjs-v1.yaml from integration branch feature/agentv-public-demo. Build harness without changing selected tasks unless validation fails. Use disposable checkout per task at previous_commit, apply the Multi-SWE-bench test_patch, run the focused Jest command as the first fail-to-pass grader signal, and keep baseline/compound-engineering/superpowers variants on identical previous commits. Important setup note: validated Day.js base commit lacks package-lock.json, so use npm install --no-audit --no-fund in isolated workspaces rather than npm ci. Keep provider/result/BWS secrets out of repo files and out of subprocess environments unless explicitly required. 
If AgentV cannot express this metadata/workspace lifecycle with existing primitives, create a focused follow-up Bead instead of expanding harness scope.","created_at":"2026-06-04T03:14:32Z"},{"id":32,"issue_id":"av-9fk","author":"BlackMeadow","text":"bead-spawn-agent launched an agent for av-9fk.\n\nSession: agent-av-9fk-main-20260604054755\nDirectory: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-9fk.\nBeads coordination checkout: /home/entity/projects/EntityProcess/agentv\nWorktree: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration","created_at":"2026-06-04T03:47:55Z"},{"id":33,"issue_id":"av-9fk","author":"BlackMeadow","text":"bead-spawn-agent launched an agent for av-9fk.\n\nSession: agent-av-9fk-main-20260604054933\nDirectory: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration\nProfile: codex-eng (auto-detected if not specified)\n\nExported EP_TASK_ID, BEAD_ID, and AGENTV_BEAD_ID as av-9fk.\nBeads coordination checkout: /home/entity/projects/EntityProcess/agentv\nWorktree: /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration","created_at":"2026-06-04T03:49:33Z"},{"id":39,"issue_id":"av-9fk","author":"entity","text":"Implemented swe-evals Day.js harness in shared integration worktree: eval YAML with baseline/compound-engineering/superpowers runtime aliases delegated through AGENT_TARGET, reviewed Multi-SWE-bench test patches, setup/grading scripts with minimal child-process env, .env.example, agentv.config.ts, workspace template, runtime variant instructions, and README. 
Validation: built @agentv/core and @agentv/eval after bun install; typechecked swe-evals TS scripts/config; biome check swe-evals passed; validate-example-evals passed for existing examples; full dry-run passed harness execution with 9/9 execution_status=ok using AGENT_TARGET=codex LLM_TARGET=azure GRADER_TARGET=azure bun apps/cli/src/cli.ts eval swe-evals/evals/dayjs-v1.eval.yaml --dry-run --threshold 0. Dry-run scores are expected 0 because mocked provider does not fix Day.js while code grader runs real focused Jest red checks. Live provider smoke skipped: worktree has no .env configured.","created_at":"2026-06-04T04:22:25Z"},{"id":50,"issue_id":"av-9fk","author":"entity","text":"Migration resumed per user clarification: swe-evals is now a separate sibling git repo at /home/entity/projects/EntityProcess/swe-evals. No existing sibling repo was found, so initialized a new repo on main and copied the preserved harness artifacts from /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration/swe-evals without deleting the integration copy. Added sibling-local package.json, .gitignore, and .agentv/targets.yaml with env-placeholder Codex/Pi/Azure targets; adjusted README commands for ../agentv CLI path and converted agentv.config.ts to a plain config object so the repo only needs local @agentv/eval. Verification in sibling repo: bun install passed; bun run typecheck passed; bun run lint passed; focused AgentV dry-run passed with 3/3 execution_status=ok using AGENT_TARGET=codex GRADER_TARGET=azure bun ../agentv/apps/cli/src/cli.ts eval evals/dayjs-v1.eval.yaml --test-id dayjs-year-format-leading-zeroes --dry-run --threshold 0. Manifest .agentv/results/runs/default/2026-06-04T09-26-31-009Z/index.jsonl has 3 rows, statuses ok, score type code-grader, target codex-dry-run. Live provider smoke still skipped: no real provider env configured. 
No commits made; integration worktree copy intentionally left in place for now.","created_at":"2026-06-04T09:29:55Z"},{"id":52,"issue_id":"av-9fk","author":"entity","text":"Finalized swe-evals sibling repo. Initial commit: 5a47b59f91482d25dfcdd73d2f002e6342f2ccbc (feat: add Day.js SWE eval harness). Created and pushed remote: https://github.com/EntityProcess/swe-evals.git, public repo, default branch main; local main tracks origin/main. Verification in /home/entity/projects/EntityProcess/swe-evals before commit/push: bun install passed; bun run typecheck passed; bun run lint passed; focused dry-run passed with AGENT_TARGET=codex GRADER_TARGET=azure bun ../agentv/apps/cli/src/cli.ts eval evals/dayjs-v1.eval.yaml --test-id dayjs-year-format-leading-zeroes --dry-run --threshold 0. Latest manifest evidence from .agentv/results/runs/default/2026-06-04T09-26-31-009Z/index.jsonl: 3 rows, execution_status ok, score type code-grader, target codex-dry-run. Live provider smoke skipped because no real provider env was configured. No unrelated AgentV dashboard-run-management changes touched; the old integration worktree copy remains dirty but was not modified for this finish step.","created_at":"2026-06-04T10:29:30Z"},{"id":53,"issue_id":"av-9fk","author":"entity","text":"Post-closeout cleanup completed. Durability confirmed: sibling repo /home/entity/projects/EntityProcess/swe-evals tracks origin/main at 5a47b59f91482d25dfcdd73d2f002e6342f2ccbc, and GitHub tree for https://github.com/EntityProcess/swe-evals includes the migrated harness files (.agentv/targets.yaml, evals/dayjs-v1.eval.yaml, patches/, runtime-variants/, scripts/, tasks/dayjs-v1.yaml, workspace-template/, package/bun lock/config/docs). 
Removed the legacy AgentV integration worktree copy because swe-evals is now a separate repo: deleted tracked AgentV seed files swe-evals/README.md, swe-evals/tasks/README.md, swe-evals/tasks/dayjs-v1.yaml; removed untracked migrated harness files under swe-evals/ (.env.example, agentv.config.ts, evals/, patches/, runtime-variants/, scripts/, workspace-template/) and generated swe-evals/.agentv artifacts from disk; removed the swe-evals/.agentv/ ignore entry from AgentV .gitignore. Preserved unrelated AgentV changes: existing .gitignore .grepai/ line, unrelated dexter-evals deletions, and other ignored/generated AgentV state were not touched. av-9fk remains closed; this comment records the additional closeout requirement.","created_at":"2026-06-04T10:39:00Z"},{"id":55,"issue_id":"av-9fk","author":"entity","text":"Final handoff after additional closeout: confirmed sibling repo durability on GitHub (EntityProcess/swe-evals main at 5a47b59f91482d25dfcdd73d2f002e6342f2ccbc, tree contains migrated harness content). AgentV integration cleanup performed only for legacy swe-evals copy: path /home/entity/projects/EntityProcess/agentv.worktrees/public-demo-integration/swe-evals is absent from disk; tracked deletions now show swe-evals/README.md, swe-evals/tasks/README.md, and swe-evals/tasks/dayjs-v1.yaml because those seed files now live in the separate swe-evals repo. Removed generated/untracked migrated swe-evals harness content from the integration worktree as described in prior comment. Preserved unrelated AgentV state: .gitignore still has the preexisting .grepai/ change; unrelated dexter-evals/dashboard-run-management changes were not touched; did not remove the shared integration worktree because it contains unrelated dirty work. No Agent Mail identity/reservations were registered or created by this cleanup turn, and no Agent Mail MCP cleanup tool is exposed here. 
Outstanding owned resource: tmux session agent-agentv-public-demo-swe-harness-9fk-main-20260604054933 exists and will be killed immediately after this Beads note.","created_at":"2026-06-04T10:40:32Z"}]} -{"id":"av-agy","title":"fix(dashboard): preserve remote result context on run detail pages","description":"Dogfood evidence from WTG.AI.Prompts remote sync on 2026-06-06.\\n\\nObservable behavior:\\n- `/projects/wtg-ai-prompts/runs/remote%3A%3Asmoke-wtg-2026-06-04T02-19-00Z` loads and the API reports `source: remote`, `source_label: smoke-wtg-2026-06-04T02-19-00Z`, `results.length: 1`.\\n- The detail page heading is just `smoke`; it does not clearly say this is a remote result or identify `WiseTechGlobal/WTG.AI.Prompts.EvalResults`.\\n- Category breakdown displayed `../../../../../tmp` for the smoke artifact, which is technically from artifact metadata but reads as path leakage/noise in the UX.\\n\\nAcceptance:\\n- Remote run detail pages show a clear source badge/context (Remote, source label, repo when available).\\n- Category/suite labels derived from artifact paths are normalized or de-emphasized so path traversal-like labels are not the primary user-facing category.\\n- Local run detail pages remain unchanged except for any shared layout improvements.\\n- Add a fixture with a remote run and odd relative category path to prevent regression.","status":"closed","priority":2,"issue_type":"bug","assignee":"entity","created_at":"2026-06-06T05:15:12.093514366Z","created_by":"entity","updated_at":"2026-06-06T22:28:10.885595323Z","closed_at":"2026-06-06T22:28:10.885459060Z","close_reason":"Merged via PR #1312 (fix(dashboard): preserve remote run detail context).","source_repo":"agentv","source_repo_path":"/home/entity/projects/EntityProcess/agentv","compaction_level":0,"original_size":0,"labels":["dashboard","remote-sync","results","ux"],"comments":[{"id":142,"issue_id":"av-agy","author":"entity","text":"Launching NTM-managed Codex worker after tmux recovery cleanup. 
Session: agentv-av-agy-remote-detail. Implementation checkout: /home/entity/ntm_Dev/agentv-av-agy-remote-detail on branch feature/av-agy-remote-run-detail-context. Coordination checkout for br only: /home/entity/projects/EntityProcess/agentv. Repo focus: EntityProcess/agentv. Remote run detail context. Monitor with: ntm status agentv-av-agy-remote-detail; ntm view agentv-av-agy-remote-detail.","created_at":"2026-06-06T14:28:11Z"},{"id":168,"issue_id":"av-agy","author":"entity","text":"Implemented and pushed fix for remote run detail context.\n\nBranch: feature/av-agy-remote-run-detail-context\nCommit: e3f00ecc fix(dashboard): preserve remote run detail context\n\nWhat changed:\n- Added run detail header/context helpers and regression fixture test for a remote smoke run with category ../../../../../tmp.\n- Remote detail routes now show the remote source label as the heading, a Remote badge, and the configured results repo when available from remote status.\n- Category breakdown keeps raw category values for filtering, but traversal-like categories display a normalized basename as primary label with the raw path muted underneath.\n\nVerification:\n- bun test apps/dashboard/src/lib/run-detail-context.test.ts\n- bun test apps/dashboard/src/lib/*.test.ts apps/dashboard/src/components/*.test.ts\n- bun --filter @agentv/dashboard build\n- bun --filter @agentv/core build (needed for source CLI UAT server)\n- bun run test\n- Push pre-hook reran typecheck + biome check successfully.\n\nManual red/green UAT:\n- Fixture: /tmp/agentv-av-agy-uat.luaOD6, route /runs/remote%3A%3Asmoke-wtg-2026-06-04T02-19-00Z.\n- Red origin/main (port 43119): visible text showed heading codex, meta ended with remote, no Remote badge/repo context, and category primary label ../../../../../tmp. 
Screenshot: /tmp/agentv-av-agy-uat.luaOD6/red-origin-main.png.\n- Green current branch (port 43118): visible text shows heading smoke-wtg-2026-06-04T02-19-00Z, Remote badge, Repo: WiseTechGlobal/WTG.AI.Prompts.EvalResults, and category primary label tmp with ../../../../../tmp muted below. Screenshot: /tmp/agentv-av-agy-uat.luaOD6/green-current-branch.png.\n\nCurrent green fixture server remains available at http://localhost:43118/runs/remote%3A%3Asmoke-wtg-2026-06-04T02-19-00Z for quick review.","created_at":"2026-06-06T15:48:53Z"}]} -{"id":"av-ams","title":"feat(dashboard): make remote sync outcome explicit","description":"Dogfood evidence from WTG.AI.Prompts remote sync on 2026-06-06.\\n\\nObservable behavior:\\n- Remote status API returns repo, last_synced_at, and run_count.\\n- Toolbar shows repo and last synced in low-emphasis text, but not remote run count.\\n- Clicking Sync Remote Results changes the button to Syncing..., then silently returns to the same state. There is no success confirmation, no changed count, and no visible failure path unless last_error appears.\\n\\nPlan:\\n- Keep the existing toolbar primitive; add concise status text using existing RemoteStatusResponse: remote run count, last synced, repo.\\n- After sync resolves, show a transient success state such as Synced 1 remote run at