
feat: branch test runner data layer + parallel orchestration (FEIP-7092)#48

Open
kevin-hartman wants to merge 18 commits into main from kevin.hartman/feip-7092-branch-test-runner


kevin-hartman (Collaborator) commented May 28, 2026

Summary

Foundation work for FEIP-7092 (branch test runner) plus collateral substrate hardening surfaced during a real-Lakebase end-to-end exercise of the surface.

Original scope (4 slices)

  • Per-tag outcomes shape on ExperimentOutcomes (api / e2e / infra → {passed, failed}); backwards compatible — totals stay authoritative.
  • Per-cycle artifact persistence: writeArtifact / listArtifacts / readArtifact over .tdd/experiments/<F>/<exp>/artifacts/<cycle-id>/<name>. Path-traversal guarded.
  • Bounded-concurrency parallel runner: runExperimentsInParallel<T> with hard concurrency cap, fail-isolation per experiment, peak-in-flight tracking, per-experiment duration measurement.
  • Structured compareExperiments payload: per-row by_tag + cycle_count + artifact_count + duration_ms; new matrix (tag × experiment cells) ready for FEIP-7208's renderer.

Collateral substrate hardening (commit bdfc40a)

Three substrate gaps surfaced when this PR's surface was exercised end-to-end against a real Lakebase project. All are pre-existing concerns hermetic tests didn't catch; live list-branches / create-branch responses exposed them.

Branch identifier (uid vs name) confusion. A Lakebase branch has TWO identifiers and they are NOT interchangeable in the API. getDefaultBranchId violated the kit's own contract by preferring def.uid over the leaf of def.name, causing a confusing "branch id not found" error from the service. Fixes:

  • new scripts/lakebase/branch-id.ts: branded BranchName / BranchUid types + asBranchName / asBranchUid validators + branchNameFromResourcePath leaf extractor
  • getDefaultBranchId → getDefaultBranchName, returning BranchName | null derived from the leaf of name (never uid); the old name is kept as a @deprecated shim that also returns the name string
  • LakebaseBranchInfo.uid typed BranchUid; new nameLeaf: BranchName field
  • runtime guard in createBranch.parentBranch rejects uid-shaped input with a typed LakebaseBranchError
  • parseBranch funnels through the validators

Workspace TTL policy. Some workspaces enforce a tighter maximum-expiration policy than the PSA convention's 30-day feature TTL. Substrate now wraps the CLI's opaque "expiration time exceeds the maximum expiration time" error with a typed LakebaseBranchTtlTooLongError naming the attempted TTL + the override paths.

create-project.test.ts hardcoded placeholder host. The E2E test fell back to https://workspace.cloud.databricks.com (DNS fails) when LAKEBASE_TEST_HOST was unset. Added a resolveTestHost() helper that prefers the explicit env, then databricks auth env --profile $DATABRICKS_CONFIG_PROFILE. Documented skip block parallels the existing skip-when-e2e-disabled. Same test: .env contract updated to match the substrate's deliberate deployEnv() seeding (avoids the gated-hook chicken-and-egg).

Per-test timeouts on 2 live-pg-pair tests. queryBranchSchema(uid) === queryBranchSchema(branchId) and get-connection-equivalence each do two live pg connections back-to-back; the 5s vitest default is tight under parallel-suite load. Bumped to 30s with comments — legitimate per-test budget, not a workaround.

Slice breakdown

| Slice | Surface | Commit |
|---|---|---|
| 1 | ExperimentOutcomes.by_tag + ExperimentTag / TagOutcome types | 5c72108 |
| 2 | scripts/tdd/artifacts.ts (new): write/list/read primitives | 35a545c |
| 3 | scripts/tdd/parallel-runner.ts (new): bounded concurrency | 9321c5c |
| 4 | compareExperiments structured payload (rows + matrix) | 0a57f40 |
| Collateral | branch-id branded types + workspace-TTL handling | bdfc40a |

Live verification

Substrate exercised end-to-end against a real Lakebase project on a workspace that imposes a tighter TTL policy than the convention. After the fixes:

  • 3 fresh Lakebase projects provisioned across the session. 2 child branches cut in parallel via runExperimentsInParallel, peak_in_flight=2, wall 7.5s.
  • compareExperiments matrix populated across all 3 tags: [api] both pass 3p/0f; [e2e] exp-pg wins 1p/0f vs exp-json 1p/1f; [infra] exp-pg has data, exp-json null. signal=winning correctly inferred for exp-pg.
  • Artifacts persisted (Playwright trace + vitest junit per experiment, listed via listArtifacts).
  • recommendation=continue correct given 1 winning + 1 running.

All 3 test projects deleted after verification — no leaked resources.

Test plan

  • npm run typecheck clean
  • Hermetic suite: 403 → 404 passed, 44 skipped, 0 failed
  • Live env-gated suite (LAKEBASE_TEST_INSTANCE=… LAKEBASE_TEST_E2E=1): 432 passed, 16 skipped, 0 failed (+29 over hermetic)
  • npm run build clean
  • Manual: live driver against real Lakebase exercises slices 1-4 end-to-end

Composition with sibling PRs / tickets

  • FEIP-7094 (Playwright E2E end-to-end): will populate by_tag.e2e from the Driver's tag-aware runner. Slice 1's shape is the contract that the Driver writes to.
  • FEIP-7208 (comparison-report renderer): consumes the matrix + per-row by_tag + artifact_count to render the promote-vs-synthesize HITL decision aid.
  • FEIP-7215 (lakebase-feature-status, PR #47 "feat: lakebase-feature-status bin + MCP tool (FEIP-7215)"): can surface per-tag info once #47 merges and this PR lands; a minor follow-up will enhance the renderer.

This pull request and its description were written by Isaac.

Adds ExperimentTag = "api" | "e2e" | "infra" and TagOutcome { passed, failed }.
ExperimentOutcomes gains an optional by_tag field — Partial<Record<ExperimentTag,
TagOutcome>>. Top-level tests_passed / tests_failed remain authoritative
totals; by_tag is a breakdown for downstream renderers (comparison-report
renderer in FEIP-7208, feature-status snapshot in FEIP-7215) and the per-tag
smell detectors (FEIP-7094's e2e-row-perma-red).

Backwards compatible: existing callers reading the prior shape see no change.
The breakdown is partial-reportable (api but not e2e is valid) and the sum
across tags is not enforced to match the totals (mid-cycle reporting and
untagged tests both valid).

Tests
- tests/bdd/tdd-experiment-lifecycle.test.ts: 3 new round-trip tests covering
  full breakdown, partial breakdown, by_tag-omitted backwards compat.

Co-authored-by: Isaac
New scripts/tdd/artifacts.ts module with writeArtifact / listArtifacts /
readArtifact over .tdd/experiments/<F>/<exp>/artifacts/<cycle-id>/<name>.
Names may include subdirs (e.g. "traces/network.har"); intermediate dirs
created on demand. Path-traversal guarded: absolute paths and ".." rejected.
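The path-traversal guard can be sketched as a pure path resolver, assuming the `.tdd/experiments/<F>/<exp>/artifacts/<cycle-id>/<name>` layout above. The function name `artifactPath` and its parameter order are hypothetical. The final `startsWith` check is an extra defense-in-depth step and is not something the commit describes.

```typescript
import * as path from "node:path";

// Hypothetical resolver; the real scripts/tdd/artifacts.ts may differ.
function artifactPath(
  root: string, // project root containing .tdd/
  feature: string,
  experiment: string,
  cycleId: string,
  name: string, // may include subdirs, e.g. "traces/network.har"
): string {
  if (path.isAbsolute(name)) {
    throw new Error(`artifact name must be relative: ${name}`);
  }
  // Reject any ".." segment, whichever separator style is used.
  if (name.split(/[\\/]/).includes("..")) {
    throw new Error(`artifact name must not contain "..": ${name}`);
  }
  const dir = path.join(root, ".tdd", "experiments", feature, experiment, "artifacts", cycleId);
  const full = path.join(dir, name);
  // Defense in depth: the joined path must still live under the cycle dir.
  if (!full.startsWith(dir + path.sep)) {
    throw new Error(`artifact path escapes the cycle directory: ${name}`);
  }
  return full;
}
```

A writer would then call `fs.mkdirSync(path.dirname(full), { recursive: true })` before writing, which creates intermediate dirs such as `traces/` on demand.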

The orchestrator writes here after every cycle (Playwright traces, vitest
junit output, screenshots, repro scripts). The comparison-report renderer
(FEIP-7208) reads listings to surface what is available when the PO is
deciding promote vs synthesize.

Gitignored by default in scaffolded projects: artifacts can be large and
rebuilding them from logs is cheap. The scaffold step that writes the
project's .gitignore is owned by lakebase-create-project, not this module.

Tests
- tests/bdd/tdd-artifacts.test.ts: 10 tests covering write, list (scoped +
  cross-cycle), read round-trip, security guards (path traversal, absolute
  paths), empty cases, and entry shape stability.

Co-authored-by: Isaac
New scripts/tdd/parallel-runner.ts: runExperimentsInParallel<T> schedules
N experiment runs against an injected runner callback, honoring a hard
concurrency cap that the orchestrator sources from
.tdd/features/<F>/plan.json (budget.concurrent_branches).

Failure in one experiment does NOT abort the others. Each runner invocation
is try/catch-isolated and produces an ExperimentRunResult (succeeded or
failed) in the aggregate output. Tracks peak_in_flight observed and
per-experiment duration_ms.
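The scheduling described above can be sketched as a worker pool: `min(concurrency, N)` workers each pull the next unclaimed experiment, and each runner call is wrapped in try/catch so that one failure cannot abort the others. Names such as `runBounded` and the result field layout are hypothetical, and the real `runExperimentsInParallel<T>` signature may differ.

```typescript
// Hypothetical result shapes, loosely following the commit text.
type ExperimentRunResult<T> =
  | { experiment: string; status: "succeeded"; value: T; duration_ms: number }
  | { experiment: string; status: "failed"; error: string; duration_ms: number };

interface ParallelRunSummary<T> {
  results: ExperimentRunResult<T>[]; // same order as the input list
  peak_in_flight: number;
}

async function runBounded<T>(
  experiments: string[],
  concurrency: number,
  runner: (experiment: string) => Promise<T>,
): Promise<ParallelRunSummary<T>> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError(`concurrency must be an integer >= 1, got ${concurrency}`);
  }
  const results: ExperimentRunResult<T>[] = new Array(experiments.length);
  let next = 0;
  let inFlight = 0;
  let peak = 0;

  // Each worker claims the next index synchronously, so no two workers share one.
  async function worker(): Promise<void> {
    while (next < experiments.length) {
      const i = next++;
      const experiment = experiments[i]!;
      const start = Date.now();
      inFlight++;
      peak = Math.max(peak, inFlight);
      try {
        const value = await runner(experiment);
        results[i] = { experiment, status: "succeeded", value, duration_ms: Date.now() - start };
      } catch (err) {
        // Record the failure instead of rethrowing: the other experiments keep running.
        results[i] = { experiment, status: "failed", error: String(err), duration_ms: Date.now() - start };
      } finally {
        inFlight--;
      }
    }
  }

  // An empty list spawns zero workers and short-circuits.
  const workers = Array.from({ length: Math.min(concurrency, experiments.length) }, () => worker());
  await Promise.all(workers);
  return { results, peak_in_flight: peak };
}
```

With `concurrency = 1`, only one worker exists, so the runs are strictly sequential.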

The runner is injected so tests stay hermetic (no real Lakebase calls).
The orchestrator's real runner integrates with experiment.ts + run-cycle.ts
to drive an actual TDD cycle; that integration is the next slice past this
ticket and will compose on top.

Tests
- tests/bdd/tdd-parallel-runner.test.ts: 8 tests covering input completeness,
  concurrency cap honored (parallelization observed via peak_in_flight),
  failure isolation, duration tracking, empty-list short-circuit,
  concurrency=1 forces sequential mode, rejection of concurrency < 1,
  context pass-through to the runner.

Co-authored-by: Isaac
Extends ComparisonReport for the FEIP-7208 comparison-report renderer to
consume:

- Each ExperimentRow gains by_tag (the per-tag breakdown surfaced through
  ExperimentOutcomes in slice 1), cycle_count (count of entries in the
  experiment's timeline.json), artifact_count (from listArtifacts in slice 2),
  and duration_ms (from runtime.json when written by the orchestrator).
- New matrix: TagMatrixRow[] — one row per tag any experiment reported,
  with per-experiment cells. Cell is TagOutcome { passed, failed } when the
  experiment reported the tag, null when it did not. Tags ordered api → e2e
  → infra. Empty when no experiment recorded per-tag outcomes (early-stage
  race or projects that do not use the tag-aware runner yet).
- Existing fields (rows, recommendation, rationale) unchanged. Backwards
  compatible with the current ComparisonReport consumers.
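The matrix rules above (one row per reported tag, ordered api → e2e → infra, null cells for experiments that did not report a tag, empty when nobody used tags) fit in a short pure function. This is a sketch. The `RowInput` shape and the choice to key cells by experiment name are assumptions, not the PR's actual `TagMatrixRow` definition.

```typescript
type ExperimentTag = "api" | "e2e" | "infra";
interface TagOutcome { passed: number; failed: number }

// Hypothetical inputs/outputs; the real ExperimentRow / TagMatrixRow may differ.
interface RowInput {
  experiment: string;
  by_tag?: Partial<Record<ExperimentTag, TagOutcome>>;
}
interface TagMatrixRow {
  tag: ExperimentTag;
  cells: Record<string, TagOutcome | null>; // keyed by experiment name
}

const TAG_ORDER: readonly ExperimentTag[] = ["api", "e2e", "infra"];

function buildTagMatrix(rows: RowInput[]): TagMatrixRow[] {
  const matrix: TagMatrixRow[] = [];
  for (const tag of TAG_ORDER) {
    // Emit a row only if at least one experiment reported this tag.
    if (!rows.some((r) => r.by_tag?.[tag] !== undefined)) continue;
    const cells: Record<string, TagOutcome | null> = {};
    for (const r of rows) {
      cells[r.experiment] = r.by_tag?.[tag] ?? null; // null = not reported
    }
    matrix.push({ tag, cells });
  }
  return matrix; // [] when no experiment used tags
}
```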

Tests
- tests/bdd/tdd-compare-experiments.test.ts: 6 new tests covering cycle_count
  population (0 fallback when timeline missing), artifact_count population,
  duration_ms when runtime.json is written, by_tag pass-through from
  outcomes, full matrix shape across 3 tags with null cells for missing
  data, empty matrix when no experiment uses tags.

Co-authored-by: Isaac
…urfaced during FEIP-7092 live E2E)

The live exercise of FEIP-7092's substrate against a real Lakebase
project surfaced three related substrate gaps. All three are pre-
existing concerns that hermetic tests didn't catch; the live run with
real list-branches + create-branch responses exposed them. This commit
hardens the affected seams + adds hermetic coverage so they stay caught.

### 1. BranchName vs BranchUid distinction (the headline fix)

A Lakebase branch has TWO identifiers and they are NOT interchangeable:
  - BranchName: the resource-path leaf (`production`, `feature-x`). The
    Lakebase API requires this in every path-shaped field — source_branch
    in create-branch specs, `{branch}` segments in subresource URLs.
  - BranchUid: the system-assigned `br-…` form. Returned in the `uid`
    field of list-branches. Used only for direct uid lookups.

Passing a BranchUid where a BranchName is expected fails with a confusing
"branch id not found" error from the service. `getDefaultBranchId`
violated the kit's own contract (LakebaseBranchInfo.uid's docstring is
explicit about uids NOT being accepted in path fields) by preferring
`def.uid` over the leaf of `def.name`.

Fixes:
  - new scripts/lakebase/branch-id.ts: branded types BranchName /
    BranchUid + asBranchName / asBranchUid runtime validators +
    branchNameFromResourcePath leaf extractor. Both validators throw
    with typed, actionable messages on a swap attempt.
  - getDefaultBranchId renamed to getDefaultBranchName. Returns
    BranchName | null derived from leaf-of-name, never from uid. The
    old name is kept as a thin @deprecated shim that now also returns
    the name (not the uid) so any transitional caller is unblocked.
  - LakebaseBranchInfo.uid typed as BranchUid; new nameLeaf: BranchName
    field for direct consumption. parseBranch funnels through the
    validators so the kit's view of a branch is always brand-correct.
  - createBranch's parentBranch arg gets a runtime guard via asBranchName
    that throws LakebaseBranchError immediately if a uid-shaped value
    is passed — surfaces the swap at the boundary instead of letting
    the CLI's "branch id not found" bubble up.
  - findDefaultBranchName helper extracted as a pure function for hermetic testing.
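Branded string types are one common way to make the name/uid swap a compile error while keeping both values plain strings at runtime. The sketch below assumes that uids match `br-<lowercase words and digits>`, based only on the `br-crimson-fire-d28lb2ez` example. Under that assumption, a genuine branch *name* starting with `br-` would be rejected. The `RawBranch` shape with an `isDefault` flag is hypothetical and not the CLI's actual list-branches JSON.

```typescript
// Brands make BranchName and BranchUid mutually non-assignable at compile time.
type BranchName = string & { readonly __brand: "BranchName" };
type BranchUid = string & { readonly __brand: "BranchUid" };

// Assumption: uids look like "br-crimson-fire-d28lb2ez".
const UID_SHAPE = /^br-[a-z0-9-]+$/;

function asBranchUid(value: string): BranchUid {
  if (!UID_SHAPE.test(value)) {
    throw new TypeError(`expected a branch uid (br-...), got "${value}"`);
  }
  return value as BranchUid;
}

function asBranchName(value: string): BranchName {
  if (value.length === 0 || value.includes("/")) {
    throw new TypeError(`expected a branch name leaf, got "${value}"; extract the leaf from the resource path first`);
  }
  if (UID_SHAPE.test(value)) {
    throw new TypeError(`"${value}" looks like a branch uid; path fields need the name leaf (e.g. "production")`);
  }
  return value as BranchName;
}

// "projects/p1/branches/production" -> "production"
function branchNameFromResourcePath(resourcePath: string): BranchName {
  return asBranchName(resourcePath.split("/").pop() ?? "");
}

// Hypothetical list-branches entry shape, for illustration only.
interface RawBranch { name: string; uid: string; isDefault: boolean }

// Pure: derives the default from the leaf of name, never from uid.
function findDefaultBranchName(branches: RawBranch[]): BranchName | null {
  const def = branches.find((b) => b.isDefault);
  return def ? branchNameFromResourcePath(def.name) : null;
}
```

At the type level, `const n: BranchName = asBranchUid("br-x")` fails to compile. That is the kind of swap a `@ts-expect-error` assertion can pin in tests.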

### 2. Workspace TTL policy

CONVENTION_TIER_DEFAULTS.feature defaults to 30-day TTL ("2592000s"),
matching the documented PSA branching convention. Some workspaces
enforce a tighter maximum-expiration policy (the test workspace caps
somewhere between 14 and 30 days). The substrate previously surfaced
this as an opaque "expiration time exceeds the maximum expiration time"
error from the CLI.

Fix: new LakebaseBranchTtlTooLongError (extends LakebaseBranchError),
wrapped by createBranch when it detects the specific stderr signal.
Message names the attempted TTL, the override options (shorter ttl arg
OR noExpiry: true), and the history_retention_duration probe path.
CONVENTION_TIER_DEFAULTS docstring updated to call out the caveat.
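A sketch of the stderr detection and typed wrapping follows. The regex tolerates only case and whitespace differences in the quoted CLI message. That is an assumption about how "defensive against rewording" might look, not the PR's actual detector. `wrapCreateBranchError` is a hypothetical name. `Object.setPrototypeOf` keeps `instanceof` working for Error subclasses across compile targets.

```typescript
class LakebaseBranchError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keep instanceof reliable for Error subclasses on older compile targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class LakebaseBranchTtlTooLongError extends LakebaseBranchError {
  readonly attemptedTtl: string;
  readonly stderr: string;
  constructor(attemptedTtl: string, stderr: string) {
    super(
      `Branch TTL ${attemptedTtl} exceeds this workspace's maximum expiration policy. ` +
        `Pass a shorter ttl, or noExpiry: true. Underlying CLI error: ${stderr.trim()}`,
    );
    this.attemptedTtl = attemptedTtl;
    this.stderr = stderr;
  }
}

// Case-insensitive, whitespace-tolerant match on the CLI's wording.
function isTtlTooLongStderr(stderr: string): boolean {
  return /expiration\s+time\s+exceeds\s+the\s+maximum\s+expiration\s+time/i.test(stderr);
}

// Hypothetical wrapper that createBranch could apply to a failed CLI call.
function wrapCreateBranchError(stderr: string, attemptedTtl: string): LakebaseBranchError {
  return isTtlTooLongStderr(stderr)
    ? new LakebaseBranchTtlTooLongError(attemptedTtl, stderr)
    : new LakebaseBranchError(stderr.trim());
}
```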

### 3. create-project live test hardcoded placeholder host

create-project.test.ts's E2E test fell back to
`https://workspace.cloud.databricks.com` (placeholder) when LAKEBASE_TEST_HOST
was unset. That host DNS-fails on every workspace — the test had been
silently unrunnable for anyone using DATABRICKS_CONFIG_PROFILE only.

Fix: added a resolveTestHost() helper that prefers LAKEBASE_TEST_HOST,
falls back to running `databricks auth env --profile $DATABRICKS_CONFIG_PROFILE`
to extract DATABRICKS_HOST, and returns null (so the describe block
skips cleanly with a documented reason) when neither is set. Documented
skip block parallels the existing skip-when-e2e-disabled.
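The resolution order (explicit env, then profile lookup, then null) can be sketched with the CLI call injected so that it stays hermetic. One assumption is labeled here: that `databricks auth env --profile <name>` prints JSON of the form `{ "env": { "DATABRICKS_HOST": "..." } }`. Verify this against your CLI version. Any failure (CLI missing, unknown profile, unexpected output) maps to null so that the suite skips instead of failing.

```typescript
import { execFileSync } from "node:child_process";

type RunAuthEnv = (profile: string) => string;

// Default runner shells out to the Databricks CLI.
const runDatabricksAuthEnv: RunAuthEnv = (profile) =>
  execFileSync("databricks", ["auth", "env", "--profile", profile], { encoding: "utf8" });

function resolveTestHost(
  env: Record<string, string | undefined> = process.env,
  runAuthEnv: RunAuthEnv = runDatabricksAuthEnv,
): string | null {
  // 1. Explicit env always wins.
  if (env.LAKEBASE_TEST_HOST) return env.LAKEBASE_TEST_HOST;
  // 2. Otherwise derive the host from the configured profile, if any.
  const profile = env.DATABRICKS_CONFIG_PROFILE;
  if (!profile) return null;
  try {
    // Assumption: JSON output with an "env" map containing DATABRICKS_HOST.
    const parsed = JSON.parse(runAuthEnv(profile)) as { env?: Record<string, string> };
    return parsed.env?.DATABRICKS_HOST ?? null;
  } catch {
    return null; // skip cleanly rather than fall back to a placeholder host
  }
}
```

In vitest, the describe block can then be gated with `describe.skipIf(resolveTestHost() === null)`.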

Same test: contract update on the `.env` assertion. scaffold.ts's
deployEnv() deliberately seeds .env on create-project to avoid a
gated-hook chicken-and-egg (post-checkout bails on empty
LAKEBASE_PROJECT_ID). Test now asserts .env DOES exist with the
expected seeded fields; comment updated to match the substrate's
documented design.

### 4. Per-test timeout bumps on two live-pg-pair tests

`branch-identifier > queryBranchSchema(uid) === queryBranchSchema(branchId)`
and `get-connection-equivalence > returns identical current_database()/host
across both output shapes` each do TWO live pg connections back-to-back.
Each connection takes ~3-4s end-to-end (credential mint + TLS + query).
The 5s vitest default per-test budget was tight enough that under
parallel-suite load the pair routinely timed out, even though neither
substrate path is actually slow. Bumped both to 30000ms with comments
explaining the budget (not a workaround — same fix pattern as the
MCP handshake test).

Tests
- new tests/bdd/branch-id.test.ts: 14 hermetic tests covering both
  validators (positive + negative), leaf extractor, and @ts-expect-error
  type-level swap assertions
- new tests/bdd/branch-ttl-error.test.ts: 7 tests covering the
  CLI-stderr detector (case-insensitive, defensive against rewording)
  and the typed error contract (extends LakebaseBranchError, names
  the override paths, includes the underlying trace)
- tests/bdd/lakebase-project.test.ts: +6 regression tests against the
  exact CLI list-branches response shape. The headline assertion guards
  the bug: "returns 'production' not 'br-crimson-fire-d28lb2ez'"
- 396 → 403 passing in the hermetic suite (+22 net). Live env-gated
  suite goes 403 → 432 passing (+29 net) when LAKEBASE_TEST_INSTANCE
  is set against a real project.

Co-authored-by: Isaac
… LIVE gate

The live test body REQUIRES LAKEBASE_TEST_PROJECT_PATH (throws "required
for live test" otherwise). The describe block's LIVE gate previously
accepted just LAKEBASE_TEST_E2E=1 + DATABRICKS_HOST, so a run with E2E=1
but no PROJECT_PATH would enable the describe and then the test body
would throw — surfacing as a hard FAIL instead of a clean SKIP.

Surfaced during the full live-suite run via scripts/run-all-live-tests.sh:
the kit's auto-provision flow sets E2E + HOST but does not set
PROJECT_PATH (which refers to a Databricks workspace path the test
would normally treat as a "this project is already published in this
workspace" pointer). Tightening the gate keeps the test correctly
skipped in that scenario while preserving its full assertion set when
all three env vars are supplied together.

Co-authored-by: Isaac
…d suite

New script that orchestrates a clean live-test run end-to-end:

  scripts/run-all-live-tests.sh --profile <name>

Differs from the existing scripts/run-live-tests.sh:

  - Auto-resolves DATABRICKS_HOST from a `databricks` CLI profile (no
    need for the user to export it separately).
  - Auto-provisions a fresh Lakebase project on demand (with a 5s
    creation-grace prompt; --no-prompt for CI) and resolves its default
    branch via list-branches. The default-branch leaf is used as both
    LAKEBASE_TEST_BRANCH and LAKEBASE_TEST_PARENT unless overridden.
  - Sets EVERY env var the kit's tests gate on (the LAKEBASE_TEST_*
    suite, LAKEBASE_TEST_INITIALIZR, PEER_DEP_INTEGRATION, plus
    LAKEBASE_TEST_HOST so the create-project test fixture's profile-
    resolver finds the host).
  - Defaults to LAKEBASE_TEST_NO_TEARDOWN=1 so the substrate's
    never-teardown-on-failure convention holds. The --teardown flag
    opts in to post-run cleanup, and even then only on a fully-green
    run (the script preserves the project on any non-zero exit).

Usage:

  scripts/run-all-live-tests.sh --profile <name>            # provision + run
  scripts/run-all-live-tests.sh --profile <name> --project <id>   # reuse
  scripts/run-all-live-tests.sh --profile <name> --teardown       # cleanup on green
  scripts/run-all-live-tests.sh --profile <name> --no-prompt      # CI

Not unlocked by this script (separate setup required):
  - detect-language-via-self-hosted-runner.test.ts (needs a self-hosted
    GitHub Actions runner registered).
  - LAKEBASE_TEST_PROJECT_PATH-gated TDD live tests (need a Databricks
    workspace path; gated test now skips cleanly thanks to the
    sibling tdd-experiment-lifecycle LIVE-gate commit).

Co-authored-by: Isaac
…LT_DATABASE / DEFAULT_ENDPOINT

Previously each of three repeated literals had multiple inline copies
across the substrate:
  - 5432                   4 copies in 3 files (branch-schema, get-connection ×2, paired-branch)
  - "databricks_postgres"  6 copies in 4 files (branch-schema, get-connection, paired-branch ×3, credentials.ts test helper)
  - "primary"              4 callsites (branch-endpoint ×3, get-connection)

Drift across copies is exactly the failure mode this refactor prevents:
change one, miss the others, the substrate quietly fragments. Now every
callsite imports from scripts/lakebase/constants.ts.

No behavior change — each per-call override (`database` arg, PGDATABASE
env, `endpointName` arg, etc.) still wins over the default. The constant
only fixes the fallback.
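The fallback precedence can be sketched as follows. The three constant values come from the list above. `resolveConnectionTarget` and `ConnectionArgs` are hypothetical. The relative order of the `database` arg and `PGDATABASE` is an assumption, since the commit only says that both beat the default.

```typescript
// Values as listed above; the real constants.ts is the source of truth.
const POSTGRES_PORT = 5432;
const DEFAULT_DATABASE = "databricks_postgres";
const DEFAULT_ENDPOINT = "primary";

// Hypothetical per-call options.
interface ConnectionArgs {
  database?: string;
  port?: number;
  endpointName?: string;
}

// Per-call arg first, then env (assumed order), then the shared default.
function resolveConnectionTarget(args: ConnectionArgs, env: Record<string, string | undefined>) {
  return {
    database: args.database ?? env.PGDATABASE ?? DEFAULT_DATABASE,
    port: args.port ?? POSTGRES_PORT,
    endpointName: args.endpointName ?? DEFAULT_ENDPOINT,
  };
}
```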

Tests
- New tests/bdd/constants.test.ts: pins the documented values + types
- Updated tests/bdd/get-connection-{dsn,pool}.test.ts to assert against
  POSTGRES_PORT / DEFAULT_DATABASE instead of inline literals
- Updated tests/bdd/credentials.ts (the shared live-env helper) to
  fall back to DEFAULT_DATABASE when LAKEBASE_TEST_DATABASE is unset
- Hermetic suite: 404 → 407 (+3 from the new constants pinning tests),
  0 regressions

Surfaced during the parameterization audit (follow-up commits: run-all-live-tests.sh flags; convention parent-branch fallback; KIT_TIMEOUTS).

Co-authored-by: Isaac
Three hardcoded values in the script become proper flags + a portability
fix for --help on macOS:

  --project-prefix <prefix>    default "live-all-" (was inline)
  --grace-seconds <n>          default 5 (was `sleep 5` literal)
  --database <name>            default unset → substrate's DEFAULT_DATABASE
                               (no more duplicated "databricks_postgres"
                               fallback in the script — the substrate's
                               constants.ts is the single source of truth)

Plus a portability fix: --help previously used `head -n -1` (GNU-only).
macOS BSD head errors with "illegal line count -- -1". Switched to
`sed '$d'` which works on both. The fix is unrelated to the flag work
but the bug surfaced when I sanity-checked --help during this refactor;
worth landing now rather than as a separate ticket.

Validation:
- `bash -n scripts/run-all-live-tests.sh` — clean
- `bash scripts/run-all-live-tests.sh --help` — renders correctly on macOS
- `--grace-seconds abc` rejected at parse time with a typed error
- Default invocation behavior unchanged (live-all-<ts>, 5s grace, no
  LAKEBASE_TEST_DATABASE export → substrate fallback)

Co-authored-by: Isaac
…issing

Substrate behavior change: when createBranch is given an explicit
parentBranch (typically via CONVENTION_TIER_DEFAULTS.feature.parentBranch
= "staging", etc.) and the named branch does NOT exist on the project,
the substrate now falls back to the project's default branch with a
stderr warning. Opt OUT via the new `strictParent: true` flag to
restore throw-on-missing for hotfix-from-production paths where the
lineage MUST match the caller's expectation.

Why
---
Surfaced during the FEIP-7092 live exercise: tdd-synthesis cuts a fresh
branch via cutExperiment → createFeatureBranch, which defaults parentBranch
to "staging". Bare-provisioned Lakebase projects (via `databricks postgres
create-project`) ship with only `production` — no `staging`. The API
then returned the opaque "branch id not found" error.

The previous attempt bootstrapped the PSA topology (cutting a staging
branch off production in scripts/run-all-live-tests.sh). That hardcoded
"staging" + "production" in a script-level workaround on top of a
substrate assumption. This commit fixes it at the substrate seam instead:
the assumption (parentBranch exists) is now verified, and the failure
mode is a clean fallback with a visible warning, not an opaque CLI error.

Behavior
--------
  - default: existing parent → use it (no warning)
  - default: missing parent → use project default + stderr warning
  - default: missing parent + no project default → throw with hint
  - strictParent: true + missing parent → throw with hint (no fallback)

`strictParent` is threaded through createFeatureBranch / createTestBranch
/ createUatBranch / createPerfBranch (in convention-branches.ts) so the
convention wrappers can be invoked in strict mode when the lineage
matters for the calling flow.
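The four behavior rows above map onto a short decision function. In this sketch, the branch lookups are injected as hypothetical async callbacks, where the real code uses getBranchByName and the default-branch helper. The warning text is illustrative. As in the documented strict case, `strictParent` throws before the default lookup runs.

```typescript
// Hypothetical injected lookups.
interface ParentLookup {
  branchExists(name: string): Promise<boolean>;
  defaultBranchName(): Promise<string | null>;
}

async function resolveParentBranch(
  requested: string,
  lookup: ParentLookup,
  opts: { strictParent?: boolean } = {},
  warn: (msg: string) => void = (m) => process.stderr.write(m + "\n"),
): Promise<string> {
  // Existing parent: use it, no warning.
  if (await lookup.branchExists(requested)) return requested;
  // Strict mode: lineage must match, so throw without consulting the default.
  if (opts.strictParent) {
    throw new Error(`parent branch "${requested}" not found (strictParent: true, no fallback)`);
  }
  const fallback = await lookup.defaultBranchName();
  if (fallback === null) {
    throw new Error(`parent branch "${requested}" not found and the project has no default branch`);
  }
  // Missing parent: fall back visibly.
  warn(`warning: parent branch "${requested}" not found; falling back to default branch "${fallback}"`);
  return fallback;
}
```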

Tests
-----
New tests/bdd/branch-create-fallback.test.ts: 4 hermetic tests covering
(a) the fallback path emits the documented warning + uses the resolved
source, (b) the happy path uses the named parent + emits NO warning,
(c) strictParent: true throws with the documented message + skips the
default lookup, (d) missing parent + no project default throws with a
distinct message.

Updated tests/bdd/branch-create-collision.test.ts to use a per-name
lookup table (`setupBranchMock`) instead of mockResolvedValue. createBranch
now calls getBranchByName twice per invocation (parent existence check +
target idempotency check), so a single mockResolvedValue would make the
parent lookup return the target branch, breaking the test's setup.

Suite: 407 → 411 (+4), 0 regressions.

Co-authored-by: Isaac
…ridable)

New scripts/lakebase/kit-config.ts exports KIT_TIMEOUTS — a single
source of truth for every timeout the substrate scatters across its
files. Each field is env-overridable so ops folks can tune behavior at
runtime without code changes:

  KIT_TIMEOUTS.cliDefault          (LAKEBASE_KIT_TIMEOUT_CLI_DEFAULT_MS,        default 30_000)
  KIT_TIMEOUTS.cliCreateBranch     (LAKEBASE_KIT_TIMEOUT_CLI_CREATE_BRANCH_MS,  default 60_000)
  KIT_TIMEOUTS.cliCreateEndpoint   (LAKEBASE_KIT_TIMEOUT_CLI_CREATE_ENDPOINT_MS, default 60_000)
  KIT_TIMEOUTS.cliLong             (LAKEBASE_KIT_TIMEOUT_CLI_LONG_MS,           default 60_000)
  KIT_TIMEOUTS.readyWait           (LAKEBASE_KIT_TIMEOUT_READY_WAIT_MS,         default 120_000)
  KIT_TIMEOUTS.readyPoll           (LAKEBASE_KIT_TIMEOUT_READY_POLL_MS,         default 5_000)
  KIT_TIMEOUTS.pgConnect           (LAKEBASE_KIT_TIMEOUT_PG_CONNECT_MS,         default 10_000)
  KIT_TIMEOUTS.pgStatement         (LAKEBASE_KIT_TIMEOUT_PG_STATEMENT_MS,       default 15_000)
  KIT_TIMEOUTS.gitDefault          (LAKEBASE_KIT_TIMEOUT_GIT_DEFAULT_MS,        default 5_000)
  KIT_TIMEOUTS.gitCheckout         (LAKEBASE_KIT_TIMEOUT_GIT_CHECKOUT_MS,       default 10_000)
  KIT_TIMEOUTS.gitNetwork          (LAKEBASE_KIT_TIMEOUT_GIT_NETWORK_MS,        default 15_000)
  KIT_TIMEOUTS.gitPush             (LAKEBASE_KIT_TIMEOUT_GIT_PUSH_MS,           default 30_000)
  KIT_TIMEOUTS.cmdShort            (LAKEBASE_KIT_TIMEOUT_CMD_SHORT_MS,          default 5_000)
  KIT_TIMEOUTS.initializrCacheTtl  (LAKEBASE_KIT_INITIALIZR_CACHE_TTL_MS,       default 600_000)

Files updated (every inline timeout literal across the substrate):
  - branch-create.ts (4 literals: ready wait + poll x2, create-branch CLI)
  - branch-delete.ts (cli default)
  - branch-endpoint.ts (4 literals: list CLI, create CLI, ready wait, poll)
  - branch-schema.ts (pgConnect + pgStatement)
  - branch-utils.ts (cli default)
  - get-connection.ts (cli default)
  - lakebase-project.ts (cli default)
  - paired-branch.ts (10 literals: git ops, push, ready wait x3)
  - runner-setup.ts (cmdShort + cliLong)
  - schema-diff.ts (cli default)
  - spring-initializr.ts (cache TTL)

Per-call args (`readyTimeoutMs`, `pollIntervalMs`, etc.) still win over
the centralized default — the constant only sets the fallback. No
behavior change for callers that didn't override.

Tests
- new tests/bdd/kit-config.test.ts: 8 tests pinning every documented
  default value + the closed shape of the config object + the
  positive-integer invariant (the env-override fallback contract).
- Hermetic suite: 411 → 419 (+8), 0 regressions.

The intFromEnv parser is defensive: non-numeric, non-positive (≤0), or
otherwise unparseable inputs fall back to the default rather than
propagating NaN through the codebase.
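A sketch of an `intFromEnv` that honors the fallback contract above, plus a partial config object built with it. It uses `Number()` rather than `parseInt` so that input like `"12abc"` falls back instead of silently reading as 12. That design choice is an assumption, not necessarily the PR's. `KIT_TIMEOUTS_SKETCH` is a hypothetical three-field excerpt, not the real object.

```typescript
function intFromEnv(
  name: string,
  fallback: number,
  env: Record<string, string | undefined> = process.env,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw); // "abc" -> NaN, "12abc" -> NaN, "1.5" -> 1.5
  // Only positive integers are accepted; everything else keeps the default.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Hypothetical excerpt; field names and env vars follow the list above.
const KIT_TIMEOUTS_SKETCH = Object.freeze({
  cliDefault: intFromEnv("LAKEBASE_KIT_TIMEOUT_CLI_DEFAULT_MS", 30_000),
  readyWait: intFromEnv("LAKEBASE_KIT_TIMEOUT_READY_WAIT_MS", 120_000),
  readyPoll: intFromEnv("LAKEBASE_KIT_TIMEOUT_READY_POLL_MS", 5_000),
});
```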

Co-authored-by: Isaac
…try URLs

Two parameterization gaps surfaced during the live-suite exercise. Both
fit naturally in the existing kit-config.ts seam.

(1) Convention branch TTLs
---------------------------
CONVENTION_TIER_DEFAULTS hardcoded its tier TTLs (30d feature, 14d test
+ uat, 7d perf). Workspaces with tighter expiration policies couldn't
override without forking. tdd-synthesis on a workspace with a sub-30-day
cap surfaced the failure mode — LakebaseBranchTtlTooLongError fired with
the correct typed message, but the TTL itself was un-tunable.

KIT_TIMEOUTS now exposes four new fields (env-overridable):

  featureBranchTtlMs  LAKEBASE_KIT_FEATURE_BRANCH_TTL_MS  (default 30d in ms)
  testBranchTtlMs     LAKEBASE_KIT_TEST_BRANCH_TTL_MS     (default 14d)
  uatBranchTtlMs      LAKEBASE_KIT_UAT_BRANCH_TTL_MS      (default 14d)
  perfBranchTtlMs     LAKEBASE_KIT_PERF_BRANCH_TTL_MS     (default 7d)

CONVENTION_TIER_DEFAULTS reads from KIT_TIMEOUTS and formats via the
new exported helper `formatLakebaseTtl(ms)` → `"<seconds>s"` (the
protobuf Duration JSON encoding the Lakebase API expects in create-branch
specs). Per-call `ttl` arg still wins.
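The conversion is straightforward, but it is worth pinning with the documented numbers: 30 days is 2_592_000_000 ms, which formats to `"2592000s"`, the same value CONVENTION_TIER_DEFAULTS.feature used before. Flooring sub-second remainders and rejecting values under one second are assumptions in this sketch.

```typescript
const DAY_MS = 86_400_000;

// Formats milliseconds as a protobuf Duration JSON string ("<seconds>s").
function formatLakebaseTtl(ms: number): string {
  if (!Number.isFinite(ms) || ms < 1000) {
    throw new RangeError(`TTL must be at least 1000 ms, got ${ms}`);
  }
  // Assumption: whole seconds only; any sub-second remainder is dropped.
  return `${Math.floor(ms / 1000)}s`;
}
```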

(2) Package-registry URLs
-------------------------
Two public registry URLs were hardcoded:
  - https://repo1.maven.org/maven2 in scripts/run-live-tests.sh
    (used to download the Flyway CLI on first live-test run)
  - https://start.spring.io in scripts/lakebase/spring-initializr.ts
    (used to fetch Spring Boot starter projects + metadata)

Both are blocked from Databricks-internal dev environments per the
established proxy convention. Now:

  KIT_REGISTRIES.mavenCentral      LAKEBASE_KIT_REGISTRY_MAVEN_CENTRAL
  KIT_REGISTRIES.springInitializr  LAKEBASE_KIT_REGISTRY_SPRING_INITIALIZR

Defaults are the public mainline URLs (the no-config-needed happy path).
For proxied envs, set the env var; the docstring on each KitRegistries
field references the Databricks-internal proxy setup doc. Trailing
slashes are stripped from env input so callers can safely concat
`/path` segments.
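A sketch of the URL override with trailing-slash trimming follows. The name `urlFromEnv` appears in the PR, but this body is an assumption. Unlike the bash `${...%/}` expansion, which removes a single slash, this regex strips every trailing slash. `KIT_REGISTRIES_SKETCH` is a hypothetical stand-in for the real object.

```typescript
function urlFromEnv(
  name: string,
  fallback: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const raw = env[name]?.trim();
  const url = raw ? raw : fallback; // unset or blank -> public default
  // Strip trailing slashes so `${base}/path` never produces "//".
  return url.replace(/\/+$/, "");
}

const KIT_REGISTRIES_SKETCH = {
  mavenCentral: urlFromEnv("LAKEBASE_KIT_REGISTRY_MAVEN_CENTRAL", "https://repo1.maven.org/maven2"),
  springInitializr: urlFromEnv("LAKEBASE_KIT_REGISTRY_SPRING_INITIALIZR", "https://start.spring.io"),
};
```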

Files updated
- kit-config.ts: KitTimeouts gains 4 TTL fields + KIT_REGISTRIES /
  KitRegistries types + formatLakebaseTtl helper + urlFromEnv helper
- convention-branches.ts: CONVENTION_TIER_DEFAULTS reads via KIT_TIMEOUTS
  + formatLakebaseTtl; the local `ttlDays` / `DAY_SECONDS` helpers are
  gone (replaced by the centralized config)
- spring-initializr.ts: DEFAULT_BASE_URL reads from KIT_REGISTRIES
- run-live-tests.sh: MAVEN_CENTRAL is now env-overridable; trailing
  slash stripped via bash parameter expansion (${...%/})
- tests/bdd/kit-config.test.ts: +6 tests covering the new TTL defaults,
  formatLakebaseTtl behavior, registry defaults, trailing-slash trim,
  and the updated closed-shape assertions

Suite: 419 → 425 (+6), 0 regressions.

Co-authored-by: Isaac
…paces

Closes the loop on the workspace-TTL parameterization: the previous
commit exposed LAKEBASE_KIT_FEATURE_BRANCH_TTL_MS as the env-overridable
substrate value, but the live-test runner had no way to set it on
demand. tdd-synthesis on a workspace with a sub-30-day cap kept hitting
LakebaseBranchTtlTooLongError even though the substrate was now
typed-error-correct.

  --feature-ttl-days <n>   converts to ms (n * 86_400_000) and exports
                           LAKEBASE_KIT_FEATURE_BRANCH_TTL_MS for the run

Usage:
  scripts/run-all-live-tests.sh --profile <name> --feature-ttl-days 7

Validated at parse time: must be a positive integer. Default behavior
(flag absent) is unchanged — substrate's 30-day default applies.

Co-authored-by: Isaac
Cosmetic-only sweep over the substrate files added or modified during
the recent branch-id / TTL / KIT_TIMEOUTS / KIT_REGISTRIES work. No
logic or behavior change. Suite stays at 425 passed / 0 failed.

Files touched: 22 (scripts/lakebase/, scripts/tdd/, scripts/*.sh,
tests/bdd/). Pre-existing files outside this session's scope retain
their original punctuation.

Co-authored-by: Isaac
Sweep across the kit files not touched in the prior polish commit:
substrate sources, agent prompts, references, templates, skill docs,
+ config. Cosmetic only; suite stays at 425 passed / 0 failed.

39 files changed, 271 insertions / 271 deletions (1:1 character swap).

Co-authored-by: Isaac
Adopt a two-file env-config layering for the kit's env-overridable knobs:
.env.template.config holds the committed public-default values (mainline
package registries, PSA-convention branch TTLs); .env.local.config holds
local overrides (corp proxies, workspace-tighter TTL caps) and stays
gitignored. run-all-live-tests.sh sources the template first, then the
local override, so local values win.
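The layering rule (template first, local second, local wins) is just an ordered merge. The sketch below parses simple `KEY=VALUE` files in TypeScript to show that precedence. The real script uses bash `source`, which also performs shell expansion that this minimal parser does not attempt. Both function names are hypothetical.

```typescript
// Minimal KEY=VALUE parser: skips blanks and # comments, tolerates a leading
// "export ", and strips one pair of matching surrounding quotes.
function parseEnvFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue;
    const key = trimmed.slice(0, eq).replace(/^export\s+/, "").trim();
    let value = trimmed.slice(eq + 1).trim();
    const q = value[0];
    if (value.length >= 2 && (q === '"' || q === "'") && value[value.length - 1] === q) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Template first, then local: later spreads overwrite earlier keys.
function layerEnv(templateText: string, localText: string | null): Record<string, string> {
  return { ...parseEnvFile(templateText), ...(localText ? parseEnvFile(localText) : {}) };
}
```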

Test contract for env-overridable defaults is split: the kit-config
defaults assertions skipIf the matching env var is set, and the
convention-branches tier helpers now assert forwarded TTLs derived from
KIT_TIMEOUTS instead of hardcoded day counts (the absolute-default
numbers stay covered by kit-config.test.ts).

Also fix a bash arithmetic gotcha in run-all-live-tests.sh where
86_400_000 (underscore separators) tripped "value too great for base":
bash arithmetic does not accept digit separators. Plain digits work everywhere.

Co-authored-by: Isaac
…nfig

scripts/run-all-live-tests.sh now provisions every prerequisite the
gated suites probe for, defaulting to enabled so a single invocation
exercises the full kit:
- LAKEBASE_TEST_PROJECT_PATH=projects/<id> exported (unlocks
  tdd-experiment-lifecycle's live describe).
- Migration tooling (default ON) provisions .venv-live-tests/
  with alembic + sqlalchemy + psycopg2-binary, and downloads the Flyway CLI
  from LAKEBASE_KIT_REGISTRY_MAVEN_CENTRAL into .tools-live-tests/.
  Idempotent across runs; opt out with --no-migrate-tools.
- GitHub-runner coverage (default ON) sets LAKEBASE_TEST_E2E_GITHUB=1
  and probes gh CLI auth / GITHUB_TOKEN env, warning if neither resolves.
  Opt out with --no-github-runner.

tests/bdd/kit-config.test.ts now loads the module under a controlled env
via vi.resetModules() + scoped scrub-and-restore. Every default-value
assertion runs in a guaranteed clean scope regardless of what the host
process inherits from .env.local.config; the env-override mechanic is
covered by new positive-case tests (override applied, non-numeric falls
back, non-positive falls back, trailing slash trimmed). 18 conditional
tests become 22 unconditional tests.

Co-authored-by: Isaac
…isleading PROJECT_PATH

The live describe previously gated on LAKEBASE_TEST_PROJECT_PATH and fed
that value to cutExperiment's `instance` arg. branch-utils' projectPath()
helper then prepended `projects/` again, producing the malformed URL
/postgres/projects/projects/<id>/branches.

The substrate's `instance: string` parameter is the bare project id; the
test now reads from LAKEBASE_TEST_INSTANCE (which the live driver already
exports as the bare id) for both the gate and the cutExperiment arg.
Drop the redundant LAKEBASE_TEST_PROJECT_PATH export from
run-all-live-tests.sh.
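The malformed URL is easy to reproduce with a helper that prefixes `projects/` onto an already-prefixed value. `projectPath` and `branchesUrl` below are hypothetical stand-ins for branch-utils' helper and the URL it feeds. `assertBareProjectId` is an optional guard that the commit does not describe; it is included only to show how the mistake could fail fast.

```typescript
// Hypothetical stand-in: expects the bare project id, as the substrate does.
function projectPath(instance: string): string {
  return `projects/${instance}`;
}

function branchesUrl(instance: string): string {
  return `/postgres/${projectPath(instance)}/branches`;
}

// Optional guard (assumption, not in the PR): reject pre-prefixed ids early.
function assertBareProjectId(instance: string): string {
  if (instance.startsWith("projects/")) {
    throw new Error(`expected a bare project id, got "${instance}"`);
  }
  return instance;
}
```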

Verified on fevm-serverless-stable-ecparr: 477 tests pass, 0 skipped,
0 failed in 144s, including tdd-experiment-lifecycle's live describe.

Co-authored-by: Isaac