Redesign simulation data model around cases and executions #107

Merged
tomvothecoder merged 42 commits into main from copilot/formalize-performance-archive-semantics
Mar 5, 2026

Conversation

Contributor

Copilot AI commented Feb 24, 2026

Description

Introduces a Case model to group simulations by case name (from timing files), replaces per-case canonical folding with per-execution Simulation records, and restructures the API so Case is the aggregate root. Each archive subdirectory (<jobID>.<timestamp>) becomes its own Simulation row linked to a Case via case_id. The first successful run per case is the canonical baseline; subsequent runs store a config diff dict against it.

Data Model

  • New Case table: name (unique), case_group (nullable, indexed — from CASE_GROUP in env_case.xml), canonical_simulation_id FK
  • Simulation changes: adds case_id FK, execution_id (unique), run_config_deltas (JSONB dict); drops case_name, name, group_name
  • CASE_HASH evaluated but intentionally not used — case grouping is directory-name based
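
A minimal sketch of the relationship described above, using plain dataclasses rather than the project's actual SQLAlchemy models. The field names follow the bullets; the class layout, the use of UUID primary keys, and the example `execution_id` value are assumptions for illustration only.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Simulation:
    """One execution (<jobID>.<timestamp>) of a case."""

    case_id: uuid.UUID  # FK to Case
    execution_id: str  # unique across all simulations
    # None for the canonical run; a dict of deltas otherwise.
    run_config_deltas: Optional[dict[str, dict[str, Any]]] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Case:
    """Aggregate root: groups every execution that shares a case name."""

    name: str  # unique
    case_group: Optional[str] = None  # from CASE_GROUP in env_case.xml
    canonical_simulation_id: Optional[uuid.UUID] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)


# Hypothetical usage: the first successful run becomes the canonical baseline.
case = Case(name="v3.LR.historical_0121")
first = Simulation(case_id=case.id, execution_id="1081156.251218-200923")
case.canonical_simulation_id = first.id
```

Keeping `canonical_simulation_id` on Case rather than a flag on Simulation means there is a single place to look up the baseline for a case.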

Ingestion

  • Every successful run creates its own Simulation; first per case → canonical (run_config_deltas = NULL)
  • Non-canonical runs store dict[field, {"canonical": ..., "current": ...}] deltas
  • Deduplication by execution_id; incomplete runs skipped with warning
  • _get_or_create_case() resolves by case_name; CASE_GROUP stored on Case with conflict-safe upsert
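
The ingestion rules above can be sketched roughly as follows. This is not the code in `ingest.py`: the in-memory `store`, the `REQUIRED_FIELDS` tuple, and the shape of each parsed run are hypothetical stand-ins, and sorting by `execution_id` is only one way to make canonical selection deterministic.

```python
from __future__ import annotations

from typing import Any

# Hypothetical list of metadata fields a run must have to count as successful.
REQUIRED_FIELDS = ("case_name", "machine", "compiler")


def ingest(runs: list[dict[str, Any]], store: dict[str, dict[str, Any]]) -> dict[str, int]:
    """Ingest parsed runs into `store`, which is keyed by execution_id."""
    # Rebuild the canonical baseline per case from what is already stored.
    canonical_by_case = {
        row["config"]["case_name"]: row["config"]
        for row in store.values()
        if row["deltas"] is None
    }
    created = skipped = duplicates = 0
    for run in sorted(runs, key=lambda r: r["execution_id"]):
        config = run["config"]
        if any(config.get(f) is None for f in REQUIRED_FIELDS):
            skipped += 1  # incomplete run: skip (the real code also warns)
            continue
        if run["execution_id"] in store:
            duplicates += 1  # deduplicate by execution_id
            continue
        baseline = canonical_by_case.get(config["case_name"])
        if baseline is None:
            canonical_by_case[config["case_name"]] = config
            deltas = None  # first successful run per case is canonical
        else:
            deltas = {
                key: {"canonical": baseline.get(key), "current": config.get(key)}
                for key in sorted(baseline.keys() | config.keys())
                if baseline.get(key) != config.get(key)
            }
        store[run["execution_id"]] = {"config": config, "deltas": deltas}
        created += 1
    return {"created": created, "skipped": skipped, "duplicates": duplicates}


case_name = "v3.LR.historical_0121"
runs = [
    {"execution_id": "1081156.251218-200923",
     "config": {"case_name": case_name, "machine": "chrysalis", "compiler": "gcc-11"}},
    {"execution_id": "1081290.251218-211543",
     "config": {"case_name": case_name, "machine": "chrysalis", "compiler": "gcc-12"}},
    {"execution_id": "1129347.260120-133246",
     "config": {"case_name": case_name, "machine": "chrysalis"}},  # no compiler
]
store: dict[str, dict[str, Any]] = {}
first_pass = ingest(runs, store)
second_pass = ingest(runs, store)  # re-ingesting the same archive adds nothing
```

In this sketch a non-canonical run whose config matches the baseline gets an empty dict, so `None` stays reserved for the canonical run.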

API

  • GET /cases, GET /cases/{id} → CaseOut with nested SimulationSummaryOut[] via selectinload
  • GET /simulations → flat browser with Case metadata joined (case_name, case_group)
  • GET /simulations/{id} → full SimulationOut with derived is_canonical, change_count
  • SimulationCreate/SimulationOut drop name, groupName; use caseId + executionId
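
One plausible way to derive `is_canonical` and `change_count` is sketched here. The PR does not spell out the derivation, so both rules (comparing against the case's `canonical_simulation_id`, and counting the keys in `run_config_deltas`) are assumptions, and `derive_fields` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Any, Optional


def derive_fields(
    simulation_id: str,
    canonical_simulation_id: Optional[str],
    run_config_deltas: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Compute the read-only fields exposed alongside a simulation."""
    return {
        # Assumed rule: a simulation is canonical when its case points at it.
        "is_canonical": simulation_id == canonical_simulation_id,
        # Assumed rule: one change per field recorded in the delta dict.
        "change_count": len(run_config_deltas or {}),
    }


canonical_view = derive_fields("sim-1", "sim-1", None)
rerun_view = derive_fields(
    "sim-2", "sim-1", {"compiler": {"canonical": "gcc-11", "current": "gcc-12"}}
)
```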

Frontend

  • Case name/group filter dropdowns on SimulationsPage
  • All name/groupName references replaced with caseName/executionId/caseGroup
  • Upload form uses caseId + executionId

Config delta format

{
  "compiler": { "canonical": "gcc-11", "current": "gcc-12" },
  "grid_resolution": { "canonical": "0.9x1.25", "current": "1.9x2.5" }
}
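
A delta in this shape could be produced by a helper along these lines. This is a sketch, not the PR's `_compute_config_delta`; the function name and the choice to report a missing key as `None` are assumptions.

```python
from __future__ import annotations

from typing import Any


def compute_config_delta(
    canonical: dict[str, Any], current: dict[str, Any]
) -> dict[str, dict[str, Any]]:
    """Return {field: {"canonical": ..., "current": ...}} for differing fields."""
    delta: dict[str, dict[str, Any]] = {}
    for key in sorted(canonical.keys() | current.keys()):
        old, new = canonical.get(key), current.get(key)
        if old != new:  # unchanged fields are left out of the delta
            delta[key] = {"canonical": old, "current": new}
    return delta


baseline = {"compiler": "gcc-11", "grid_resolution": "0.9x1.25", "machine": "chrysalis"}
rerun = {"compiler": "gcc-12", "grid_resolution": "1.9x2.5", "machine": "chrysalis"}
delta = compute_config_delta(baseline, rerun)
```

With these inputs `delta` contains only `compiler` and `grid_resolution`, matching the example above; `machine` is unchanged and therefore omitted.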

Checklist

  • Code follows project style guidelines
  • Self-reviewed code
  • No new warnings
  • Tests added or updated (if needed)
  • All tests pass (locally and CI/CD)
  • Documentation/comments updated (if needed)
  • Breaking change noted (if applicable)

Breaking changes:

  • SimulationCreate replaces caseName with caseId (UUID) + executionId (string) and drops name/groupName.
  • SimulationOut adds caseName, caseGroup, isCanonical, changeCount and drops name/groupName.
  • CaseOut includes caseGroup and nested simulations: SimulationSummaryOut[].
  • run_config_deltas type changes from list[dict] to dict | None.
  • env_case.xml is required for ingestion.

Deployment Notes (if any)

Single squashed Alembic migration 20260304_400000 (revises 20260219_000000):

  • Creates cases table, backfills from existing case_name values
  • Adds case_id, execution_id to simulations; derives execution_id from UUID for existing rows
  • Sets canonical per case (earliest simulation)
  • Converts legacy list-format run_config_deltas to dict
  • Backfills case_group from existing group_name
  • Drops case_name, name, group_name columns and old constraints
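
The first three backfill steps can be illustrated in plain Python. The real migration runs as SQL inside Alembic; here the row shape, the `created_at` ordering column used to decide "earliest", and the `backfill_cases` helper are all assumptions made for the sketch.

```python
from __future__ import annotations

import uuid
from datetime import datetime
from typing import Any


def backfill_cases(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Create one case per distinct case_name and link each row to it."""
    cases: dict[str, dict[str, Any]] = {}
    # Walk rows oldest first so the first row seen per case is the earliest.
    for row in sorted(rows, key=lambda r: r["created_at"]):
        case = cases.get(row["case_name"])
        if case is None:
            case = {
                "id": uuid.uuid4(),
                "name": row["case_name"],
                "canonical_simulation_id": row["id"],  # earliest simulation
            }
            cases[row["case_name"]] = case
        row["case_id"] = case["id"]
    return cases


legacy_rows = [
    {"id": "sim-b", "case_name": "case_a", "created_at": datetime(2026, 1, 20)},
    {"id": "sim-a", "case_name": "case_a", "created_at": datetime(2025, 12, 18)},
    {"id": "sim-c", "case_name": "case_b", "created_at": datetime(2026, 2, 1)},
]
cases = backfill_cases(legacy_rows)
```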

Full downgrade is not supported, because reversing this migration would require destructive changes to data.

Original prompt

This section details the original issue you should resolve

<issue_title>[Enhancement]: Formalize performance_archive semantics and implement canonical run ingestion</issue_title>
<issue_description>## Is your feature request related to a problem?

There is ambiguity in how directories under performance_archive and case subdirectories (<jobID>.<timestamp>) should be interpreted.

This impacts:

  • SimBoard’s data model (Case vs. Run)
  • Canonical run selection
  • Incremental update behavior
  • Ingestion logic (e.g., main_parser())

Current assumptions may incorrectly treat all archive entries as completed production runs or treat job subdirectories as independent experiments rather than successive executions of the same case.

Describe the solution you'd like

Document and formalize conventions for how performance_archive should be interpreted and ingested, and update the data model and ingestion logic accordingly.

1. performance_archive Semantics

  • The archive can include incomplete or failed runs.
  • Some directories may contain build provenance only, with no run provenance (e.g., job never executed or crashed).
  • Ingestion must not assume all entries are completed production runs.
  • A run is considered successful only if all required run metadata is present.
  • Incomplete runs should be detected and either skipped or classified explicitly.
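
As an illustration of the "all required run metadata is present" rule, a classifier might look like this. The issue does not list the required fields, so `REQUIRED_FIELDS` and `classify_run` are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical required fields; the actual list is defined by the parser.
REQUIRED_FIELDS = ("case_name", "machine", "simulation_start_date")


def classify_run(metadata: dict[str, Any]) -> tuple[str, list[str]]:
    """Return ("successful", []) or ("incomplete", [missing field names])."""
    missing = [f for f in REQUIRED_FIELDS if metadata.get(f) in (None, "")]
    return ("incomplete", missing) if missing else ("successful", [])


complete = classify_run(
    {"case_name": "v3.LR.historical_0121", "machine": "chrysalis",
     "simulation_start_date": "1850-01-01"}
)
build_only = classify_run({"case_name": "v3.LR.historical_0121", "machine": "chrysalis"})
```

Returning the missing field names lets the caller either skip the run with a useful warning or store an explicit "incomplete" classification.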

2. Case Directory Structure

Example:

ac.golaz/v3.LR.historical_0121
├── 1081156.251218-200923
├── 1081290.251218-211543
└── 1129347.260120-133246

Subdirectories (<jobID>.<timestamp>)

  • These represent multiple executions of the same case, not separate experiments.
  • Long simulations (e.g., 100-year runs) require multiple job submissions.
  • Each submission appears separately in performance_archive.
  • Timing may vary across runs due to machine load.
  • All subdirectories are logically grouped under the same casename.
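
The subdirectory names can be split and ordered chronologically as sketched below. The `YYMMDD-HHMMSS` reading of the timestamp is inferred from the example names and is an assumption, as are the `Execution` type and `parse_execution_dir` helper.

```python
from datetime import datetime
from typing import NamedTuple


class Execution(NamedTuple):
    job_id: str
    started: datetime
    dirname: str


def parse_execution_dir(dirname: str) -> Execution:
    """Split '<jobID>.<timestamp>' into its parts."""
    job_id, _, stamp = dirname.partition(".")
    # Assumed timestamp layout: YYMMDD-HHMMSS.
    return Execution(job_id, datetime.strptime(stamp, "%y%m%d-%H%M%S"), dirname)


dirs = ["1129347.260120-133246", "1081156.251218-200923", "1081290.251218-211543"]
ordered = sorted((parse_execution_dir(d) for d in dirs), key=lambda e: e.started)
```

Sorting on the parsed timestamp rather than the raw string keeps the order chronological even if job IDs have different lengths.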

3. Canonical Run Strategy

Recommended ingestion semantics:

  • Treat the casename (experiment name) as the primary entity.
  • Identify and store the first successful run as the canonical baseline.
    • “Successful” = all required metadata fields are present.
  • For subsequent successful runs:
    • Compare against the canonical baseline.
    • Record only configuration differences (e.g., changed namelist values).
    • Do not ingest timing data (handled by PACE).
  • Expect certain values to differ across runs (e.g., total days run, starting day).

4. Handling New Runs (Incremental Updates)

When a new <jobID>.<timestamp> directory appears under an existing casename:

  • Do not replace existing data.
  • Read the new run.
  • If successful:
    • Compare it to the stored canonical baseline.
    • Record any configuration differences.
  • Maintain grouping under the casename.

Runs are successive executions of the same case whose configuration deltas are tracked over time.

Required Changes

Data Model

  • Explicitly model Case (casename) as the primary entity.
  • Model Run as a child entity representing individual executions.
  • Preserve the existing unique constraint on simulations:
    • (case_name, machine_id, simulation_start_date)
  • Ensure ingestion respects this constraint to maintain idempotency.
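
To show how a composite unique constraint gives idempotent ingestion, here is a small SQLite sketch. The table is reduced to the three constrained columns, the project does not necessarily use SQLite, and `ingest_once` plus the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE simulations (
        id INTEGER PRIMARY KEY,
        case_name TEXT NOT NULL,
        machine_id TEXT NOT NULL,
        simulation_start_date TEXT NOT NULL,
        UNIQUE (case_name, machine_id, simulation_start_date)
    )
    """
)


def ingest_once(case_name: str, machine_id: str, start_date: str) -> bool:
    """Insert a row unless the unique key already exists; True if inserted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO simulations "
        "(case_name, machine_id, simulation_start_date) VALUES (?, ?, ?)",
        (case_name, machine_id, start_date),
    )
    return cur.rowcount == 1


inserted_first = ingest_once("v3.LR.historical_0121", "chrysalis", "1850-01-01")
inserted_again = ingest_once("v3.LR.historical_0121", "chrysalis", "1850-01-01")
row_count = conn.execute("SELECT COUNT(*) FROM simulations").fetchone()[0]
```

Running the same ingestion twice leaves a single row, so the constraint itself enforces idempotency.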

Ingestion (main_parser())

Update logic to:

  • Detect and classify incomplete/failed runs.
  • Group <jobID>.<timestamp> directories under a single casename.
  • Identify the first successful run (all required metadata present) as canonical.
  • Compare subsequent runs against the canonical baseline.
  • Support idempotent ingestion using the existing uniqueness constraint.
  • Support incremental updates without overwriting prior data.

Initial DB Population Rules

Define and document whether:

  • All successful runs are stored (baseline + deltas), or
  • Only canonical baseline runs are fully stored, with others stored as diffs.

Deliverables

  • Update data model definitions (if needed).
  • Modify main_parser() to implement the semantics described above.
  • Add or update tests covering:
    • Incomplete runs
    • Multiple runs under one casename
    • Canonical selection behavior
    • Incremental ingestion
  • Update relevant documentation.

Describe alternatives you've considered

No response

Additional context

Follow up to:
#88 (comment)

Rob answer:
#88 (comment)</issue_description>

<agent_instructions>You are implementing the semantics defined in this issue.

Scope constraints:

  • Do not introduce unrelated architectural refactors.
  • Limit changes to:
    • Data model (if required)
    • main_parser() and directly related ingestion utilities
    • Tests
    • Documentation updates tied to this behavior

Implementation requirements:

  1. Canonical Selection
    • A run is "successful" only if all required metadata fields are pre...

Copilot AI and others added 2 commits February 24, 2026 22:19
- main_parser: skip incomplete runs gracefully, sort dirs
  deterministically for reproducible canonical selection
- ingest_archive: group runs by case_name, ingest first
  successful run as canonical, store config deltas for
  subsequent runs in extra JSONB field
- Add _compute_config_delta and _attach_config_delta helpers
- Add skipped_count to IngestArchiveResult
- Update test for new incomplete-run skip behavior

Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
idempotent/incremental ingestion

Tests cover:
- Incomplete runs skipped alongside valid runs
- Multiple runs under same casename (performance_archive)
- Deterministic sort order for canonical selection
- Config delta stored for non-canonical runs
- No delta when configs are identical
- Different case_names create separate simulations
- Idempotent re-ingestion
- Incremental ingestion of new runs

Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Enhance performance_archive semantics and ingestion logic" to "Formalize performance_archive semantics and implement canonical run ingestion" Feb 24, 2026
Copilot AI requested a review from tomvothecoder February 24, 2026 22:28
Copilot stopped work on behalf of tomvothecoder due to an error February 24, 2026 22:39
@tomvothecoder

This comment was marked as outdated.

tomvothecoder and others added 3 commits March 3, 2026 10:11
- Add run_config_deltas column to SQLAlchemy Simulation model
- Add field to SimulationCreate and SimulationOut schemas
- Create Alembic migration with data migration from extra
- Update _attach_config_delta to write to new column
- Update tests to assert on new column instead of extra
- Remove all legacy extra["run_config_deltas"] references

Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
- Change type from dict[str, Any] | None to list[dict[str, Any]] | None
  to match actual list-append usage in _attach_config_delta
- Improve field description to clarify that it stores deltas between
  canonical and non-canonical runs, not just canonical simulations

Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>

This comment was marked as outdated.

@tomvothecoder
Collaborator

@copilot Fix recent CI/CD issues

@tomvothecoder

This comment was marked as outdated.

This comment was marked as outdated.

Copilot AI and others added 3 commits March 3, 2026 22:50
- Import Case, _derive_execution_id, _get_or_create_case
- Replace case_name on Simulation with case_id + execution_id
- Use path-like exp_dir keys for _derive_execution_id
- Update canonical run tests: every run gets its own SimulationCreate
- Change run_config_deltas from list[dict] to dict
- Deduplication now by execution_id instead of composite key
- Update assertions for new model structure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace caseName with caseId (UUID) and executionId (string) in all
  SimulationCreate mock objects
- Import Case model and uuid4
- Create Case records in DB before each test that persists simulations
- Replace Simulation.case_name queries with Simulation.case_id queries
  using pre-created Case objects

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This comment was marked as resolved.

@rljacob
Member

rljacob commented Mar 5, 2026

No the CASE_HASH should be the same in different submissions. Where are those simulation directories?

@tomvothecoder
Collaborator

> No the CASE_HASH should be the same in different submissions. Where are those simulation directories?

/lcrc/group/e3sm/PERF_Chrysalis/performance_archive/ac.golaz/v3.LR.historical_0121

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 4 comments.

Comment thread backend/app/features/ingestion/ingest.py Outdated
Comment thread backend/app/features/ingestion/parsers/parser.py Outdated
Comment thread backend/app/features/ingestion/parsers/parser.py Outdated
tomvothecoder requested a review from Copilot March 5, 2026 22:45
tomvothecoder changed the title from "Refactor: Case-based grouping, canonical baselines, per-execution simulations, and CASE_GROUP normalization" to "Redesign simulation data model around cases and executions" Mar 5, 2026
tomvothecoder merged commit 040d00e into main Mar 5, 2026
1 check failed
tomvothecoder deleted the copilot/formalize-performance-archive-semantics branch March 5, 2026 23:00
Copilot AI review requested due to automatic review settings March 23, 2026 23:31

Development

Successfully merging this pull request may close these issues.

[Enhancement]: Formalize performance_archive semantics and implement canonical run ingestion

4 participants