Redesign simulation data model around cases and executions #107
Merged
tomvothecoder merged 42 commits into main on Mar 5, 2026
- main_parser: skip incomplete runs gracefully, sort dirs deterministically for reproducible canonical selection
- ingest_archive: group runs by case_name, ingest first successful run as canonical, store config deltas for subsequent runs in extra JSONB field
- Add _compute_config_delta and _attach_config_delta helpers
- Add skipped_count to IngestArchiveResult
- Update test for new incomplete-run skip behavior
Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
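A minimal sketch of the grouping and deterministic canonical selection this commit describes, assuming run directories are named `<jobID>.<timestamp>` and each parsed run carries a completeness flag. `RunDir`, `group_runs`, and `select_canonical` are hypothetical names for illustration and are not taken from the repository; the real `main_parser` and `ingest_archive` may work differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunDir:
    """Hypothetical stand-in for one parsed archive subdirectory."""
    case_name: str
    dir_name: str   # "<jobID>.<timestamp>"
    complete: bool  # False when required files are missing


def group_runs(runs):
    """Group runs by case name and sort each group by directory name."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.case_name].append(run)
    # Sorting (plain string order here) removes any dependence on the
    # order in which the filesystem happens to list directories.
    return {
        case: sorted(members, key=lambda r: r.dir_name)
        for case, members in sorted(groups.items())
    }


def select_canonical(sorted_runs):
    """Return (canonical, others, skipped_count) for one case."""
    complete = [r for r in sorted_runs if r.complete]
    skipped = len(sorted_runs) - len(complete)
    if not complete:
        return None, [], skipped
    # First complete run in sorted order is the canonical baseline.
    return complete[0], complete[1:], skipped


runs = [
    RunDir("caseA", "200.240102", True),
    RunDir("caseA", "100.240101", False),  # incomplete, will be skipped
    RunDir("caseA", "150.240101", True),
    RunDir("caseB", "300.240103", True),
]
groups = group_runs(runs)
canonical, others, skipped = select_canonical(groups["caseA"])
print(canonical.dir_name, [r.dir_name for r in others], skipped)
```

Because the sort key is the directory name, the same archive contents always yield the same canonical run regardless of listing order.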
idempotent/incremental ingestion
Tests cover:
- Incomplete runs skipped alongside valid runs
- Multiple runs under same casename (performance_archive)
- Deterministic sort order for canonical selection
- Config delta stored for non-canonical runs
- No delta when configs are identical
- Different case_names create separate simulations
- Idempotent re-ingestion
- Incremental ingestion of new runs
Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Enhance performance_archive semantics and ingestion logic" to "Formalize performance_archive semantics and implement canonical run ingestion" on Feb 24, 2026
Copilot stopped work on behalf of tomvothecoder due to an error on February 24, 2026 22:39
- Add run_config_deltas column to SQLAlchemy Simulation model
- Add field to SimulationCreate and SimulationOut schemas
- Create Alembic migration with data migration from extra
- Update _attach_config_delta to write to new column
- Update tests to assert on new column instead of extra
- Remove all legacy extra["run_config_deltas"] references
Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
- Change type from dict[str, Any] | None to list[dict[str, Any]] | None to match actual list-append usage in _attach_config_delta
- Improve field description to clarify that it stores deltas between canonical and non-canonical runs, not just canonical simulations
Co-authored-by: tomvothecoder <25624127+tomvothecoder@users.noreply.github.com>
Collaborator
@copilot Fix recent CI/CD issues
Copilot stopped work on behalf of tomvothecoder due to an error on March 3, 2026 22:25
- Import Case, _derive_execution_id, _get_or_create_case
- Replace case_name on Simulation with case_id + execution_id
- Use path-like exp_dir keys for _derive_execution_id
- Update canonical run tests: every run gets its own SimulationCreate
- Change run_config_deltas from list[dict] to dict
- Deduplication now by execution_id instead of composite key
- Update assertions for new model structure
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
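The switch to deduplication by execution_id can be illustrated with plain dataclasses standing in for the SQLAlchemy models. This is only a sketch under assumptions: `InMemoryStore` and its `ingest` method are hypothetical, and the execution_id strings are placeholders, since how `_derive_execution_id` builds them is not shown in this conversation.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Case:
    name: str
    id: UUID = field(default_factory=uuid4)


@dataclass
class Simulation:
    case_id: UUID      # link to the owning Case
    execution_id: str  # unique per archive subdirectory


class InMemoryStore:
    """Hypothetical stand-in for the database session."""

    def __init__(self):
        self.simulations = {}  # execution_id -> Simulation

    def ingest(self, sim):
        """Insert sim unless its execution_id is already stored."""
        if sim.execution_id in self.simulations:
            return False  # duplicate: re-ingestion is a no-op
        self.simulations[sim.execution_id] = sim
        return True


case = Case(name="caseA")
store = InMemoryStore()
first = store.ingest(Simulation(case.id, "exec-1"))
again = store.ingest(Simulation(case.id, "exec-1"))   # same run re-ingested
second = store.ingest(Simulation(case.id, "exec-2"))  # new run, same case
print(first, again, second, len(store.simulations))
```

Keying on a single identifier makes re-ingestion idempotent while still letting a new run of an existing case be added incrementally.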
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace caseName with caseId (UUID) and executionId (string) in all SimulationCreate mock objects
- Import Case model and uuid4
- Create Case records in DB before each test that persists simulations
- Replace Simulation.case_name queries with Simulation.case_id queries using pre-created Case objects
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
No, the CASE_HASH should be the same in different submissions. Where are those simulation directories?
Collaborator
- Fix UUID and PG_UUID mypy type annotation issues
This was referenced Mar 5, 2026
tomvothecoder
approved these changes
Mar 5, 2026
This was referenced Mar 10, 2026
7 tasks
Description
Introduces a Case model to group simulations by case name (from timing files), replaces per-case canonical folding with per-execution Simulation records, and restructures the API so Case is the aggregate root. Each archive subdirectory (<jobID>.<timestamp>) becomes its own Simulation row linked to a Case via case_id. The first successful run per case is the canonical baseline; subsequent runs store a config diff dict against it.
Data Model
- Case table: name (unique), case_group (nullable, indexed; from CASE_GROUP in env_case.xml), canonical_simulation_id FK
- Simulation changes: adds case_id FK, execution_id (unique), run_config_deltas (JSONB dict); drops case_name, name, group_name
- CASE_HASH evaluated but intentionally not used; case grouping is directory-name based
Ingestion
- Each archive subdirectory becomes its own Simulation; first per case → canonical (run_config_deltas = NULL)
- Subsequent runs store dict[field, {"canonical": ..., "current": ...}] deltas
- Deduplication by execution_id; incomplete runs skipped with warning
- _get_or_create_case() resolves by case_name; CASE_GROUP stored on Case with conflict-safe upsert
API
- GET /cases, GET /cases/{id} → CaseOut with nested SimulationSummaryOut[] via selectinload
- GET /simulations → flat browser with Case metadata joined (case_name, case_group)
- GET /simulations/{id} → full SimulationOut with derived is_canonical, change_count
- SimulationCreate/SimulationOut drop name, groupName; use caseId + executionId
Frontend
- name/groupName references replaced with caseName/executionId/caseGroup
- caseId + executionId
Config delta format
{ "compiler": { "canonical": "gcc-11", "current": "gcc-12" }, "grid_resolution": { "canonical": "0.9x1.25", "current": "1.9x2.5" } }
Checklist
Breaking changes:
- SimulationCreate replaces caseName with caseId (UUID) + executionId (string), drops name/groupName.
- SimulationOut adds caseName, caseGroup, isCanonical, changeCount; drops name/groupName.
- CaseOut includes caseGroup and nested simulations: SimulationSummaryOut[].
- run_config_deltas type changes from list[dict] to dict | None.
- env_case.xml required for ingestion.
Deployment Notes (if any)
Single squashed Alembic migration 20260304_400000 (revises 20260219_000000):
- Creates cases table, backfills from existing case_name values
- Adds case_id, execution_id to simulations; derives execution_id from UUID for existing rows
- Converts run_config_deltas to dict
- Populates case_group from existing group_name
- Drops case_name, name, group_name columns and old constraints
A full downgrade is not supported because it would require destructive changes to data.
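The config delta format given under "Config delta format" above could be produced along the following lines. This is a sketch, not the PR's `_compute_config_delta` helper, whose signature is not shown here: the function name `compute_config_delta` is hypothetical, and two behaviours are assumptions, namely that a key missing on one side is reported as None and that identical configs yield None rather than an empty dict.

```python
from typing import Any, Dict, Optional


def compute_config_delta(
    canonical: Dict[str, Any], current: Dict[str, Any]
) -> Optional[Dict[str, Dict[str, Any]]]:
    """Return {field: {"canonical": ..., "current": ...}} for differing fields."""
    delta = {}
    # Walk the union of keys so added and removed fields are both reported.
    for key in sorted(canonical.keys() | current.keys()):
        old, new = canonical.get(key), current.get(key)
        if old != new:
            delta[key] = {"canonical": old, "current": new}
    # Assumption: no differences is stored as None (NULL), not {}.
    return delta or None


canonical_cfg = {"compiler": "gcc-11", "grid_resolution": "0.9x1.25", "machine": "m1"}
current_cfg = {"compiler": "gcc-12", "grid_resolution": "1.9x2.5", "machine": "m1"}

delta = compute_config_delta(canonical_cfg, current_cfg)
same = compute_config_delta(canonical_cfg, canonical_cfg)
print(delta)
print(same)
```

Only the fields that differ appear in the result, so unchanged settings such as the illustrative `machine` key are omitted.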
Original prompt
This section details the original issue you should resolve
<issue_title>[Enhancement]: Formalize performance_archive semantics and implement canonical run ingestion</issue_title>
<issue_description>
## Is your feature request related to a problem?
There is ambiguity in how directories under performance_archive and case subdirectories (<jobID>.<timestamp>) should be interpreted.
This impacts:
main_parser())
Current assumptions may incorrectly treat all archive entries as completed production runs or treat job subdirectories as independent experiments rather than successive executions of the same case.
Describe the solution you'd like
Document and formalize conventions for how performance_archive should be interpreted and ingested, and update the data model and ingestion logic accordingly.
1. performance_archive Semantics
2. Case Directory Structure
Example:
Subdirectories (<jobID>.<timestamp>)
performance_archive.
3. Canonical Run Strategy
Recommended ingestion semantics:
4. Handling New Runs (Incremental Updates)
When a new <jobID>.<timestamp> directory appears under an existing casename:
Runs are successive executions of the same case whose configuration deltas are tracked over time.
Required Changes
Data Model
- Case (casename) as the primary entity.
- Run as a child entity representing individual executions.
- case_name, machine_id, simulation_start_date)
Ingestion (main_parser())
Update logic to:
- <jobID>.<timestamp> directories under a single casename.
Initial DB Population Rules
Define and document whether:
Deliverables
- main_parser() to implement the semantics described above.
Describe alternatives you've considered
No response
Additional context
Follow up to: #88 (comment)
Rob answer: #88 (comment)
</issue_description>
<agent_instructions>You are implementing the semantics defined in this issue.
Scope constraints:
- main_parser() and directly related ingestion utilities
Implementation requirements:
performance_archive semantics and implement canonical run ingestion #104