feat: check_models external readiness check #712
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Greptile Summary
This PR introduces a
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/readiness.py | New module hosting the shared readiness-check logic lifted from DatasetBuilder; logic is functionally equivalent to what was removed, with correct async/sync dispatch and MCP column-type filter. |
| packages/data-designer-engine/src/data_designer/engine/flags.py | New module centralising DATA_DESIGNER_ASYNC_ENGINE; clean single-source-of-truth, evaluated at import time, intended to be monkeypatched in tests. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Removes the two run*_if_needed instance methods, delegates to run_readiness_check, and switches all flag reads to flags.DATA_DESIGNER_ASYNC_ENGINE. Behavior is unchanged. |
| packages/data-designer/src/data_designer/interface/data_designer.py | Adds check_models method following the same shape as validate(); delegates to run_readiness_check after constructing a ResourceProvider. |
| packages/data-designer/src/data_designer/cli/controllers/generation_controller.py | Adds run_check_models with typed-error handling that surfaces the class name for DataDesignerError subclasses, and a generic fallback for other exceptions. |
| packages/data-designer/src/data_designer/cli/commands/check_models.py | New CLI command delegating to GenerationController.run_check_models; mirrors the validate command structure. |
| packages/data-designer-engine/tests/engine/test_readiness.py | Comprehensive test coverage: model alias collection, MCP alias deduplication/sorting, ordering guarantee, async dispatch, timeout/cancel path, and column-type coverage. |
Sequence Diagram
sequenceDiagram
    participant User
    participant RC as GenerationController
    participant DD as DataDesigner
    participant RD as readiness.run_readiness_check
    participant MR as ModelRegistry
    participant MCP as MCPRegistry
    Note over User,MCP: New check-models path
    User->>RC: check-models config.yaml
    RC->>DD: check_models(config_builder)
    DD->>RD: run_readiness_check(columns, resource_provider)
    RD->>MR: run_health_check(model_aliases)
    MR-->>RD: ok or typed error
    RD->>MCP: run_health_check(tool_aliases)
    MCP-->>RD: ok or RuntimeError
    RD-->>DD: None
    DD-->>RC: None
    RC-->>User: All models and tools responded successfully
    Note over User,MCP: Existing workload startup (unchanged)
    User->>RC: create config.yaml
    RC->>DD: create(config_builder)
    DD->>RD: run_readiness_check(columns, resource_provider)
    RD->>MR: run_health_check
    RD->>MCP: run_health_check
    RD-->>DD: None
    DD->>DD: proceed with workload
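The check-models path in the diagram can be approximated with stub registries. This is a rough sketch under stated assumptions: `Column`, `StubRegistry`, `sketch_readiness_check`, and `sketch_check_models` are hypothetical names, the real `run_readiness_check` takes `(columns, resource_provider)` rather than two registries, and sorting model aliases and skipping empty alias sets are assumed behaviors. The models-before-tools ordering and the typed-versus-generic error reporting follow the review table and diagram.

```python
from dataclasses import dataclass
from typing import Optional


class DataDesignerError(Exception):
    """Stand-in for the project's base error type (assumed shape)."""


@dataclass
class Column:
    """Hypothetical column config with optional model and MCP tool aliases."""
    name: str
    model_alias: Optional[str] = None
    tool_alias: Optional[str] = None


class StubRegistry:
    """Stub registry: records each call and optionally raises a given error."""

    def __init__(self, label: str, calls: list, error: Optional[Exception] = None):
        self.label, self.calls, self.error = label, calls, error

    def run_health_check(self, aliases: list) -> None:
        self.calls.append((self.label, aliases))
        if self.error is not None:
            raise self.error


def sketch_readiness_check(columns, model_registry, mcp_registry) -> None:
    """Check models first, then MCP tools, skipping empty alias sets."""
    # Deduplicate and sort so each alias is probed once, in a stable order.
    model_aliases = sorted({c.model_alias for c in columns if c.model_alias})
    tool_aliases = sorted({c.tool_alias for c in columns if c.tool_alias})
    if model_aliases:
        model_registry.run_health_check(model_aliases)
    if tool_aliases:
        mcp_registry.run_health_check(tool_aliases)


def sketch_check_models(columns, model_registry, mcp_registry) -> str:
    """Controller-level wrapper: turn the outcome into a user-facing message."""
    try:
        sketch_readiness_check(columns, model_registry, mcp_registry)
    except DataDesignerError as exc:
        # Typed errors surface their class name so the failure kind is visible.
        return f"{type(exc).__name__}: {exc}"
    except Exception as exc:
        # Anything else gets a generic fallback message.
        return f"Readiness check failed: {exc}"
    return "All models and tools responded successfully"
```

Because the model check runs first and raises on failure, a model error short-circuits the run before any MCP tool is probed, which keeps the reported failure tied to the first dependency that is actually down.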
Reviews (1): Last reviewed commit: "Lint fix"
📋 Summary
Introduces a new `check_models` CLI command and `DataDesigner` interface method for checking external readiness of models and tools without triggering a full workload (preview or create). This is the "external deps" analogue to the existing `validate` functionality (internal coherence).
🔗 Related Issue
N/A, direct to PR
🔄 Changes
- Moves the readiness-check logic from `DatasetBuilder` to a standalone `readiness.py` engine module, used by both the builder and the end-user-facing interfaces.
- Adds a `flags.py` engine module where the async engine flag is centralized. This removes duplication of the env var magic string / constant that was scattered across a few places.
🧪 Testing
- `make test` passes
✅ Checklist