Context
When DATA_DESIGNER_ASYNC_ENGINE=1 is set, build() correctly routes through _build_async() and the AsyncTaskScheduler. However, build_preview() still goes through the sequential _run_batch() path: columns are processed one at a time, and each column must finish for all records before the next column starts.
For a recipe pipeline with 7 columns and 3 records, preview took ~52s sequentially. Independent columns (e.g., two recipe_idea columns on different providers) could run concurrently, and downstream columns could start as soon as their per-row dependencies are met.
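To make the scheduling difference concrete, here is a minimal asyncio sketch of per-row dependency scheduling. It is not the AsyncTaskScheduler implementation: the column names, the DEPS graph, and generate_cell() are hypothetical stand-ins, and the sleep replaces a real model call.

```python
import asyncio

# Hypothetical column graph: column name -> columns it reads from the same row.
DEPS = {
    "idea_a": [],
    "idea_b": [],
    "recipe": ["idea_a", "idea_b"],
}


async def generate_cell(column, row, inputs):
    # Stand-in for a provider call; the real work would happen here.
    await asyncio.sleep(0.01)
    return f"{column}[{row}]({', '.join(inputs)})"


async def run_row(row):
    tasks = {}

    async def cell(column):
        # Wait only for this row's upstream cells, not for whole columns.
        inputs = [await tasks[dep] for dep in DEPS[column]]
        return await generate_cell(column, row, inputs)

    # One task per cell; independent cells (idea_a, idea_b) overlap.
    for column in DEPS:
        tasks[column] = asyncio.create_task(cell(column))
    values = await asyncio.gather(*tasks.values())
    return dict(zip(tasks, values))


async def preview(num_rows):
    # Rows are independent of each other, so they also run concurrently.
    return list(await asyncio.gather(*(run_row(r) for r in range(num_rows))))


rows = asyncio.run(preview(3))
```

In this toy version, `recipe` for a given row starts as soon as that row's `idea_a` and `idea_b` cells finish, regardless of how far other rows have progressed.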
Proposed approach
Reuse _build_async() with preview-specific behavior:
- Single row group, no disk checkpoints
- Return in-memory DataFrame instead of writing to disk
- Skip metadata writes
Estimated size of the change: ~50-100 lines, mostly conditional logic around checkpointing. The async scheduler itself needs no changes.
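A rough sketch of how the preview-specific behavior could hang off a shared async path. ToyBuilder, its methods' signatures, the `preview` flag, and the JSON file layout are all hypothetical and chosen only for illustration; a list of dicts stands in for the in-memory DataFrame, and the real _build_async() will differ.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class ToyBuilder:
    """Hypothetical stand-in for the real builder; not the library's API."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)

    async def _generate_row_group(self, start, size):
        await asyncio.sleep(0)  # placeholder for the scheduler run
        return [{"row": i} for i in range(start, start + size)]

    async def _build_async(self, num_records, group_size, preview=False):
        if preview:
            # Preview: single row group, no checkpoints, no metadata.
            return await self._generate_row_group(0, num_records)
        # Full build: checkpoint each row group, then write metadata.
        for n, start in enumerate(range(0, num_records, group_size)):
            size = min(group_size, num_records - start)
            rows = await self._generate_row_group(start, size)
            (self.out_dir / f"group_{n}.json").write_text(json.dumps(rows))
        metadata = {"num_records": num_records}
        (self.out_dir / "metadata.json").write_text(json.dumps(metadata))
        return None

    def build(self, num_records, group_size=2):
        asyncio.run(self._build_async(num_records, group_size))
        return self.out_dir

    def build_preview(self, num_records):
        # Same code path as build(), with disk writes switched off.
        return asyncio.run(
            self._build_async(num_records, num_records, preview=True)
        )


with tempfile.TemporaryDirectory() as tmp:
    builder = ToyBuilder(tmp)
    preview_rows = builder.build_preview(3)
    files_after_preview = sorted(p.name for p in Path(tmp).iterdir())
    builder.build(3)
    files_after_build = sorted(p.name for p in Path(tmp).iterdir())
```

Here the preview call leaves the output directory empty and returns the rows directly, while the full build writes row-group checkpoints plus a metadata file.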
Related