feat: chart generation throughput#696
Conversation
Replace sticky progress bars with a bounded ANSI/asciichart throughput panel that plots records per second per generation column. Default progress_bar to enabled and add a local demo config plus screenshot for PR review. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
MkDocs preview: https://7f35b409.dd-docs-preview.pages.dev Fern preview: https://nvidia-preview-pr-696.docs.buildwithfern.com/nemo/datadesigner
|
Code Review: PR #696 — feat: chart generation throughputSummaryReplaces the existing sticky fill-bar progress display with a bounded ANSI throughput chart panel rendered via
FindingsCorrectness / Behavior
Design / Conventions
Tests
Performance
Security
VerdictApprove with minor changes. The rewrite is well-scoped, the rendering math is reasonable, and tests cover the new surface adequately. Before merge I'd ask for:
The remaining points (constant naming, label-escape hardening, lost wrap-test coverage) are nice-to-haves rather than blockers. |
Greptile SummaryThis PR replaces the old fill/disappear ANSI progress bars with a bounded throughput panel backed by
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/progress/terminal/throughput_panel.py | New 885-line ANSI panel: chart rendering, column/model tables, feedback marker overlay, redraw throttling, handler wrapping, TTY fallback. Well-structured with lock discipline; redraw is correctly throttled at 0.75s via _redraw_if_due. |
| packages/data-designer-engine/src/data_designer/engine/progress/reporter.py | New AsyncProgressReporter: owns per-column ProgressTracker instances, drives panel updates at DEFAULT_TTY_REPORT_INTERVAL (0.75s), subscribes to token/admission events, and filters by run_id. close() is idempotent. Subscriptions are correctly set up in log_start (single-threaded) before workers spawn. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py | Swapped StickyProgressBar for TerminalThroughputPanel, added _cancel_requested Event + request_cancel(), added CancelledError → request_cancel() handling in dispatch loop, and added outer finally to always close the reporter. Context vars for column and cancel event set/reset correctly around task execution. |
| packages/data-designer-config/src/data_designer/config/run_config.py | display_tui replaces progress_bar as the canonical field (default True). Deprecated shim via model_validator, property getter/setter, and model_copy override; all emit DeprecationWarning. setdefault ensures display_tui wins when both are present. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Replaced StickyProgressBar with TerminalThroughputPanel, added _await_async_scheduler_result helper for KeyboardInterrupt → cancel propagation, updated run_config.display_tui reads. |
| packages/data-designer-engine/src/data_designer/engine/progress/tracker.py | Moved from utils/ to progress/ package. Minor additions: panel integration (add_bar, update) and quiet mode. Thread-safe via Lock with correct double-acquisition pattern for log-interval checks. |
| packages/data-designer-engine/src/data_designer/engine/models/usage_events.py | New module-level pub/sub for TokenUsageEvent, with Lock-protected callback dict. Pattern mirrors the RequestAdmissionEvent subscriber in observability.py. |
| packages/data-designer-engine/src/data_designer/engine/models/facade.py | _record_usage now allows output-only token events (input_tokens can be None), emits TokenUsageEvent after recording stats, and resolves column from context var or correlation. |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/custom.py | _AsyncBridgedModelFacade now checks is_run_cancellation_requested() before submitting work and propagates column/cancel context vars into the async coroutine; handles CancelledError and interpreter-shutdown RuntimeError on scheduling. |
| packages/data-designer/src/data_designer/cli/commands/create.py | Adds --tui/--no-tui flag that overrides display_tui for a single run by calling model_copy on the active run config before generation starts. |
| packages/data-designer-engine/tests/engine/progress/terminal/test_throughput_panel.py | 553-line test suite with FakeTTY fixture, covering add/update/remove_bar, chart/legend rendering, feedback markers, model table, log interleaving, reporter integration, run-id filtering, and TTY fallback. |
Sequence Diagram
sequenceDiagram
participant CLI as CLI (create.py)
participant GC as GenerationController
participant DB as DatasetBuilder
participant AS as AsyncTaskScheduler
participant Panel as TerminalThroughputPanel
participant Reporter as AsyncProgressReporter
participant Worker as Task Workers
participant Facade as ModelFacade
CLI->>GC: "create(..., tui=True)"
GC->>GC: "run_config.model_copy(update={display_tui: tui})"
GC->>DB: create(builder, num_records)
DB->>AS: "AsyncTaskScheduler(display_tui=True)"
AS->>Panel: TerminalThroughputPanel()
AS->>Reporter: "AsyncProgressReporter(trackers, progress_bar=panel)"
DB->>AS: run() via thread future
AS->>Panel: __enter__() hide cursor, wrap log handlers
AS->>Reporter: log_start() _ensure_event_subscriptions()
Reporter-->>usage_events: subscribe_token_usage()
Reporter-->>observability: subscribe_request_admission_events()
loop dispatch loop
AS->>Worker: spawn task sets column/cancel context vars
Worker->>Facade: agenerate()
Facade-->>usage_events: emit_token_usage_event(TokenUsageEvent)
usage_events-->>Reporter: _record_token_usage()
Reporter->>Panel: record_model_usage()
Worker->>Reporter: record_success(column)
Reporter->>Reporter: _maybe_report() throttled 0.75s
Reporter->>Panel: update_many()
Panel->>Panel: _redraw_if_due() _format_panel() + asciichartpy.plot()
end
AS->>Reporter: log_final() close() unsubscribe
AS->>Panel: __exit__() show cursor, unwrap log handlers
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/data-designer-engine/src/data_designer/engine/progress/reporter.py:51-54
The `add_bar` calls in `__init__` use the raw column key as both the key and the label, while `ProgressTracker` (in the sync fan-out path) registers bars with the richer `label` string (e.g. `"column 'foo'"`). In the async path the panel will show `"foo"` instead of `"column 'foo'"`. Using `tracker.label` here aligns the async panel display with the sync path and with the log output emitted by `log_start`.
```suggestion
if self._bar is not None:
for col, tracker in trackers.items():
self._bar.add_bar(col, tracker.label, tracker.total_records)
self._ensure_event_subscriptions()
```
### Issue 2 of 2
packages/data-designer-engine/src/data_designer/engine/progress/terminal/throughput_panel.py:70-74
When `_fit_ansi` truncates a string it strips all ANSI codes and slices the plain text, which correctly avoids leaving open escape sequences. However it also drops the trailing `_RESET` that callers such as `_format_progress_bar` always append. If a colored progress bar fills exactly `inner_width` visible characters and gets truncated, the open color code bleeds into the next rendered line. Adding an explicit reset at the end of the truncation path closes this edge case.
```suggestion
def _fit_ansi(text: str, width: int) -> str:
visible = _visible_len(text)
if visible > width:
return _ANSI_RE.sub("", text)[:width] + _RESET
return text + (" " * (width - visible))
```
Reviews (9): Last reviewed commit: "Merge branch 'main' into codex/progress-..." | Re-trigger Greptile
Throttle active TTY redraws, sample rates over larger windows, smooth and fit chart series, bound rate history, and harden panel tests. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Render the progress legend as a stable table with live token-rate columns, attribute model usage to active generation columns across async bridge boundaries, and cancel the async scheduler cleanly on KeyboardInterrupt. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Emit synthetic token usage in the credential-free progress panel demo so the live token-rate columns are visible, and accept output-only provider usage as a progress event. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Replace the pipe-delimited progress legend with a native spaced layout, keep column alignment via computed widths, mute the header row, and refresh the PR screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Show raw column names in the async progress panel legend because the table header already provides the column context, and refresh the PR screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Label the progress legend's now and average record-rate columns as rec/s and refresh the screenshot. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
…hput-panel # Conflicts: # packages/data-designer-config/tests/config/test_run_config.py # uv.lock
Code Review — PR #696: feat: chart generation throughputSummaryThis PR replaces the old Footprint: 32 files, +2188/-668. Engine, config, and CLI packages all touched. Layering direction (interface → engine → config) is preserved. FindingsBehaviour / correctness
Style / conventions
TestingCoverage is strong:
Gaps / nice-to-haves:
Performance
SecurityNothing new exposed. ANSI sanitisation in VerdictLooks good to merge with minor follow-ups:
Architectural direction is sound: the new |
…hput-panel # Conflicts: # packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py # packages/data-designer-engine/src/data_designer/engine/models/facade.py # packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py
|
Nice work on this one, @eric-tramel - the new progress package is thoughtfully separated and the compatibility path is in good shape. SummaryThis PR replaces the old sticky progress bars with a terminal throughput panel, wires model/request-admission events into the live display, makes FindingsWarnings - Worth addressing
Suggestions - Take it or leave it
What Looks Good
VerdictNeeds changes - please bound the panel height for wide configs and refresh the stale progress-bar docs before merge. The lockfile cleanup is optional but would make the dependency diff cleaner. This review was generated by an AI assistant. |
johnnygreco
left a comment
There was a problem hiding this comment.
This is really cool @eric-tramel!
I just have some nits related to more explicit labeling and logging. Would be nice if we could get "Preparing samplers to generate 50 records across 5 columns" to print before we start logging above the TUI. And can we have more explicit logs and/or labeling to make super clear that we are showing you llm/model-generated columns in the TUI? Of course, it is obvious when you think about it, but I could see users being confused mid-generation to see progress bars for some but not all columns.
The other nit is with all the labels at the top of graph, which shows the y-axis label (rec/s) plus the total stats. How flexible is the layout? Can we have just records/s on the left and then justify the total stats on the right or something like this? As is, it took me a second to understand "now 1.4" in │Throughput rec/s | 351.6s | 150/150 | now 1.4 | 0 failed, for example.
|
@mikeknep – heads up about this new tui |
|
Nice iteration on the follow-up commits. I re-reviewed latest head SummaryThis PR replaces the old sticky progress bars with the new FindingsWarnings - Worth addressing
What Looks Good
Validation
VerdictNeeds changes - please bound dense TUI panels to the terminal height and refresh the stale progress UI docs before merge. This review was generated by an AI assistant. |
Summary
This PR replaces the old fill/disappear terminal progress bars with a bounded ANSI/asciichart throughput panel that is enabled by default through the canonical
RunConfig.display_tui=Truesetting.RunConfig.progress_barremains supported as a deprecated compatibility shim with a warning, and the CLI exposesdata-designer create --tui / --no-tuias a per-run override.Usage
Default CLI behavior
The TUI is enabled by default for
createruns when the process is attached to a terminal. In non-TTY environments, such as redirected output or many notebook contexts, the renderer falls back instead of trying to draw the live panel.Force the TUI for one CLI run
Use
--tuiwhen the saved or programmatic run config disables the panel but this terminal run should show it.Disable the TUI for one CLI run
Use
--no-tuifor logs-only runs, CI jobs, captured terminal output, or any case where the live panel would be noisy.Configure from Python
display_tuiis the canonical run config option. It defaults toTrue; set it explicitly when creating datasets from a Python program.Disable from Python
Deprecated compatibility shim
Existing callers that still set
progress_barget the same behavior throughdisplay_tui, plus aDeprecationWarning.Changes
Added
data_designer.engine.progress.trackerforProgressTrackerdata_designer.engine.progress.reporterforAsyncProgressReporterdata_designer.engine.progress.terminal.throughput_panelforTerminalThroughputPanelRunConfig.display_tui, defaulting toTrue, as the canonical run config switch for the terminal throughput panel.RunConfig.progress_barcompatibility shim that maps todisplay_tuiand emitsDeprecationWarningon constructor use, property reads, and property writes.asciichartpyas an engine dependency for terminal chart rendering.rpm,in tok/s,out tok/s, and red feedback markers.data-designer create --tui / --no-tuito force the create command's TUI setting for a single CLI run.Changed
run_config.display_tui; production code no longer readsrun_config.progress_bar.now rec/s,avg rec/s, and completion progress; token rates are shown in the model-alias table.AsyncTaskScheduleracceptsdisplay_tuifor async panel setup and passes its run id intoAsyncProgressReporter, so global token/admission events from other runs do not pollute the active panel.packages/data-designer-engine/tests/engine/progress/.Removed
data_designer.engine.dataset_builders.utils..scratch/only.Attention Areas
throughput_panel.py- terminal rendering, chart/table layout, redraw throttling, feedback marker projection, and TTY fallback behavior.reporter.py- progress aggregation, token/admission event subscriptions, cleanup, and run-id filtering.async_scheduler.py- async progress setup, run correlation, cancellation behavior, anddisplay_tuiplumbing.create.py- CLI--tui / --no-tuioverride surface.run_config.py- canonicaldisplay_tuifield plus deprecatedprogress_barshim behavior.pyproject.toml- adds theasciichartpyruntime dependency.Validation
Focused config, engine progress/scheduler, CLI, and interface tests were run after merging the latest
origin/main; the branch is pushed and up to date.Description updated with AI