From 6d7f39f6e3e5af5bd27edba9ad66def1c2ebbbc8 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:17:52 -0700
Subject: [PATCH 01/12] feat: add preview review reference for Data Designer
 skill

---
 .../references/preview-review.md              | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 skills/data-designer/references/preview-review.md
diff --git a/skills/data-designer/references/preview-review.md b/skills/data-designer/references/preview-review.md
new file mode 100644
index 00000000..19aac3af
--- /dev/null
+++ b/skills/data-designer/references/preview-review.md
@@ -0,0 +1,29 @@
+# Preview Review Guide
+
+## Mindset
+
+Quality is statistical, not per-record. Fix systemic issues that affect many records; don't chase cosmetic flaws in individual ones. But don't stop early — clear patterns of broken data or ignored instructions are worth fixing.
+
+## Reading Sample Records
+
+Use pandas to read `dataset.parquet` from the preview results directory (the path printed as `Results path:` by the preview command, or else the most recent `artifacts/preview_results_*/` directory) and print the records in a compact, reviewable format.
+
+## What to Look For
+
+The specifics depend on the dataset and its intended use. The categories below are common starting points — adapt based on what matters for this dataset.
+
+### Diversity
+- **Mode collapse**: are records clustering around the same patterns, topics, or phrasings?
+- **Sampler effectiveness**: do the sampled values actually drive variety in the generated content, or do records converge regardless of what was sampled?
+- **Structural monotony**: do LLM-generated columns follow the same template across records?
+
+### Data Quality
+- **Instruction compliance**: does generated content follow prompt constraints (step counts, format requirements, allowed values)?
+- **Internal consistency**: does data within a record agree with itself?
+- **Encoding integrity**: no garbled encoding, mojibake, or broken unicode.
+- **Plausibility**: do examples look like they could come from the real domain, or are they obviously synthetic?
+
+### Design Choices
+- **Column types**: if a text column consistently produces structured data or code, use the appropriate specialized column type. If values come from a fixed set or known distribution, use a sampler instead of an LLM column.
+- **Validation**: if output could be checked programmatically (syntax, schema conformance, value ranges), attach a validator.
+- **Judge calibration** (if applicable): are scores consistent across similar-quality records? Does the judge catch visible problems? Consider the user's intent — uniformly high scores may be correct if the judge is a quality filter; a spread matters more if it's a training signal.
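A minimal sketch of the "Reading Sample Records" step in the guide above. It assumes pandas with a Parquet engine (pyarrow or fastparquet) is installed and that preview runs land in `artifacts/preview_results_*/`, as the guide states. The helper names (`latest_preview_dir`, `load_preview`, `print_compact`, `quick_scan`) and the mojibake marker list are hypothetical, not Data Designer APIs. `quick_scan` adds cheap signals for the Diversity and Encoding integrity items: exact-duplicate share, how dominant the most common opening phrase is, and how many values contain common mojibake fragments. These only point at records worth reading; they do not replace reading them.

```python
from pathlib import Path
from typing import Optional

import pandas as pd

# Fragments typical of UTF-8 text mis-decoded as Latin-1/cp1252, plus U+FFFD (the
# replacement character). Heuristic only: legitimate text can contain "Ã", e.g. "SÃO PAULO".
MOJIBAKE_MARKERS = ("\ufffd", "Ã", "â€")


def latest_preview_dir(artifacts: Path = Path("artifacts")) -> Path:
    """Return the most recently modified preview_results_* directory under `artifacts`."""
    candidates = [p for p in artifacts.glob("preview_results_*") if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no preview_results_* directory under {artifacts}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def load_preview(results_dir: Optional[Path] = None) -> pd.DataFrame:
    """Read dataset.parquet, preferring the `Results path:` printed by the preview command."""
    if results_dir is None:
        results_dir = latest_preview_dir()
    return pd.read_parquet(results_dir / "dataset.parquet")


def print_compact(df: pd.DataFrame, max_chars: int = 200) -> None:
    """Print one block per record, flattening newlines and truncating long values."""
    for idx, row in df.iterrows():
        print(f"=== record {idx} ===")
        for col, value in row.items():
            text = str(value).replace("\n", " ")
            if len(text) > max_chars:
                text = text[: max_chars - 3] + "..."
            print(f"{col}: {text}")


def quick_scan(
    df: pd.DataFrame, text_columns: list[str], opener_words: int = 5
) -> dict[str, dict[str, float]]:
    """Cheap per-column signals that point at records worth reading closely."""
    if df.empty:
        return {}
    report: dict[str, dict[str, float]] = {}
    for col in text_columns:
        values = df[col].astype(str)
        # The first few words of each value; a dominant opener suggests a shared template.
        openers = values.map(lambda v: " ".join(v.split()[:opener_words]))
        report[col] = {
            "duplicate_share": float(values.duplicated().mean()),
            "top_opener_share": float(openers.value_counts(normalize=True).iloc[0]),
            "mojibake_count": float(
                values.map(lambda v: any(m in v for m in MOJIBAKE_MARKERS)).sum()
            ),
        }
    return report
```

Typical use: `df = load_preview()` (or pass the printed results path), then `print_compact(df)` and `quick_scan(df, [...])` with the names of the LLM-generated text columns.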

From 95ff92bdd79268476b039004dcfb934fa8815f99 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:21:19 -0700
Subject: [PATCH 02/12] feat: add preview review offer to interactive workflow
 iterate step

---
 skills/data-designer/workflows/interactive.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 81d22c94..5b1fa07e 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -23,7 +23,7 @@ This is an interactive, iterative design process. Do not disengage from the loop
 6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Iterate** — Ask the user for feedback. Edit the script, re-validate, re-preview, and serve again. Repeat until they are satisfied.
+7. **Iterate** — Ask the user for feedback and offer to review the records and suggest improvements yourself. If asked to review, read `references/preview-review.md`. Edit the script, re-validate, re-preview, and serve again. Repeat until they are satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Warn the user that generation can take a long time for large record counts (50+).

From e5f7b17e4bfcd71460e35075f381aa85f162a720 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:22:22 -0700
Subject: [PATCH 03/12] fix: remove stale "and serve again" from iterate step

---
 skills/data-designer/workflows/interactive.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 5b1fa07e..34ee764e 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -23,7 +23,7 @@ This is an interactive, iterative design process. Do not disengage from the loop
 6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Iterate** — Ask the user for feedback and offer to review the records and suggest improvements yourself. If asked to review, read `references/preview-review.md`. Edit the script, re-validate, re-preview, and serve again. Repeat until they are satisfied.
+7. **Iterate** — Ask the user for feedback and offer to review the records and suggest improvements yourself. If asked to review, read `references/preview-review.md`. Edit the script, re-validate, and re-preview. Repeat until they are satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Warn the user that generation can take a long time for large record counts (50+).

From a897cb7a3d30ea48f176cc6c0db0f6f30ff1d4a6 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:25:45 -0700
Subject: [PATCH 04/12] fix: reframe design choices as general feature-fit
 guidance

---
 skills/data-designer/references/preview-review.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/skills/data-designer/references/preview-review.md b/skills/data-designer/references/preview-review.md
index 19aac3af..d4c309a4 100644
--- a/skills/data-designer/references/preview-review.md
+++ b/skills/data-designer/references/preview-review.md
@@ -24,6 +24,7 @@ The specifics depend on the dataset and its intended use. The categories below a
 - **Plausibility**: do examples look like they could come from the real domain, or are they obviously synthetic?
 
 ### Design Choices
-- **Column types**: if a text column consistently produces structured data or code, use the appropriate specialized column type. If values come from a fixed set or known distribution, use a sampler instead of an LLM column.
-- **Validation**: if output could be checked programmatically (syntax, schema conformance, value ranges), attach a validator.
-- **Judge calibration** (if applicable): are scores consistent across similar-quality records? Does the judge catch visible problems? Consider the user's intent — uniformly high scores may be correct if the judge is a quality filter; a spread matters more if it's a training signal.
+Are the right Data Designer features being used? For example:
+- A text column that consistently produces structured data or code might be better as a specialized column type.
+- Values drawn from a fixed set or known distribution could use a sampler instead of an LLM column.
+- If the dataset has judge columns, check whether scores are consistent across similar-quality records and whether the judge catches visible problems.
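For the sampler point under Design Choices, one rough way to spot candidates is to measure how repetitive an LLM column's values are once case and whitespace are normalized. The sketch assumes you already know which columns are LLM-generated and pass their names in; `sampler_candidates` and the 0.3 default threshold are illustrative choices, not part of Data Designer. With only a handful of preview records the ratio is noisy, so treat a flag as a prompt to look, not a verdict.

```python
import pandas as pd


def sampler_candidates(
    df: pd.DataFrame, llm_columns: list[str], max_distinct_ratio: float = 0.3
) -> dict[str, dict[str, int]]:
    """Return LLM columns whose normalized values repeat heavily, with their value counts.

    A low distinct/total ratio suggests the column is effectively choosing from a
    fixed set, which a sampler could produce directly and more controllably.
    """
    if df.empty:
        return {}
    flagged: dict[str, dict[str, int]] = {}
    for col in llm_columns:
        # Normalize whitespace and case so "Positive" and " positive" count as one value.
        values = df[col].astype(str).str.strip().str.lower()
        if values.nunique() / len(values) <= max_distinct_ratio:
            flagged[col] = values.value_counts().to_dict()
    return flagged
```

The returned counts double as a rough distribution to seed a category sampler, if the user agrees the column should become one.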

From b4bfdb41e4fd2b6b8c37e9c446af39a42db944c5 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:26:47 -0700
Subject: [PATCH 05/12] fix: move judge calibration from design choices to data
 quality

---
 skills/data-designer/references/preview-review.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/data-designer/references/preview-review.md b/skills/data-designer/references/preview-review.md
index d4c309a4..479d687b 100644
--- a/skills/data-designer/references/preview-review.md
+++ b/skills/data-designer/references/preview-review.md
@@ -22,9 +22,9 @@ The specifics depend on the dataset and its intended use. The categories below a
 - **Internal consistency**: does data within a record agree with itself?
 - **Encoding integrity**: no garbled encoding, mojibake, or broken unicode.
 - **Plausibility**: do examples look like they could come from the real domain, or are they obviously synthetic?
+- **Judge calibration** (if applicable): are scores consistent across similar-quality records? Does the judge catch visible problems?
 
 ### Design Choices
 Are the right Data Designer features being used? For example:
 - A text column that consistently produces structured data or code might be better as a specialized column type.
 - Values drawn from a fixed set or known distribution could use a sampler instead of an LLM column.
-- If the dataset has judge columns, check whether scores are consistent across similar-quality records and whether the judge catches visible problems.
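For the Judge calibration item, a small sketch that checks score spread and whether the judge penalizes the problems you spotted while reading. It assumes the judge scores have already been pulled into a numeric Series (the shape of judge output depends on the config and is not shown here) and that you keep a boolean Series, on the same index, marking records you judged problematic. `judge_calibration_report` is a hypothetical helper.

```python
import pandas as pd


def judge_calibration_report(scores: pd.Series, flagged_by_reviewer: pd.Series) -> dict[str, float]:
    """Summarize judge score spread and compare scores on reviewer-flagged vs. other records.

    `scores`: judge scores, one per record (coerced to numeric; unparseable values become NaN).
    `flagged_by_reviewer`: booleans on the same index, True where you saw a visible problem.
    """
    numeric = pd.to_numeric(scores, errors="coerce")
    flagged = flagged_by_reviewer.astype(bool)
    return {
        "n_scored": float(numeric.notna().sum()),
        # A single distinct score means the judge is not separating records at all.
        "n_distinct_scores": float(numeric.nunique()),
        "score_std": float(numeric.std()),
        # If the judge catches the problems you saw, this should be clearly lower.
        "mean_flagged": float(numeric[flagged].mean()),
        "mean_unflagged": float(numeric[~flagged].mean()),
    }
```

If `n_distinct_scores` is 1, or `mean_flagged` is not clearly below `mean_unflagged`, the judge is probably not discriminating on the problems visible in the preview.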

From 0c916c134a2edaf20a44bbffb0902c12aafde446 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:39:46 -0700
Subject: [PATCH 06/12] fix: make review offer more prominent in iterate step

---
 skills/data-designer/workflows/interactive.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 34ee764e..d5046e36 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -23,7 +23,10 @@ This is an interactive, iterative design process. Do not disengage from the loop
 6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Iterate** — Ask the user for feedback and offer to review the records and suggest improvements yourself. If asked to review, read `references/preview-review.md`. Edit the script, re-validate, and re-preview. Repeat until they are satisfied.
+7. **Iterate**
+   - Ask the user for feedback.
+   - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
+   - Apply changes, re-validate, and re-preview. Repeat until they are satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Warn the user that generation can take a long time for large record counts (50+).

From bcc94c1b8b777d37924aa03478883e89ebc02048 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 17:39:56 -0700
Subject: [PATCH 07/12] fix: clarify "the user" in iterate step

---
 skills/data-designer/workflows/interactive.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index d5046e36..dea6aa4e 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -26,7 +26,7 @@ This is an interactive, iterative design process. Do not disengage from the loop
 7. **Iterate**
    - Ask the user for feedback.
    - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
-   - Apply changes, re-validate, and re-preview. Repeat until they are satisfied.
+   - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Warn the user that generation can take a long time for large record counts (50+).

From 205e64728e026bd756eb9aceb694671256704840 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 20:31:52 -0700
Subject: [PATCH 08/12] fix: generalize generation time warning across
 workflows

---
 skills/data-designer/workflows/autopilot.md   | 4 ++--
 skills/data-designer/workflows/interactive.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index 4fd08489..2a61c19f 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -20,7 +20,7 @@ In this mode, make reasonable design decisions autonomously based on the dataset
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
 7. **Create** — If the user specified a record count:
-  - 50 or fewer: run `data-designer create <path> --num-records <N> --dataset-name <name>` directly.
-  - More than 50: warn that generation can take a long time and ask for confirmation before running.
+  - Run `data-designer create <path> --num-records <N> --dataset-name <name>`.
+  - Generation time depends on record count, number of LLM columns, and inference throughput. For larger datasets, warn the user and ask for confirmation before running.
   - If no record count was specified, skip this step.
 8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index dea6aa4e..e7cb6868 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -29,5 +29,5 @@ This is an interactive, iterative design process. Do not disengage from the loop
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
-  - Warn the user that generation can take a long time for large record counts (50+).
-  - Do not run this command yourself — it can take a long time for large datasets and the user should control when it runs.
+  - Note that generation time depends on record count, number of LLM columns, and inference throughput — it can range from seconds to hours.
+  - Do not run this command yourself — the user should control when it runs.

From b5922ddde61bf62b44c04e7cae405c3759ac9f73 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 20:33:12 -0700
Subject: [PATCH 09/12] fix: soften generation time warning wording

---
 skills/data-designer/workflows/autopilot.md   | 2 +-
 skills/data-designer/workflows/interactive.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index 2a61c19f..a101effc 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -21,6 +21,6 @@ In this mode, make reasonable design decisions autonomously based on the dataset
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
 7. **Create** — If the user specified a record count:
   - Run `data-designer create <path> --num-records <N> --dataset-name <name>`.
-  - Generation time depends on record count, number of LLM columns, and inference throughput. For larger datasets, warn the user and ask for confirmation before running.
+  - Generation time varies — it depends on factors like record count, number of LLM columns, and inference throughput. For larger datasets, warn the user and ask for confirmation before running.
   - If no record count was specified, skip this step.
 8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index e7cb6868..5a797a26 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -29,5 +29,5 @@ This is an interactive, iterative design process. Do not disengage from the loop
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
-  - Note that generation time depends on record count, number of LLM columns, and inference throughput — it can range from seconds to hours.
+  - Warn that generation time varies — it depends on factors like record count, number of LLM columns, and inference throughput.
   - Do not run this command yourself — the user should control when it runs.

From 834f822405d453fe2cdb483c322d2fe611cb143a Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 20:37:00 -0700
Subject: [PATCH 10/12] fix: rephrase generation time warning

---
 skills/data-designer/workflows/autopilot.md   | 2 +-
 skills/data-designer/workflows/interactive.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index a101effc..2f13b7e7 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -21,6 +21,6 @@ In this mode, make reasonable design decisions autonomously based on the dataset
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
 7. **Create** — If the user specified a record count:
   - Run `data-designer create <path> --num-records <N> --dataset-name <name>`.
-  - Generation time varies — it depends on factors like record count, number of LLM columns, and inference throughput. For larger datasets, warn the user and ask for confirmation before running.
+  - Generation speed depends heavily on the dataset configuration and the user's inference setup. For larger datasets, warn the user and ask for confirmation before running.
   - If no record count was specified, skip this step.
 8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 5a797a26..d4a4ab33 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -29,5 +29,5 @@ This is an interactive, iterative design process. Do not disengage from the loop
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
-  - Warn that generation time varies — it depends on factors like record count, number of LLM columns, and inference throughput.
+  - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
   - Do not run this command yourself — the user should control when it runs.

From d4f55565b90f851f0f125d71ebfdfc0b96b37006 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Thu, 19 Mar 2026 20:43:39 -0700
Subject: [PATCH 11/12] feat: make preview review a dedicated workflow step

---
 skills/data-designer/workflows/interactive.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index d4a4ab33..3c4f31ef 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -23,11 +23,9 @@ This is an interactive, iterative design process. Do not disengage from the loop
 6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Iterate**
-   - Ask the user for feedback.
-   - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
-   - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
-8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
+7. **Review** — Review the preview records following `references/preview-review.md`. Share a brief assessment — what looks good and what could improve. Then ask the user if they want to act on any of it or have other feedback.
+8. **Iterate** — Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
+9. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
   - Do not run this command yourself — the user should control when it runs.

From 71eaa3ab6492b20a6afa6484bba32cd5eea60173 Mon Sep 17 00:00:00 2001
From: Johnny Greco <jogreco@nvidia.com>
Date: Fri, 20 Mar 2026 07:11:17 -0700
Subject: [PATCH 12/12] fix: revert preview review to an offer within the
 iterate step

---
 skills/data-designer/workflows/interactive.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 3c4f31ef..d4a4ab33 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -23,9 +23,11 @@ This is an interactive, iterative design process. Do not disengage from the loop
 6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Review** — Review the preview records following `references/preview-review.md`. Share a brief assessment — what looks good and what could improve. Then ask the user if they want to act on any of it or have other feedback.
-8. **Iterate** — Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
-9. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
+7. **Iterate**
+   - Ask the user for feedback.
+   - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
+   - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
+8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
   - Do not run this command yourself — the user should control when it runs.