diff --git a/skills/data-designer/references/preview-review.md b/skills/data-designer/references/preview-review.md
new file mode 100644
index 00000000..479d687b
--- /dev/null
+++ b/skills/data-designer/references/preview-review.md
@@ -0,0 +1,30 @@
+# Preview Review Guide
+
+## Mindset
+
+Quality is statistical, not per-record. Fix systemic issues that affect many records; don't chase cosmetic flaws in individual ones. But don't stop early — clear patterns of broken data or ignored instructions are worth fixing.
+
+## Reading Sample Records
+
+Load `dataset.parquet` from the preview results directory (printed as `Results path:` by the preview command, or the most recent `artifacts/preview_results_*/` directory). Use pandas to load the parquet file and print the records in a compact, reviewable format.
+
+## What to Look For
+
+The specifics depend on the dataset and its intended use. The categories below are common starting points — adapt based on what matters for this dataset.
+
+### Diversity
+- **Mode collapse**: are records clustering around the same patterns, topics, or phrasings?
+- **Sampler effectiveness**: are samplers being used effectively to steer diversity in the dataset?
+- **Structural monotony**: do LLM-generated columns follow the same template across records?
+
+### Data Quality
+- **Instruction compliance**: does generated content follow prompt constraints (step counts, format requirements, allowed values)?
+- **Internal consistency**: does data within a record agree with itself?
+- **Encoding integrity**: no garbled encoding, mojibake, or broken unicode.
+- **Plausibility**: do examples look like they could come from the real domain, or are they obviously synthetic?
+- **Judge calibration** (if applicable): are scores consistent across similar-quality records? Does the judge catch visible problems?
+
+### Design Choices
+Are the right Data Designer features being used? For example:
+- A text column that consistently produces structured data or code might be better as a specialized column type.
+- Values drawn from a fixed set or known distribution could use a sampler instead of an LLM column.
diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index 4fd08489..2f13b7e7 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -20,7 +20,7 @@ In this mode, make reasonable design decisions autonomously based on the dataset
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file:///sample_records_browser.html`
 7. **Create** — If the user specified a record count:
-  - 50 or fewer: run `data-designer create --num-records --dataset-name ` directly.
-  - More than 50: warn that generation can take a long time and ask for confirmation before running.
+  - Run `data-designer create --num-records --dataset-name `.
+  - Generation speed depends heavily on the dataset configuration and the user's inference setup. For larger datasets, warn the user and ask for confirmation before running.
   - If no record count was specified, skip this step.
 8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 81d22c94..d4a4ab33 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -23,8 +23,11 @@ This is an interactive, iterative design process. Do not disengage from the loop
 6. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file:///sample_records_browser.html`
-7. **Iterate** — Ask the user for feedback. Edit the script, re-validate, re-preview, and serve again. Repeat until they are satisfied.
+7. **Iterate**
+   - Ask the user for feedback.
+   - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
+   - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
 8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create --num-records --dataset-name `.
-  - Warn the user that generation can take a long time for large record counts (50+).
-  - Do not run this command yourself — it can take a long time for large datasets and the user should control when it runs.
+  - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
+  - Do not run this command yourself — the user should control when it runs.
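
The "Reading Sample Records" step added by this patch tells the reviewer to load `dataset.parquet` with pandas and print records compactly. A rough sketch of what that could look like is below; the `print_records` helper and the inline sample frame are illustrative inventions, not part of the skill, and in practice the frame would come from `pd.read_parquet` on the preview results path:

```python
import pandas as pd

def print_records(df: pd.DataFrame, n: int = 10, width: int = 120) -> None:
    """Print the first n records one field per line, truncating long values."""
    for i, row in df.head(n).iterrows():
        print(f"--- record {i} ---")
        for col, val in row.items():
            # Collapse newlines so each field stays on a single line.
            text = str(val).replace("\n", " ")
            print(f"{col}: {text[:width]}")

# In practice the frame would be loaded from the preview results
# directory printed by `data-designer preview`, e.g. with
# pd.read_parquet(".../dataset.parquet"). A tiny made-up frame
# stands in for real preview output here:
df = pd.DataFrame(
    {"topic": ["billing", "login"],
     "reply": ["Refund issued.", "Reset link sent."]}
)
print_records(df)
```

Dumping one field per line like this makes mode collapse and template monotony easier to spot across records than pandas' default truncated table view.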