Conversation
Greptile Summary

This PR adds a new dev note (`docs/devnotes/posts/text-to-sql.md`). Several issues were identified and addressed in prior review threads (score-0 falsy check, among others).
| Filename | Overview |
|---|---|
| docs/devnotes/.authors.yml | Adds two new author entries (ymeyer, mvansegbroeck) with correct avatar URLs and descriptions; no issues. |
| docs/devnotes/posts/images/bird-benchmark-results.jpg | New binary image file for BIRD benchmark results visualization; no issues. |
| docs/devnotes/posts/images/text-to-sql-pipeline.jpg | New binary image file for the text-to-SQL pipeline diagram; no issues. |
| docs/devnotes/posts/text-to-sql.md | 598-line dev note documenting the enterprise text-to-SQL SDG pipeline. Several issues remain from prior review threads (num_records/300k discrepancy, Window Functions categorisation, EHR Systems naming). Two new minor issues identified: REPLACE() listed as dialect-specific when it is universal, and the "≥ 3/4" judge threshold notation is ambiguous. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Stage 1: Seeding & Diversification\n(CategorySampler + SubcategorySampler)\nindustry×topic, complexity×sql_concept,\ndialect, task_type, prompt style"] --> B
    B["Stage 2: Prompt Generation\n(Reasoning LLM)\nNatural-language business request\n(no SQL jargon)"] --> C
    C["Stage 3: Schema + Data Generation\n(Reasoning LLM)\nDialect DDL + INSERT\n+ distractor tables/columns\n+ dirty data injection"] --> D
    D["Stage 4: SQL Generation\n(Reasoning LLM)\nDialect-specific executable SQL\nwith chain-of-thought reasoning\n+ dirty data handling"] --> E
    E["Stage 5: Quality Waterfall\nSyntax validator (SQLFluff)\n+ 5 LLM judges × 15 dimensions\n0–4 scale per dimension"] --> F
    F{"Pass threshold?\n≥ 3 out of 4 on all\ndimensions"}
    F -- Yes --> G["Final Dataset\n~32k records / dialect\n96.5k total\n(3 dialects)"]
    F -- No --> H["Rejected (~68%)"]
    style G fill:#2e7d32,color:#fff
    style H fill:#c62828,color:#fff
```
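The Stage 1 seeding step in the flowchart crosses categorical axes to cover the combinatorial space before any LLM call. A minimal plain-Python sketch of that idea (not the actual `CategorySampler` API; the category values here are illustrative, not the real taxonomy):

```python
import itertools
import random

# Illustrative seed axes (example values, not the pipeline's real taxonomy)
industries = ["Healthcare", "Finance", "Retail"]
complexities = ["Basic", "Intermediate", "Advanced"]
dialects = ["postgresql", "mysql", "sqlite"]

# Cross all axes to enumerate the full seed space, then sample from it
seed_space = list(itertools.product(industries, complexities, dialects))

random.seed(0)
for industry, complexity, dialect in random.sample(seed_space, k=5):
    print(f"{dialect}: {complexity} query for {industry}")
```

The real pipeline layers subcategory sampling (topic within industry, sql_concept within complexity) on top of this cross-product, which is what the `CategorySampler + SubcategorySampler` pairing in the diagram refers to.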
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/devnotes/posts/text-to-sql.md
Line: 483
Comment:
**`REPLACE()` is not dialect-specific**
Key Takeaway #5 lists `REPLACE()` vs `regexp_replace` as an example of dialect-specific syntax differences:
> "the pipeline produces dialect-specific schemas and queries with appropriate syntax (`strftime` vs `DATE_SUB` vs `interval`, `REPLACE()` vs `regexp_replace`)"
However, `REPLACE()` is a universal SQL function supported identically in SQLite, MySQL, and PostgreSQL — it is not dialect-specific. The comparison implies that some dialects use `REPLACE()` while others use `regexp_replace`, but both can be present in the same dialect (e.g., PostgreSQL supports both). A more accurate example pair would be something like `strftime('%Y', col)` (SQLite) vs. `YEAR(col)` (MySQL) vs. `DATE_PART('year', col)` (PostgreSQL), which the `strftime vs DATE_SUB vs interval` pair already illustrates.
```suggestion
5. **Per-dialect generation avoids lowest-common-denominator SQL.** Rather than generating ANSI SQL and hoping it works everywhere, the pipeline produces dialect-specific schemas and queries with appropriate syntax (`strftime` vs `DATE_SUB` vs `interval`, `strftime('%Y', col)` vs `YEAR()` vs `DATE_PART()`). Each dialect gets its own tailored prompts, validators, and judge prompts.
```
How can I resolve this? If you propose a fix, please make it concise.
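The suggested example pair can be sanity-checked directly. A small sketch using Python's built-in `sqlite3` to execute the SQLite form; the MySQL and PostgreSQL strings are shown for comparison only and are not executed here (table and column names are invented for illustration):

```python
import sqlite3

# Dialect-specific ways to extract the year from a date column
year_exprs = {
    "sqlite": "strftime('%Y', order_date)",
    "mysql": "YEAR(order_date)",                    # comparison only
    "postgresql": "DATE_PART('year', order_date)",  # comparison only
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT)")
conn.execute("INSERT INTO orders VALUES ('2024-05-01')")

(year,) = conn.execute(
    f"SELECT {year_exprs['sqlite']} FROM orders"
).fetchone()
print(year)  # '2024' -- strftime returns text in SQLite
```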
---
This is a comment left during a code review.
Path: docs/devnotes/posts/text-to-sql.md
Line: 449
Comment:
**Ambiguous "≥ 3/4" notation**
The results table entry `≥ 3/4 across all dimensions` uses a fraction-like notation that is ambiguous. A reader could interpret it as "at least 75% (the fraction three-quarters)" rather than the intended meaning of "at least 3 out of a maximum of 4". Given that the judge scoring scale (0–4) is defined just above this table, writing it as `≥ 3 out of 4` makes the threshold unambiguous.
```suggestion
| Minimum judge score | ≥ 3 out of 4 across all dimensions |
```
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 541787f
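Written as "≥ 3 out of 4", the pass rule is an unambiguous min-over-dimensions check on the 0–4 judge scale. A hypothetical sketch (the dimension names are invented for illustration; the real pipeline scores 15 dimensions across 5 judges):

```python
PASS_THRESHOLD = 3  # on the 0-4 judge scale

def passes_waterfall(scores: dict) -> bool:
    """Keep a record only if every judge dimension scores >= 3 out of 4."""
    return all(score >= PASS_THRESHOLD for score in scores.values())

kept = {"correctness": 4, "schema_fidelity": 3, "readability": 3}
rejected = {"correctness": 4, "schema_fidelity": 2, "readability": 4}

print(passes_waterfall(kept))      # True
print(passes_waterfall(rejected))  # False -- one dimension below 3
```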
Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
PR feedback fixes:

- Fix Window Functions contradiction: Key Takeaway #1 now uses "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate)
- Fix score-0 truthiness bug: use `is not none` instead of a truthy check in Jinja2 expression columns (inline example + production pipeline)
- Soften Code Sandbox language: "A natural next step would be..." instead of "We are actively implementing..."
- Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron team description
- Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME, ASCII diagram labels, Pipeline Overview prose
- Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck
- Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with the default provider pattern (matches structured-outputs dev note), remove unused explicit ModelConfig
- Remove placeholder dataset link (#), add "Dataset: Internal" note

New content:

- Add BIRD Benchmark Results section with bar chart (JPG), data table, BIRD caveat paragraph, and Jocelyn Huang acknowledgement (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B)
- Replace "Looking Ahead: Code Sandbox" with broader "Next Steps": Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0
- Add Project Summary table at end of post
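The score-0 falsy bug fixed above is easy to reproduce with Jinja2 directly: a truthy check silently drops legitimate zero scores, while `is not none` keeps them. A minimal standalone sketch, not the pipeline's actual expression column:

```python
from jinja2 import Template

buggy = Template("{{ score if score else 'MISSING' }}")
fixed = Template("{{ score if score is not none else 'MISSING' }}")

# A judge score of 0 is valid data on the 0-4 scale, but falsy in Python
print(buggy.render(score=0))     # MISSING -- 0 treated as absent
print(fixed.render(score=0))     # 0       -- only None is treated as absent
print(fixed.render(score=None))  # MISSING
```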
- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile)
- Add admonition clarifying code snippets are illustrative, not runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha)
- Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha)
- Add companion file note and recipe link to production pipeline details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha)
… recipe

- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1 to match the exact taxonomy string in the code example (greptile)
- Add admonition clarifying inline code snippets are illustrative, with link to runnable Enterprise Text-to-SQL Recipe (nabinchha)
- Add context before score extraction snippet referencing the five LLMJudgeColumnConfig columns and linking to full recipe (nabinchha)
- Replace production pipeline <details> block (230 lines with phantom imports from prompts.py, rubrics.py, text2sql_seed.json) with a snippet include of the enterprise_text_to_sql.py recipe, which is self-contained and runnable, consistent with other merged dev notes (nabinchha)
- Wrap minimal inline example in collapsible <details> dropdown
- Rename "A Team Effort" section to "Summary"
- Remove redundant Scale/Dialects/Dataset line
The Step 3/4 prompt templates reference {{ sql_dialect }} but the
Step 1 seeding code never defined it, leaving an unresolved Jinja2
variable for readers following along. Add the sql_dialect sampler
with a comment explaining the pipeline runs once per dialect.
Made-with: Cursor
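The unresolved-variable failure this commit fixes can be sketched in plain Python/Jinja2 (the template text and seed values here are illustrative, not the real Step 1/3/4 code):

```python
from jinja2 import StrictUndefined, Template

prompt = Template(
    "Write a {{ sql_dialect }} query for: {{ business_request }}",
    undefined=StrictUndefined,  # fail loudly on unresolved variables
)

seed = {"business_request": "monthly revenue by region"}

# Without sql_dialect in the seed, StrictUndefined raises at render time,
# which is exactly what a reader following along would have hit
try:
    prompt.render(**seed)
except Exception as exc:
    print("unresolved:", exc)

# The pipeline runs once per dialect, so seeding defines sql_dialect explicitly
seed["sql_dialect"] = "postgresql"
print(prompt.render(**seed))
```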
- Remove specific "60-70%" BIRD claim from intro to avoid contradiction with the 41.80%/38.25% direct-generation results shown later (those higher figures come from specialized systems with schema linking)
- Reword MySQL "forbids" to "prompts exclude": REGEXP_REPLACE and CONVERT_TZ are valid MySQL functions; the pipeline excluded them for portability, not because the dialect forbids them
Summary
Add a dev note documenting the enterprise-grade text-to-SQL SDG pipeline used to generate training data for Nemotron's SQL capabilities across PostgreSQL, MySQL, and SQLite.
What's in the post
Files changed