docs(devnotes): add Nemotron-Personas dev note #611
Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
MkDocs preview: https://cb8a7ea0.dd-docs-preview.pages.dev
Fern preview: https://nvidia-preview-pr-611.docs.buildwithfern.com/nemo/datadesigner
danecor left a comment:
Looks good! Some possible issues / suggestions attached.
…_dev_note # Conflicts: # docs/scripts/generate_colab_notebooks.py
Code Review: PR #611 —
Greptile Summary
This PR ships the Inside Nemotron-Personas dev note alongside Tutorial 7 ("Reproducing & Customizing Nemotron-Personas"), which demonstrates the four-stage compound-AI pipeline that builds the Nemotron-Personas HF collection. It also extends…
| Filename | Overview |
|---|---|
| docs/scripts/generate_colab_notebooks.py | Adds ADDITIONAL_API_KEY_BLOCKS and NGC_API_KEY_BLOCK to inject NGC_API_KEY env-var handling into Tutorial 7's Colab cell; ADDITIONAL_SETUP_CELLS added as an empty extension point. Logic is correct — imports are already present in COLAB_API_KEY_CELL and the block is joined cleanly. |
| docs/notebook_source/7-nemotron-personas.py | New tutorial notebook reproducing the Nemotron-Personas pipeline. SAMPLE_FROM_SDG_PGM=True path is intentionally a hook (raises NotImplementedError) but Next Steps prose advertises flipping it (previously flagged). Age conditionals are always true given age_range=[18,114] (previously flagged). |
| docs/devnotes/posts/nemotron-personas.md | New dev note covering the 4-stage pipeline, Nemotron training usage, and customization pattern. Well-structured and consistent with the notebook's code examples. |
| fern/versions/latest.yml | Adds the dev note to Dev Notes but does not add Tutorial 7 to the Tutorials section, creating a nav asymmetry versus mkdocs.yml. |
| mkdocs.yml | Adds Tutorial 7 entry under Tutorials nav; clean 2-line addition. |
| fern/versions/latest/pages/devnotes/posts/nemotron-personas.mdx | Fern-flavored MDX mirror of the dev note; content is consistent with the mkdocs variant. |
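The Colab-generation change in the first table row (notebook-specific key blocks joined into the shared API-key cell, plus an empty `ADDITIONAL_SETUP_CELLS` extension point) follows a simple registry pattern. The sketch below is a minimal stdlib illustration of that pattern, not the script's actual code: `BASE_API_KEY_BLOCK`, `build_api_key_cell`, `setup_cells`, the notebook key, and the cell text are all hypothetical.

```python
# Hypothetical sketch of per-notebook Colab bootstrap registries.
# Names echo those in the PR; shapes and cell contents are assumptions.

# Stand-in for the shared key cell; it already imports os and getpass.
BASE_API_KEY_BLOCK = (
    "import os\n"
    "from getpass import getpass\n"
    "if 'NVIDIA_API_KEY' not in os.environ:\n"
    "    os.environ['NVIDIA_API_KEY'] = getpass('NVIDIA_API_KEY: ')"
)

# Extra env-var handling that relies on the imports above.
NGC_API_KEY_BLOCK = (
    "if 'NGC_API_KEY' not in os.environ:\n"
    "    os.environ['NGC_API_KEY'] = getpass('NGC_API_KEY: ')"
)

# Notebook stem -> extra key blocks appended to the shared key cell.
ADDITIONAL_API_KEY_BLOCKS: dict[str, list[str]] = {
    "7-nemotron-personas": [NGC_API_KEY_BLOCK],
}

# Empty extension point: future tutorials register extra cells here.
ADDITIONAL_SETUP_CELLS: dict[str, list[str]] = {}


def build_api_key_cell(notebook_stem: str) -> str:
    """Join the shared key block with any notebook-specific blocks."""
    blocks = [BASE_API_KEY_BLOCK, *ADDITIONAL_API_KEY_BLOCKS.get(notebook_stem, [])]
    return "\n\n".join(blocks)


def setup_cells(notebook_stem: str) -> list[str]:
    """Return bootstrap cells: the key cell first, then any registered extras."""
    return [build_api_key_cell(notebook_stem), *ADDITIONAL_SETUP_CELLS.get(notebook_stem, [])]
```

Keying the extras by notebook means notebooks without an entry get the shared cell unchanged, and adding a new tutorial's bootstrap is a one-line dict entry.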
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[NGC-hosted Nemotron-Personas Dataset\nor SDG-PGMs custom PGM] -->|PersonSampler| B
subgraph Stage1 ["Stage 1: OCEAN Big-Five Sampling"]
B[Sample OCEAN T-scores\nmu=50, sigma=10, clip 20-80]
B --> B2[Score to label to prose description per trait]
end
subgraph Stage2 ["Stage 2: Demographically-Grounded Sampling"]
C[PGM-grounded demographic record\nage x education x occupation x geography]
end
B2 --> D
C --> D
subgraph Stage3 ["Stage 3: Persona Attributes via LLM Structured Output"]
D[LLMStructuredColumnConfig\nPersonaAttributes schema]
D --> D2["cultural_background / skills_and_expertise\ncareer_goals_and_ambitions / hobbies_and_interests"]
end
D2 --> E
subgraph Stage4 ["Stage 4: Persona Descriptions via LLM Structured Output"]
E[LLMStructuredColumnConfig\nPersonas schema]
E --> E2["professional / finance / healthcare\nsports / arts / travel / culinary\nconcise / detailed persona"]
end
E2 --> F[Released Nemotron-Personas Dataset\n~53M personas across 7 locales]
E2 --> G[Custom Extension\ne.g. TechPersona schema]
```
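Stage 1 of the flowchart (T-scores drawn with mu=50 and sigma=10, clipped to 20-80, then mapped from score to label to prose) can be sketched with the standard library. The label cut points and the description wording below are illustrative assumptions, not the values the Nemotron-Personas pipeline uses.

```python
import random

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

# Hypothetical label bands on the T-score scale (lower bound inclusive).
LABEL_BANDS = [(20, "very low"), (35, "low"), (45, "average"), (56, "high"), (66, "very high")]


def sample_t_score(rng: random.Random, mu: float = 50.0, sigma: float = 10.0,
                   lo: float = 20.0, hi: float = 80.0) -> int:
    """Draw from N(mu, sigma), clip to [lo, hi], and round to an integer T-score."""
    return round(min(max(rng.gauss(mu, sigma), lo), hi))


def score_to_label(score: int) -> str:
    """Return the label of the highest band whose lower bound the score reaches."""
    label = LABEL_BANDS[0][1]
    for lower, name in LABEL_BANDS:
        if score >= lower:
            label = name
    return label


def label_to_prose(trait: str, label: str) -> str:
    """Turn a label into a short description (placeholder wording)."""
    return f"Scores {label} on {trait}."


def sample_ocean(rng: random.Random) -> dict[str, dict[str, object]]:
    """Sample all five traits and carry score, label, and prose for each."""
    profile: dict[str, dict[str, object]] = {}
    for trait in TRAITS:
        score = sample_t_score(rng)
        label = score_to_label(score)
        profile[trait] = {"t_score": score, "label": label,
                          "description": label_to_prose(trait, label)}
    return profile
```

Clipping at 20 and 80 (three standard deviations either side of the mean) removes out-of-range scores at the cost of a small pile-up of probability mass exactly at the bounds.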
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
fern/versions/latest.yml:64-79
**Tutorial 7 missing from fern Tutorials nav**
`mkdocs.yml` adds *Reproducing & Customizing Nemotron-Personas* as Tutorial 7 under the Tutorials section, but `fern/versions/latest.yml` only adds the dev note to Dev Notes — the Tutorials section here still ends at Tutorial 6 (Image-to-Image Editing). Users browsing the fern-based docs won't find Tutorial 7 through the Tutorials nav; they can only reach the Colab notebook via the dev note link.
Reviews (4): Last reviewed commit: "docs(devnotes): move Nemotron-Personas t..."
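The nav asymmetry flagged in Issue 1 above could be caught mechanically by comparing Tutorials titles across both configs. The sketch below works on already-parsed YAML (for example loaded with PyYAML, not shown) and assumes the common shapes: MkDocs `nav` as a list of single-key mappings, and Fern navigation as `section` / `contents` / `page` entries. The shapes and helper names are assumptions, not taken from this repository.

```python
from typing import Any


def mkdocs_section_titles(nav: list[Any], section: str) -> list[str]:
    """Page titles under a top-level MkDocs nav section (list of single-key dicts)."""
    for entry in nav:
        if isinstance(entry, dict) and section in entry:
            # Each child like {"Title": "path.md"}; bare path strings are skipped.
            return [title for item in entry[section] if isinstance(item, dict) for title in item]
    return []


def fern_section_titles(navigation: list[dict[str, Any]], section: str) -> list[str]:
    """Page titles under a Fern navigation section (assumed section/contents/page shape)."""
    for entry in navigation:
        if entry.get("section") == section:
            return [item["page"] for item in entry.get("contents", []) if "page" in item]
    return []


def missing_from_fern(nav: list[Any], navigation: list[dict[str, Any]],
                      section: str = "Tutorials") -> list[str]:
    """Titles listed in the MkDocs section but absent from the Fern section."""
    fern_titles = set(fern_section_titles(navigation, section))
    return [t for t in mkdocs_section_titles(nav, section) if t not in fern_titles]
```

Run in CI, a check like this would flag a tutorial added to one nav but not the other before review.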
…navs
Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Hey Yev, leaving a few flags from a Codex review here so they're visible before the human review, which is still coming.
Narratively, the post reads well: the flow from why personas matter, to how they are used, to how Data Designer builds and customizes them, is strong. These are mostly accuracy and maintenance flags rather than a request for a structural rewrite.
Re: the review from Codex:
I updated the language in the note to make this a bit clearer. Rebased to bring in the prompt sensitivity changes and the updated mkdocs/fern navs. Should be good to go.
johnnygreco left a comment:
this is an awesome post @3mei!!! thanks!
Note: I think the blog card is missing. Up to you whether to add it now or in a follow-up.
📋 Summary
Adds the Inside Nemotron-Personas dev note covering how the multi-locale Nemotron-Personas HF collection is built (4-stage compound-AI pipeline) and how it's used as a seeding primitive across Nemotron training (long-context, tool-use, formal logic, safety refusals, instruction-following). Ships alongside a runnable Tutorial 7 demonstrating reproduction + customization, plus a Colab variant.
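The customization half of the tutorial rests on extending the structured-output schemas that Stages 3 and 4 fill. A stdlib sketch of that idea follows. The four attribute field names come from the flowchart in the Greptile summary; the `TechPersona` fields, the `json_schema` helper, and `parse_output` are illustrative assumptions. For brevity the hypothetical `TechPersona` here extends the Stage 3 attributes, whereas the flowchart attaches the custom extension after Stage 4. The tutorial itself defines its schemas for Data Designer's `LLMStructuredColumnConfig`, whose interface is not reproduced here.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PersonaAttributes:
    """Stage 3 output fields (names taken from the PR's flowchart)."""
    cultural_background: str
    skills_and_expertise: str
    career_goals_and_ambitions: str
    hobbies_and_interests: str


@dataclass
class TechPersona(PersonaAttributes):
    """Hypothetical custom extension adding domain-specific fields."""
    tech_persona: str
    favorite_tools: str


def json_schema(cls: type) -> dict:
    """Build a minimal JSON schema: every field is a required string."""
    names = [f.name for f in fields(cls)]
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": {name: {"type": "string"} for name in names},
        "required": names,
    }


def parse_output(cls: type, raw: str):
    """Parse an LLM's JSON reply into the dataclass, rejecting missing or extra keys."""
    data = json.loads(raw)
    expected = {f.name for f in fields(cls)}
    if set(data) != expected:
        raise ValueError(f"key mismatch: {sorted(set(data) ^ expected)}")
    return cls(**data)
```

Subclassing keeps the base fields required in the extended schema, so a customized persona still carries the attributes the original pipeline produces.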
🔗 Related Issue
N/A
🔄 Changes
✨ Added
- `docs/devnotes/posts/nemotron-personas.md`: new dev note
- `docs/devnotes/posts/assets/nemotron-personas/`: four images (three pipeline-stage diagrams from the partner repo plus a black-background Nemotron-Personas world-map hero)
- `docs/notebook_source/7-nemotron-personas.py`: jupytext source for the Reproducing & Customizing Nemotron-Personas tutorial; `docs/colab_notebooks/7-nemotron-personas.ipynb`: committed Colab variant; i…

🔧 Changed
- `docs/scripts/generate_colab_notebooks.py`: adds an `ADDITIONAL_SETUP_CELLS` map paralleling `ADDITIONAL_DEPENDENCIES`; injects NGC CLI install + `NGC_API_KEY` cells. Future devnote-paired tutorials needing extra Colab bootstrap can register one-line entries in the same map.
- `mkdocs.yml`: adds Reproducing & Customizing Nemotron-Personas under the Tutorials nav

🧪 Testing
- `make test` passes
- `jupytext --to ipynb --execute`
- `make generate-colab-notebooks` regenerates the Colab `.ipynb` cleanly with the NGC setup cells in the expected position
- `make convert-execute-notebooks` and gated on `NVIDIA_API_KEY` + on-disk NGC dataset, matching how Tutorials 5/6 are gated on `OPENROUTER_API_KEY`)

✅ Checklist