Closed
59 commits
aa3dfc0 remove old structure (johnnygreco, Jan 22, 2026)
225b468 major shuffle (johnnygreco, Jan 22, 2026)
9a7a4da streamline project configs (johnnygreco, Jan 22, 2026)
19e929d update make commands (johnnygreco, Jan 22, 2026)
d7843b8 updates to make commands (johnnygreco, Jan 22, 2026)
52a66f6 remove essentials (johnnygreco, Jan 22, 2026)
e0efa86 initialize logger in interface (johnnygreco, Jan 22, 2026)
e6e455c uv lock (johnnygreco, Jan 22, 2026)
40468c4 ignore notepad (johnnygreco, Jan 22, 2026)
aa6536d update workflows (johnnygreco, Jan 22, 2026)
147f6ff fix e2e project config (johnnygreco, Jan 22, 2026)
51d1968 generate colab notebooks (johnnygreco, Jan 22, 2026)
ae67f12 resolve default model settings in interface (johnnygreco, Jan 22, 2026)
75a9582 fix build commands (johnnygreco, Jan 22, 2026)
a1f33a6 update perf import make command (johnnygreco, Jan 23, 2026)
806425d cleaning up some slop (johnnygreco, Jan 23, 2026)
05dad94 update recipes (johnnygreco, Jan 23, 2026)
3c110ae move conftest files to tests/ (johnnygreco, Jan 23, 2026)
02648b3 update subpackage readmes (johnnygreco, Jan 23, 2026)
c755b59 streamline config_logging (johnnygreco, Jan 23, 2026)
59e1618 use exports (johnnygreco, Jan 23, 2026)
15ea546 update perf import usage pattern (johnnygreco, Jan 23, 2026)
74a6cca update for IDE behavior with ruff (johnnygreco, Jan 23, 2026)
b2a3807 remove engine's fixtures file (johnnygreco, Jan 23, 2026)
84c459c add note to about lazy imports (johnnygreco, Jan 23, 2026)
2456e97 update dependencies (johnnygreco, Jan 23, 2026)
c0aa9da update docs (johnnygreco, Jan 23, 2026)
368725c doc fixes (johnnygreco, Jan 23, 2026)
df0ebc4 uv lock (johnnygreco, Jan 23, 2026)
40b8594 updates to catch up with main (johnnygreco, Jan 23, 2026)
c6c17ef clean up makefile (johnnygreco, Jan 23, 2026)
d4256f5 remove package gitignores (johnnygreco, Jan 24, 2026)
3b3a988 define deps only once (johnnygreco, Jan 24, 2026)
86a5886 isolate tests (johnnygreco, Jan 24, 2026)
d2166f6 add test for protetion rule (johnnygreco, Jan 24, 2026)
6c6eab4 create temp dirs for isolated tests (johnnygreco, Jan 26, 2026)
09f9910 catch up to main (johnnygreco, Jan 26, 2026)
5e54338 update headers (johnnygreco, Jan 26, 2026)
b6e375d re apply changes (johnnygreco, Jan 26, 2026)
bf7e6d6 better result summaries for isolated tests (johnnygreco, Jan 26, 2026)
1948238 move exports into top-level init (johnnygreco, Jan 27, 2026)
824d8ed fix client importlib version syntax (johnnygreco, Jan 27, 2026)
6d69d23 add publish script and make command (johnnygreco, Jan 23, 2026)
e93f060 update headers script (johnnygreco, Jan 23, 2026)
8b661f1 fix username check (johnnygreco, Jan 23, 2026)
598a955 update check again (johnnygreco, Jan 23, 2026)
9da4bcb force tag option; use abort instead (johnnygreco, Jan 26, 2026)
ae14daa add testpypi, twine check, and help (johnnygreco, Jan 27, 2026)
fa5cdfc add new publish options (johnnygreco, Jan 27, 2026)
4d00dda handle local tag in testpypi mode (johnnygreco, Jan 27, 2026)
7e7c34e fix version extraction regex and optimize testpypi flow (johnnygreco, Jan 27, 2026)
10c8b5e use top-level readme (johnnygreco, Jan 27, 2026)
e722bee make readme symlink (johnnygreco, Jan 27, 2026)
1e18e1e remove readme for build process (johnnygreco, Jan 27, 2026)
f665ed8 feat: wire up pt_GB and en_SG personas (#245) (johnnygreco, Jan 27, 2026)
e92ef72 feat: Add /create-pr skill for well-formatted GitHub PRs (#247) (johnnygreco, Jan 27, 2026)
74791d0 docs: Fix mkdocs syntax and update person sampling documentation (#249) (johnnygreco, Jan 27, 2026)
3a04acb refactor: slim package refactor into three subpackages (#240) (johnnygreco, Jan 27, 2026)
78bf851 remove readme (johnnygreco, Jan 27, 2026)
132 changes: 132 additions & 0 deletions .claude/skills/create-pr/SKILL.md
@@ -0,0 +1,132 @@
---
name: create-pr
description: Create a GitHub PR with a well-formatted description including summary, categorized changes, and attention areas
argument-hint: [special instructions]
disable-model-invocation: true
---

# Create Pull Request

Create a well-formatted GitHub pull request for the current branch.

## Arguments

`$ARGUMENTS` can be used for special instructions, such as:
- Specifying a base branch: "use base branch: develop"
- Guiding the summary: "emphasize the performance improvements in the summary"
- Adding context: "this is part of the auth refactor epic"
- Any other guidance for PR creation

Default base branch: `main` (unless specified in arguments)

## Step 1: Gather Information

Run these commands in parallel to understand the changes:

1. **Current branch**: `git branch --show-current`
2. **Uncommitted changes**: `git status --porcelain`
3. **Commits on branch**: `git log origin/main..HEAD --oneline`
4. **File changes summary**: `git diff --stat origin/main..HEAD`
5. **Full diff**: `git diff origin/main..HEAD`
6. **Recent commit style**: `git log -5 --oneline` (to match PR title convention)

**Important checks:**
- If uncommitted changes exist, warn the user and ask if they want to commit first
- If no commits ahead of main, inform the user there's nothing to PR
- If branch isn't pushed, you'll push it in Step 4

## Step 2: Analyze and Categorize Changes

### By Change Type (from commits and diff)
- ✨ **Added**: New files, features, capabilities
- 🔧 **Changed**: Modified existing functionality
- 🗑️ **Removed**: Deleted files or features
- 🐛 **Fixed**: Bug fixes
- 📚 **Docs**: Documentation updates
- 🧪 **Tests**: Test additions/modifications

### Identify Attention Areas 🔍
Flag for special reviewer attention:
- Files with significant changes (>100 lines)
- Changes to base classes, interfaces, or public API
- New dependencies (`pyproject.toml`, `requirements.txt`)
- Configuration schema changes
- Security-related changes

## Step 3: Generate PR Title

Use conventional commit format matching the repo style:
- `feat:` for new features
- `fix:` for bug fixes
- `docs:` for documentation
- `refactor:` for refactoring
- `chore:` for maintenance
- `test:` for test changes

If commits have mixed types, use the primary/most significant type.

## Step 4: Create the PR

1. **Push branch** (if needed):
```bash
git push -u origin <branch-name>
```

2. **Create PR** using this template:

```markdown
## 📋 Summary

[1-2 sentence overview of what this PR accomplishes]

## 🔄 Changes

### ✨ Added
- [New features/files - link to key files when helpful]

### 🔧 Changed
- [Modified functionality - reference commits for specific changes]

### 🗑️ Removed
- [Deleted items]

### 🐛 Fixed
- [Bug fixes - if applicable]

## 🔍 Attention Areas

> ⚠️ **Reviewers:** Please pay special attention to the following:

- `path/to/critical/file.py` - [Why this needs attention]

---
🤖 *Generated with AI*
```

3. **Execute**:
```bash
gh pr create --title "<title>" --body "$(cat <<'EOF'
<body>
EOF
)"
```

4. **Return the PR URL** to the user.

## Section Guidelines

- **Summary**: Always include - be concise and focus on the "why"
- **Changes**: Group by type, omit empty sections
- **Attention Areas**: Only include if there are genuinely important items; omit for simple PRs
- **Links**: Include links to code and commits where helpful for reviewers:
  - Link to specific files: `[filename](path/to/file.py)` - GitHub auto-links files in the repo
  - Link to specific lines: `[description](path/to/file.py#L42-L50)` for key code sections
  - Reference commits: `abc1234` - GitHub auto-links commit SHAs
  - For multi-commit PRs, reference individual commits when describing specific changes

## Edge Cases

- **No changes**: Inform user there's nothing to create a PR for
- **Uncommitted work**: Warn and ask before proceeding
- **Large PRs** (>20 files): Summarize by directory/module
- **Single commit**: PR title can match commit message
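Steps 2 and 3 of the skill leave categorization to judgment, but two of their rules are mechanical enough to sketch. The Python below is illustrative only and is not part of the skill: `pr_title_type` picks a conventional-commit prefix from `git log origin/main..HEAD --oneline` output, and `attention_areas` flags paths in `git diff --numstat origin/main..HEAD` output that change more than 100 lines or touch `pyproject.toml`/`requirements.txt`. The function names, the tie-break order in `TYPE_PRIORITY`, and reading "primary type" as "most frequent type" are assumptions.

```python
import re
from collections import Counter
from typing import List, Optional

# Conventional-commit types from Step 3. The order is an assumed notion of
# "significance", used only to break ties between equally common types.
TYPE_PRIORITY = ["feat", "fix", "refactor", "docs", "test", "chore"]

# Matches "<sha> <type>[(<scope>)][!]: <subject>" as printed by `git log --oneline`.
ONELINE_RE = re.compile(r"^[0-9a-f]{7,40}\s+([a-z]+)(?:\([^)]*\))?!?:\s")

# File names Step 2 treats as signals of new dependencies.
DEPENDENCY_FILES = {"pyproject.toml", "requirements.txt"}


def pr_title_type(oneline_log: str) -> Optional[str]:
    """Return the most common known commit type, or None if no subject is typed."""
    counts: Counter = Counter()
    for line in oneline_log.splitlines():
        match = ONELINE_RE.match(line.strip())
        if match and match.group(1) in TYPE_PRIORITY:
            counts[match.group(1)] += 1
    if not counts:
        return None  # nothing typed; leave the choice to the author
    # Highest count wins; ties go to the type listed earlier in TYPE_PRIORITY.
    return max(counts, key=lambda t: (counts[t], -TYPE_PRIORITY.index(t)))


def attention_areas(numstat: str, threshold: int = 100) -> List[str]:
    """Return paths from `git diff --numstat` output that merit reviewer attention."""
    flagged = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        added, deleted, path = parts
        # Binary files report "-" instead of counts; treat them as 0 changed lines.
        changed = sum(int(n) for n in (added, deleted) if n.isdigit())
        if changed > threshold or path.rsplit("/", 1)[-1] in DEPENDENCY_FILES:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    log = (
        "f665ed8 feat: wire up pt_GB and en_SG personas (#245)\n"
        "e92ef72 feat: Add /create-pr skill for well-formatted GitHub PRs (#247)\n"
        "74791d0 docs: Fix mkdocs syntax and update person sampling documentation (#249)\n"
        "3a04acb refactor: slim package refactor into three subpackages (#240)\n"
        "78bf851 remove readme\n"
    )
    numstat = (
        "132\t0\t.claude/skills/create-pr/SKILL.md\n"
        "1\t1\t.github/workflows/build-docs.yml\n"
        "156\t16\t.github/workflows/ci.yml\n"
        "8\t2\t.gitignore\n"
    )
    print(pr_title_type(log))  # feat
    print(attention_areas(numstat))  # ['.claude/skills/create-pr/SKILL.md', '.github/workflows/ci.yml']
```

Fed the last five commits in the list above (two `feat:`, one `docs:`, one `refactor:`, one untyped), `pr_title_type` returns `"feat"`. Fed this PR's per-file counts for the create-pr skill, `build-docs.yml`, `ci.yml`, and `.gitignore`, `attention_areas` flags the 132-line `SKILL.md` and the 172-line `ci.yml`.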
3 changes: 2 additions & 1 deletion .claude/skills/new-sdg/SKILL.md
@@ -1,7 +1,8 @@
---
name: new-sdg
name: new-sdg
description: Implement a new synthetic data generator using NeMo Data Designer by defining its configuration and executing a preview job.
argument-hint: <dataset-description>
disable-model-invocation: true
---

# Your Goal
2 changes: 1 addition & 1 deletion .github/workflows/build-docs.yml
```diff
@@ -24,7 +24,7 @@ jobs:
       - name: Set up Python
         run: uv python install 3.11
       - name: Install dependencies for docs
-        run: uv sync --group docs
+        run: uv sync --all-packages --group docs
       - name: Download artifact from previous step
         uses: actions/download-artifact@v5
         with:
```
3 changes: 1 addition & 2 deletions .github/workflows/check-colab-notebooks.yml
```diff
@@ -28,8 +28,7 @@ jobs:
           enable-cache: true
 
       - name: Install dependencies
-        run: |
-          uv sync --group notebooks --group docs
+        run: uv sync --all-packages --group notebooks --group docs
 
       - name: Generate Colab notebooks
         run: |
```
172 changes: 156 additions & 16 deletions .github/workflows/ci.yml
```diff
@@ -8,8 +8,13 @@ on:
   workflow_dispatch:
 
 jobs:
-  test:
-    name: Test (Python ${{ matrix.python-version }} on ${{ matrix.os }})
+  # ===========================================================================
+  # Independent Package Tests
+  # Each package is tested in isolation to ensure proper dependency boundaries
+  # ===========================================================================
+
+  test-config:
+    name: Test Config (Python ${{ matrix.python-version }} on ${{ matrix.os }})
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
@@ -28,13 +33,120 @@ jobs:
           python-version: ${{ matrix.python-version }}
           enable-cache: true
 
-      - name: Install dependencies
+      - name: Install data-designer-config only
+        run: uv sync --package data-designer-config
+
+      - name: Run config tests
+        run: |
+          uv run --with pytest --with pytest-cov --with pytest-asyncio --with pytest-env \
+            pytest -v \
+            packages/data-designer-config/tests \
+            --cov=data_designer \
+            --cov-report=term-missing
+
+  test-engine:
+    name: Test Engine (Python ${{ matrix.python-version }} on ${{ matrix.os }})
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "latest"
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+
+      - name: Install data-designer-engine only
+        run: uv sync --package data-designer-engine
+
+      - name: Run engine tests
+        run: |
+          uv run --with pytest --with pytest-cov --with pytest-asyncio --with pytest-httpx --with pytest-env \
+            pytest -v \
+            packages/data-designer-engine/tests \
+            --cov=data_designer \
+            --cov-report=term-missing
+
+  test-interface:
+    name: Test Interface (Python ${{ matrix.python-version }} on ${{ matrix.os }})
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "latest"
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+
+      - name: Install data-designer (full package)
+        run: uv sync --package data-designer
+
+      - name: Run interface tests
         run: |
-          uv sync --group dev
+          uv run --with pytest --with pytest-cov --with pytest-asyncio --with pytest-httpx --with pytest-env \
+            pytest -v \
+            packages/data-designer/tests \
+            --cov=data_designer \
+            --cov-report=term-missing
+
+  # ===========================================================================
+  # Combined Coverage Check
+  # Runs all tests together to verify overall coverage threshold
+  # ===========================================================================
+
+  coverage:
+    name: Coverage Check (Python ${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.11"]
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "latest"
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+
+      - name: Install all packages
+        run: make install-dev
 
       - name: Run tests with coverage
         run: |
-          uv run pytest -v --cov=data_designer --cov-report=term-missing --cov-report=xml --cov-fail-under=90
+          uv run --group dev pytest -v \
+            packages/data-designer-config/tests \
+            packages/data-designer-engine/tests \
+            packages/data-designer/tests \
+            --cov=data_designer \
+            --cov-report=term-missing \
+            --cov-report=xml \
+            --cov-fail-under=90
 
+  # ===========================================================================
+  # End-to-End Tests
+  # ===========================================================================
+
   test-e2e:
     name: End to end test (Python ${{ matrix.python-version }} on ${{ matrix.os }})
@@ -57,8 +169,11 @@ jobs:
           enable-cache: false
 
       - name: Run e2e tests
-        run: |
-          make test-e2e
+        run: make test-e2e
 
+  # ===========================================================================
+  # Code Quality
+  # ===========================================================================
+
   lint:
     name: Lint and Format Check
@@ -76,16 +191,13 @@ jobs:
           enable-cache: true
 
       - name: Install dependencies
-        run: |
-          uv sync --group dev
+        run: make install-dev
 
       - name: Check formatting
-        run: |
-          uv run ruff format --check
+        run: make format-check
 
       - name: Run linter
-        run: |
-          uv run ruff check
+        run: make lint
 
   license-headers:
     name: Check License Headers
@@ -105,9 +217,37 @@ jobs:
           enable-cache: true
 
       - name: Install dependencies
-        run: |
-          uv sync --group dev
+        run: make install-dev
 
       - name: Check license headers
+        run: make check-license-headers
+
+  # ===========================================================================
+  # Summary Job for Branch Protection
+  # This job creates status checks matching the old job naming convention
+  # so that branch protection rules continue to work.
+  # ===========================================================================
+
+  test-summary:
+    name: Test (Python ${{ matrix.python-version }} on ${{ matrix.os }})
+    runs-on: ubuntu-latest
+    needs: [test-config, test-engine, test-interface]
+    if: always()
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
+
+    steps:
+      - name: Check all tests passed
         run: |
-          uv run python scripts/update_license_headers.py --check
+          if [[ "${{ needs.test-config.result }}" != "success" ]] || \
+             [[ "${{ needs.test-engine.result }}" != "success" ]] || \
+             [[ "${{ needs.test-interface.result }}" != "success" ]]; then
+            echo "One or more test jobs failed"
+            echo "test-config: ${{ needs.test-config.result }}"
+            echo "test-engine: ${{ needs.test-engine.result }}"
+            echo "test-interface: ${{ needs.test-interface.result }}"
+            exit 1
+          fi
+          echo "All test jobs passed successfully"
```
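The `test-summary` job keeps the old `Test (Python ${{ matrix.python-version }} on ${{ matrix.os }})` check names alive so existing branch protection rules still find them. A small Python sketch of its two moving parts, assuming GitHub reports each matrix leg under its evaluated `name:` (the helper names are illustrative, not part of the workflow): expanding the matrix yields the eight check names the rules expect, and the gate passes only when every job in `needs` reports `success`. The `if: always()` line matters because GitHub's documentation notes that a skipped job reports its status as "Success"; without it, a failed package job would skip the summary and could leave the required checks green.

```python
from itertools import product
from typing import Dict, List

# Matrix copied from the test-summary job in the diff above.
MATRIX = {
    "os": ["ubuntu-latest", "macos-latest"],
    "python-version": ["3.10", "3.11", "3.12", "3.13"],
}

# Jobs listed under `needs:` in test-summary.
NEEDED_JOBS = ["test-config", "test-engine", "test-interface"]


def summary_check_names(matrix: Dict[str, List[str]]) -> List[str]:
    """Expand the matrix into evaluated `name:` values, one per matrix leg."""
    return [
        f"Test (Python {py} on {os_name})"
        for os_name, py in product(matrix["os"], matrix["python-version"])
    ]


def all_tests_passed(results: Dict[str, str]) -> bool:
    """Mirror the shell check: every needed job must report exactly "success".

    `needs.<job>.result` can also be "failure", "cancelled", or "skipped";
    any of those, or a missing job, fails the gate.
    """
    return all(results.get(job) == "success" for job in NEEDED_JOBS)


if __name__ == "__main__":
    for name in summary_check_names(MATRIX):
        print(name)
    print(all_tests_passed({
        "test-config": "success",
        "test-engine": "skipped",
        "test-interface": "success",
    }))  # False
```

Note that every summary leg actually runs on `ubuntu-latest`; `matrix.os` only feeds the check name, which is all branch protection looks at.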
10 changes: 8 additions & 2 deletions .gitignore
```diff
@@ -79,8 +79,13 @@ venv.bak/
 *.tar.gz
 *.zip
 
-# Auto-generated version file
-src/data_designer/_version.py
+# Auto-generated version files
+**/_version.py
+
+# Auto-generated README for data-designer package
+# The top-level README is copied here during build so that PyPI displays the full
+# project documentation when users install the main "data-designer" package.
+packages/data-designer/README.md
 
 # Local scratch space
 .scratch/
@@ -94,3 +99,4 @@ tests_e2e/uv.lock
 
 # Performance profiling
 perf_*.txt
+NOTEPAD.md
```
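The `.gitignore` change swaps a single root-anchored path for `**/_version.py`, presumably because each package under `packages/` now generates its own version file. A minimal Python sketch of the gitignore rules involved (a leading `**/` matches at any depth, a pattern with an inner `/` is anchored to the repository root, a slash-free pattern matches the basename) shows why the old pattern stops matching. It implements only that subset of gitignore syntax, and the package paths in the example are hypothetical; `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase


def gitignore_match(pattern: str, path: str) -> bool:
    """Match a repo-relative POSIX path against a *subset* of .gitignore syntax.

    Handled: a leading "**/" (match at any depth), patterns with a "/" elsewhere
    (anchored at the repo root), and slash-free patterns (match the basename).
    Not handled: negation, patterns ending in "/", "**" mid-pattern, and the
    fact that Git's "*" never crosses "/" (fnmatch's "*" does).
    """
    if pattern.startswith("**/"):
        rest = pattern[3:]
        parts = path.split("/")
        # "**/" may absorb zero or more leading directories, so try every suffix.
        return any(fnmatchcase("/".join(parts[i:]), rest) for i in range(len(parts)))
    if "/" in pattern:
        return fnmatchcase(path, pattern.lstrip("/"))
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


if __name__ == "__main__":
    # Hypothetical version-file locations, for illustration only.
    paths = [
        "src/data_designer/_version.py",
        "packages/data-designer-config/src/data_designer/config/_version.py",
    ]
    for p in paths:
        old = gitignore_match("src/data_designer/_version.py", p)
        new = gitignore_match("**/_version.py", p)
        print(f"{p}: old={old} new={new}")
```

Under Git's own rules, `**/_version.py` behaves the same as a bare `_version.py` pattern; the `**/` form simply makes the "any directory" intent explicit.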