feat: add show_sample_rows tag and extend PII protection to model, column, and test level#973
feat: add show_sample_rows tag and extend PII protection to model, column, and test level#973
Conversation
|
👋 @joostboon |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a feature-flagged "show sample rows" flow that preserves sample rows for tagged models and skips PII suppression for those columns, updates PII detection to respect show-sample tags, refactors anomaly test result descriptions to summarize anomalous dimension values, adds integration tests for anomaly descriptions, tightens Decimal serialization, and updates .gitignore. Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(200,200,255,0.5)
participant TestRunner as DbtProject.test
end
rect rgba(200,255,200,0.5)
participant Materialization as test.materialization
participant ShowMacro as is_show_sample_rows_table
participant PII as get_pii_columns_from_parent_model
end
rect rgba(255,200,200,0.5)
participant AnomalyMacro as store_anomaly_test_results
participant Results as TestResults
end
TestRunner->>Materialization: invoke test materialization (flattened_test)
Materialization->>ShowMacro: call is_show_sample_rows_table(flattened_test)
ShowMacro-->>Materialization: true/false
alt show_sample_rows true
Materialization->>PII: skip PII-based sample disabling (preserve sample_limit)
else show_sample_rows false
Materialization->>PII: evaluate PII tags -> may set sample_limit=0
end
TestRunner->>AnomalyMacro: persist anomaly results
AnomalyMacro->>Results: build test_results_description (anomalous_rows summary)
AnomalyMacro-->>TestRunner: emit results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql`:
- Around line 41-54: This file was auto-modified by the SQL formatter; run the
project's pre-commit hooks (or directly run sqlfmt) locally and commit the
formatted changes so CI passes. Reformat the macro that defines
enable_show_tags, raw_show_tags, show_tags and the loop over
column_node/all_column_tags_lower to match the repo's sqlfmt/pre-commit rules,
then stage and push the resulting changes.
- Around line 41-46: The current normalization of show_sample_rows_tags
mistakenly treats strings as iterable (so "show" becomes ["s","h","o","w"]);
update the block that sets enable_show_tags / raw_show_tags / show_tags to
follow the same pattern used for pii_tags and get_column_tags: first check if
raw_show_tags is string and wrap it into a single-item list, else if it's
iterable use it, then map to lower and cast to list; adjust the show_tags
assignment (the variable name show_tags and the source raw_show_tags)
accordingly so the downstream intersection check uses a proper list of tag
strings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 180b1499-7df2-47b3-b07a-5ef781014c1f
📒 Files selected for processing (8)
.gitignoreintegration_tests/tests/adapter_query_runner.pyintegration_tests/tests/test_dimension_anomalies.pymacros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sqlmacros/edr/materializations/test/test.sqlmacros/edr/system/system_utils/get_config_var.sqlmacros/edr/system/system_utils/get_pii_columns_from_parent_model.sqlmacros/edr/system/system_utils/is_show_sample_rows_table.sql
macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
Outdated
Show resolved
Hide resolved
macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
Outdated
Show resolved
Hide resolved
|
Thanks for the review! Both issues have been addressed:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/system/system_utils/is_show_sample_rows_table.sql`:
- Around line 14-18: The normalization of raw_model_tags incorrectly treats
strings as iterable (splitting them into characters); update the model_tags
assignment that currently checks "(raw_model_tags if raw_model_tags is iterable
else [raw_model_tags])" to the same pattern used for raw_show_tags/raw_pii_tags
by guarding with "is string" (i.e., treat strings as scalars and wrap them in a
list), so model_tags is created from raw_model_tags with the correct string
check before mapping to lowercase and listing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 168e5cae-4799-45e3-928a-35b1331c8449
📒 Files selected for processing (2)
macros/edr/materializations/test/test.sqlmacros/edr/system/system_utils/is_show_sample_rows_table.sql
Adds a new tag mechanism that is the inverse of the existing PII tag behavior. When enable_samples_on_show_sample_rows_tags is true, models or columns tagged with show_sample_rows will have their samples shown even when PII tags would otherwise suppress them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0673a0b to
bddae5d
Compare
- Restructure empty elif branch in test.sql with explicit comments - Column-level show_sample_rows now checks model-level PII tags too - Add explicit parens for Jinja operator precedence in get_pii_columns - Replace 'is iterable' with 'is string' guard in is_pii_table.sql (strings are iterable in Jinja, causing single tags to be split) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nner
Vertica's adapter returns Decimal special values where as_tuple().exponent
is a string ('F'/'n') instead of int, causing a TypeError on comparison.
Cherry-picked from worktree-fix-dimension-alerts (0eecf7d).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
show_sample_rowstag that is the inverse of the existing PII tag behaviorenable_samples_on_show_sample_rows_tags: trueis set (default:false), models or columns tagged withshow_sample_rowswill have their samples shown even when PII tags would otherwise suppress themdisable_test_samplesat the test level still takes full precedence over this overrideChanges
get_config_var.sql: Addedenable_samples_on_show_sample_rows_tags(default:false) andshow_sample_rows_tags(default:["show_sample_rows"]) config varsis_show_sample_rows_table.sql(new): Macro that checks if a model's tags match anyshow_sample_rows_tags; mirrorsis_pii_table.sqlget_pii_columns_from_parent_model.sql: Skips columns tagged withshow_sample_rowswhen building the PII columns listtest.sql: Wraps theis_pii_table/should_disable_sampling_for_piichecks so they are bypassed when the model has ashow_sample_rowstagTest plan
piiandshow_sample_rows, set both feature flags totrue— confirm samples are returnedpiiandshow_sample_rows, both flags enabled — confirm column appears in samplesenable_samples_on_show_sample_rows_tags: falsewith ashow_sample_rows-tagged model — confirm PII hiding still appliesdisable_samples_on_pii_tags: true, noshow_sample_rowstag — confirm existing behavior unchangedshow_sample_rows, test hasmeta.disable_test_samples: true— confirm samples still hiddentest_sampling_pii.py,test_column_pii_sampling.py,test_disable_samples_config.py🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores