Skip to content

feat: add show_sample_rows tag and extend PII protection to model, column, and test level#973

Open
joostboon wants to merge 12 commits intomasterfrom
feat/add-show-sample-rows-tag
Open

feat: add show_sample_rows tag and extend PII protection to model, column, and test level#973
joostboon wants to merge 12 commits intomasterfrom
feat/add-show-sample-rows-tag

Conversation

@joostboon
Copy link
Contributor

@joostboon joostboon commented Mar 22, 2026

Summary

  • Adds a new show_sample_rows tag that is the inverse of the existing PII tag behavior
  • When enable_samples_on_show_sample_rows_tags: true is set (default: false), models or columns tagged with show_sample_rows will have their samples shown even when PII tags would otherwise suppress them
  • disable_test_samples at the test level still takes full precedence over this override

Changes

  • get_config_var.sql: Added enable_samples_on_show_sample_rows_tags (default: false) and show_sample_rows_tags (default: ["show_sample_rows"]) config vars
  • is_show_sample_rows_table.sql (new): Macro that checks if a model's tags match any show_sample_rows_tags; mirrors is_pii_table.sql
  • get_pii_columns_from_parent_model.sql: Skips columns tagged with show_sample_rows when building the PII columns list
  • test.sql: Wraps the is_pii_table / should_disable_sampling_for_pii checks so they are bypassed when the model has a show_sample_rows tag

Test plan

  • Tag a model with both pii and show_sample_rows, set both feature flags to true — confirm samples are returned
  • Tag a column with both pii and show_sample_rows, both flags enabled — confirm column appears in samples
  • Set enable_samples_on_show_sample_rows_tags: false with a show_sample_rows-tagged model — confirm PII hiding still applies
  • Only disable_samples_on_pii_tags: true, no show_sample_rows tag — confirm existing behavior unchanged
  • Both flags enabled, model tagged show_sample_rows, test has meta.disable_test_samples: true — confirm samples still hidden
  • Run existing PII sampling tests: test_sampling_pii.py, test_column_pii_sampling.py, test_disable_samples_config.py

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Configurable "show sample rows" tags and a visibility check to allow showing sample rows even when PII sampling would otherwise be suppressed.
  • Bug Fixes

    • Anomaly alerts now list anomalous dimension values and summarize large groups appropriately.
    • Decimal serialization correctly handles special numeric values (Infinity/NaN).
    • Sampling behavior preserves sample rows when show-sample-tags apply.
  • Tests

    • Added integration tests validating anomaly description behavior for few vs many failures.
  • Chores

    • Added .claude/ to ignore patterns.

@github-actions
Copy link
Contributor

👋 @joostboon
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a feature-flagged "show sample rows" flow that preserves sample rows for tagged models and skips PII suppression for those columns, updates PII detection to respect show-sample tags, refactors anomaly test result descriptions to summarize anomalous dimension values, adds integration tests for anomaly descriptions, tightens Decimal serialization, and updates .gitignore.

Changes

Cohort / File(s) Summary
Config
macros/edr/system/system_utils/get_config_var.sql
Adds enable_samples_on_show_sample_rows_tags (default false) and show_sample_rows_tags (default ["show_sample_rows"]).
Show-sample-rows feature
macros/edr/system/system_utils/is_show_sample_rows_table.sql, macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql, macros/edr/materializations/test/test.sql
Adds is_show_sample_rows_table(flattened_test) macro and config-driven show_sample_rows_tags; PII detection now can skip columns when show-sample tag matched and column lacks PII tags; materialization logic no longer forcibly zeroes sample_limit when is_show_sample_rows_table is true (gated by new config flag). Review: conditional interactions between macros and materialization.
Anomaly description logic
macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
Rebuilds test_results_description from collected anomalous_rows: for >5 anomalous dimension values emits summarized message (first 5 values + remaining count); for ≤5 concatenates per-row descriptions; falls back to last scored row or default message if needed.
Tests & Decimal serialization
integration_tests/tests/test_dimension_anomalies.py, integration_tests/tests/adapter_query_runner.py
Adds two integration tests validating anomaly description for ≤5 vs >5 anomalous values; adjusts _serialize_value() to handle Decimal normalized exponents that are non-numeric (Infinity/NaN) by converting those to float, otherwise preserves int/float mapping.
Misc
.gitignore
Adds .claude/ to ignore patterns.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(200,200,255,0.5)
    participant TestRunner as DbtProject.test
  end
  rect rgba(200,255,200,0.5)
    participant Materialization as test.materialization
    participant ShowMacro as is_show_sample_rows_table
    participant PII as get_pii_columns_from_parent_model
  end
  rect rgba(255,200,200,0.5)
    participant AnomalyMacro as store_anomaly_test_results
    participant Results as TestResults
  end

  TestRunner->>Materialization: invoke test materialization (flattened_test)
  Materialization->>ShowMacro: call is_show_sample_rows_table(flattened_test)
  ShowMacro-->>Materialization: true/false
  alt show_sample_rows true
    Materialization->>PII: skip PII-based sample disabling (preserve sample_limit)
  else show_sample_rows false
    Materialization->>PII: evaluate PII tags -> may set sample_limit=0
  end
  TestRunner->>AnomalyMacro: persist anomaly results
  AnomalyMacro->>Results: build test_results_description (anomalous_rows summary)
  AnomalyMacro-->>TestRunner: emit results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through tags both bright and small,
I kept sample rows for those who call,
I counted odd dimensions, five or more,
Fixed decimals that puzzled before,
A tiny rabbit celebrates this change — hoppity hooray!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title accurately describes the main change: adding a show_sample_rows tag to allow overriding PII-based sample hiding at model, column, and test levels.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/add-show-sample-rows-tag

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql`:
- Around line 41-54: This file was auto-modified by the SQL formatter; run the
project's pre-commit hooks (or directly run sqlfmt) locally and commit the
formatted changes so CI passes. Reformat the macro that defines
enable_show_tags, raw_show_tags, show_tags and the loop over
column_node/all_column_tags_lower to match the repo's sqlfmt/pre-commit rules,
then stage and push the resulting changes.
- Around line 41-46: The current normalization of show_sample_rows_tags
mistakenly treats strings as iterable (so "show" becomes ["s","h","o","w"]);
update the block that sets enable_show_tags / raw_show_tags / show_tags to
follow the same pattern used for pii_tags and get_column_tags: first check if
raw_show_tags is string and wrap it into a single-item list, else if it's
iterable use it, then map to lower and cast to list; adjust the show_tags
assignment (the variable name show_tags and the source raw_show_tags)
accordingly so the downstream intersection check uses a proper list of tag
strings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 180b1499-7df2-47b3-b07a-5ef781014c1f

📥 Commits

Reviewing files that changed from the base of the PR and between 7a2b542 and 5b68070.

📒 Files selected for processing (8)
  • .gitignore
  • integration_tests/tests/adapter_query_runner.py
  • integration_tests/tests/test_dimension_anomalies.py
  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/edr/system/system_utils/get_pii_columns_from_parent_model.sql
  • macros/edr/system/system_utils/is_show_sample_rows_table.sql

@joostboon
Copy link
Contributor Author

Thanks for the review! Both issues have been addressed:

  • sqlfmt formatting: applied and committed
  • String iterable bug: fixed in both is_show_sample_rows_table.sql and get_pii_columns_from_parent_model.sql — now using the same is string guard pattern as the existing pii_tags normalization to avoid strings being split into individual characters

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/system/system_utils/is_show_sample_rows_table.sql`:
- Around line 14-18: The normalization of raw_model_tags incorrectly treats
strings as iterable (splitting them into characters); update the model_tags
assignment that currently checks "(raw_model_tags if raw_model_tags is iterable
else [raw_model_tags])" to the same pattern used for raw_show_tags/raw_pii_tags
by guarding with "is string" (i.e., treat strings as scalars and wrap them in a
list), so model_tags is created from raw_model_tags with the correct string
check before mapping to lowercase and listing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 168e5cae-4799-45e3-928a-35b1331c8449

📥 Commits

Reviewing files that changed from the base of the PR and between d1dc20c and 0c7136a.

📒 Files selected for processing (2)
  • macros/edr/materializations/test/test.sql
  • macros/edr/system/system_utils/is_show_sample_rows_table.sql

joostboon and others added 8 commits March 24, 2026 11:08
Adds a new tag mechanism that is the inverse of the existing PII tag
behavior. When enable_samples_on_show_sample_rows_tags is true, models
or columns tagged with show_sample_rows will have their samples shown
even when PII tags would otherwise suppress them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joostboon joostboon force-pushed the feat/add-show-sample-rows-tag branch from 0673a0b to bddae5d Compare March 24, 2026 09:09
@joostboon joostboon changed the title feat: add show_sample_rows tag to override PII-based sample hiding feat: add show_sample_rows tag and extend PII protection to model, column, and test level Mar 24, 2026
joostboon and others added 4 commits March 24, 2026 11:43
- Restructure empty elif branch in test.sql with explicit comments
- Column-level show_sample_rows now checks model-level PII tags too
- Add explicit parens for Jinja operator precedence in get_pii_columns
- Replace 'is iterable' with 'is string' guard in is_pii_table.sql
  (strings are iterable in Jinja, causing single tags to be split)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nner

Vertica's adapter returns Decimal special values where as_tuple().exponent
is a string ('F'/'n') instead of int, causing a TypeError on comparison.
Cherry-picked from worktree-fix-dimension-alerts (0eecf7d).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant