APP-1123: add skip_test_result_rows to skip test_result_rows query in get_test_results macro#2230
APP-1123: add skip_test_result_rows to skip test_result_rows query in get_test_results macro#2230MikaKerman wants to merge 6 commits into
Conversation
Thread a new disable_samples parameter from ReportAPI.get_report_data through TestsAPI -> TestsFetcher -> the get_test_results dbt macro. When disable_samples=true, the macro skips the get_result_rows_agate query against test_result_rows entirely instead of relying on the caller to clear sample_data after the SQL has already run. This is consumed by elementary-internal's cloud report generation, where results_sample has zero consumers (522 GB / 249.6M rows scanned per cycle for nothing). The default remains false so OSS users and other callers are unaffected. Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @MikaKerman |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThreads a new skip_test_result_rows flag through DBT macros, TestsFetcher, TestsAPI, and ReportAPI to allow callers to skip retrieval or persistence of per-test result rows. Changesskip_test_result_rows parameter implementation
Sequence DiagramsequenceDiagram
participant ReportAPI
participant TestsAPI
participant TestsFetcher
participant DBT_macro as get_test_results
ReportAPI->>TestsAPI: initialize(skip_test_result_rows)
TestsAPI->>TestsFetcher: get_all_test_results_db_rows(skip_test_result_rows)
TestsFetcher->>DBT_macro: run macro with macro_args(skip_test_result_rows)
DBT_macro->>DBT_macro: conditional fetch/blank test_result_rows (skip when true)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
FYI: Evidence this is a known flake, not a real failure:
I don't have permission to rerun the failed job from this PR (fork). Happy to push an empty commit to retrigger if you'd like a clean run. |
|
One thing I noticed: when For a clean contract, might be worth passing {% if not disable_samples and test.invocations_rank_index == 1 %}
...
{% set test_rows_sample = elementary_cli.get_test_rows_sample(...) %}
{% endif %}This way |
…ults When disable_samples=true, ensure no sample data leaks via the legacy elementary_test_results.result_rows JSON column. Previously only the test_result_rows table query was skipped; get_test_rows_sample would still return data from the legacy column. Thread disable_samples into _process_raw_test_results and gate the entire get_test_rows_sample call so disable_samples=true means no sample data at all. Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
|
@MikaKerman good catch, confirmed. Fixed in 0e96bc3b: threaded |
|
Another thing: even with For a fully clean implementation, |
Thread disable_samples into current_tests_run_results_query and the clickhouse inline query so the legacy elementary_test_results.result_rows column is replaced with null/empty when samples are disabled. Avoids materializing the column into the temp table and serializing it through agate. Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
|
@MikaKerman good point — also gated in 3507c3af:
Now with |
|
CI on This fails at |
…ization Separates the macro-level DB optimization from the existing OSS Python-level disable_samples mask: - skip_test_result_rows (new, macro-level): pure DB optimization. Skips the test_result_rows table query, gates get_test_rows_sample, selects null/'' as result_rows. Affects ALL test types. Used by cloud only. - disable_samples (unchanged, Python-level): masks sample_data for dbt_test rows only. Used by OSS --disable-samples CLI flag for PII protection. Anomaly metrics are preserved. Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@elementary/monitor/dbt_project/macros/get_test_results.sql`:
- Line 185: The ClickHouse macro clickhouse__get_test_results calls
_process_raw_test_results with elementary_tests_allowlist_status but never
initializes it in the ClickHouse path; initialize
elementary_tests_allowlist_status (e.g., to an empty array/object consistent
with other DB paths) before the post-processing call so
_process_raw_test_results always receives a defined value and sample selection
for non-dbt_test rows is not skipped; update the clickhouse__get_test_results
macro to set elementary_tests_allowlist_status (same name) prior to invoking
_process_raw_test_results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b190d45-2615-4b71-a58f-f287aad64be7
📒 Files selected for processing (5)
elementary/monitor/api/report/report.pyelementary/monitor/api/tests/tests.pyelementary/monitor/dbt_project/macros/base_queries/current_tests_run_results_query.sqlelementary/monitor/dbt_project/macros/get_test_results.sqlelementary/monitor/fetchers/tests/tests.py
…results Pre-existing bug surfaced by CodeRabbit: the ClickHouse variant of get_test_results passes elementary_tests_allowlist_status into _process_raw_test_results without ever initializing it. The default__ and fabric__ variants both set it from disable_passed_test_metrics at the top of the macro. Without this, sample selection for non-dbt_test rows (anomaly/schema_change) would silently skip on ClickHouse because the 'status in elementary_tests_allowlist_status' check would fail on an undefined value. Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
Co-Authored-By: mika@elementary-data.com <mika.kerman@gmail.com>
Summary
Adds a new
skip_test_result_rowsparameter toelementary_cli.get_test_resultsand threads it fromReportAPI.get_report_datathroughTestsAPI→TestsFetcher→ the dbt macro. Whenskip_test_result_rows=true, the macro skips theget_result_rows_agatequery againsttest_result_rowsentirely, and theresult_rowscolumn onelementary_test_resultsis also omitted from the main query (no longer materialized into the intermediateordered_test_resultstable on ClickHouse either).Ticket: APP-1123
Why a new parameter instead of reusing
disable_samplesReportAPI.get_report_dataalready had adisable_samplesparameter, but it has a different concern than what we needed:disable_samples(existing, Python-level) — applied by_get_test_result_from_test_result_db_rowafter the SQL has already run, and only setssample_data = Nonefortest_type == "dbt_test"rows. This is the OSS--disable-samplesCLI behavior for PII protection: it masks dbt-test failed-row samples but preserves anomaly metrics and schema-change diffs.skip_test_result_rows(new, macro-level) — a pure DB optimization. Skips the expensivetest_result_rowstable query (~522 GB / 249.6M rows in cloud) and omits theresult_rowscolumn from the mainelementary_test_resultsselect. Affects all test types uniformly. The Python-level masking still applies on top of this for dbt-test rows whendisable_samplesis also set.These are two separate concerns and the cloud consumer wants the optimization unconditionally (regardless of the per-env PII flag), so a new parameter cleanly separates them.
Changes
elementary/monitor/dbt_project/macros/get_test_results.sql— addskip_test_result_rows = falseto the dispatch macro, thedefault__/fabric__/clickhouse__variants, and_process_raw_test_results. Wrap theget_result_rows_agatecall in{% if not skip_test_result_rows %}(empty dict otherwise —_process_raw_test_results.test_result_rows_agate.get(test.id)returnsNonefor missing keys). Also gate theget_test_rows_samplecall inside_process_raw_test_resultson the same flag so samples can't leak via the legacyelementary_test_results.result_rowsJSON column.elementary/monitor/dbt_project/macros/base_queries/current_tests_run_results_query.sql— acceptskip_test_result_rowsand conditionally selectnull as result_rowsinstead ofelementary_test_results.result_rows. This means theresult_rowscolumn is never materialized on Snowflake/BQ/Redshift/Postgres/Databricks and never inserted into ClickHouse's intermediateordered_test_resultstable either.elementary/monitor/fetchers/tests/tests.py— acceptskip_test_result_rowsand forward it inmacro_args.elementary/monitor/api/tests/tests.py— acceptskip_test_result_rowsinTestsAPI.__init__and_get_test_results_db_rows, forward to the fetcher. The existingdisable_samplesparameter onget_test_results/_get_test_result_from_test_result_db_rowis untouched.elementary/monitor/api/report/report.py— add newskip_test_result_rows: bool = Falseparameter, forward to theTestsAPIconstructor. The existingdisable_samplesparameter and its forwarding totests_api.get_test_results(disable_samples=...)is untouched.Drive-by fix
elementary_tests_allowlist_statusinclickhouse__get_test_results. Pre-existing bug surfaced by CodeRabbit: the ClickHouse variant called_process_raw_test_resultswith an uninitialized variable, which would silently skip sample selection for non-dbt_testrows on ClickHouse. Thedefault__andfabric__variants already set this fromdisable_passed_test_metrics.Backwards compatibility
false, so existing callers (OSS users,data_monitoring_report.send_test_results_summary) see no behavior change.disable_samplesparameter onReportAPI.get_report_dataandTestsAPI.get_test_resultsis preserved with its original Python-level mask semantics — OSSedr report --disable-samplescontinues to work exactly as before.Review & Testing Checklist for Human
default__,fabric__,clickhouse__) by inspectingget_test_results.sql.test_result_rows_agate = {}(empty dict) is safe input for_process_raw_test_results— it relies ontest_result_rows_agate.get(test.id)which returnsNonefor missing keys, and theget_test_rows_samplecall is now also gated onskip_test_result_rowsso it's never invoked when samples are skipped.skip_test_result_rows=true,current_tests_run_results_queryreturnsnullfor theresult_rowscolumn, so the inlineINSERT INTO ordered_test_resultsno longer materializes it.skip_test_result_rows=trueand confirm thetest_result_rowsSELECT no longer appears in query history.Notes
skip_test_result_rows=Trueinget_edr_reportwhile keepingdisable_samplesdriven by the per-envdisable_test_samplesflag.Link to Devin session: https://app.devin.ai/sessions/d3ae8f0bdf5b484b8ca0cde41ec58529
Requested by: @MikaKerman
Summary by CodeRabbit