[Enhancement](test) Add TPC-H SF10 MOR unique key regression tests #61762
Open
dataroaring wants to merge 2 commits into master
…ique tables

Add TPC-H SF10 regression tests for Merge-On-Read unique key tables. The existing tpch_sf100_unique_p2 only tests MOW (Merge-On-Write), skipping MOR coverage entirely. This suite creates MOR tables (enable_unique_key_merge_on_write=false), loads data twice to create overlapping rowsets that trigger merge-on-read semantics, and includes:
- test_read_mor_as_dup: validates read_mor_as_dup_tables by comparing MOR-as-DUP query results against actual DUPLICATE KEY tables
- test_mor_value_predicate_pushdown: validates enable_mor_value_predicate_pushdown_tables with all 22 TPC-H queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Adds a new TPC-H SF10 regression suite to cover Merge-On-Read (MOR) unique key behavior, including MOR-as-DUP equivalence and MOR value predicate pushdown validation at scale.
Changes:
- Add suite loader that creates MOR + DUP table variants and double-loads data to produce overlapping rowsets.
- Add `test_read_mor_as_dup` to compare MOR-as-DUP query results vs actual DUP tables across all 22 TPC-H queries.
- Add `test_mor_value_predicate_pushdown` to run all 22 queries with MOR value predicate pushdown enabled.
Reviewed changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy | Compares MOR-as-DUP results vs DUP tables for 22 queries |
| regression-test/suites/tpch_sf10_unique_mor_p2/test_mor_value_predicate_pushdown.groovy | Runs 22 queries with MOR predicate pushdown enabled |
| regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy | Creates tables, double-loads data, validates row counts, analyzes tables |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q01.sql | TPC-H query 01 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q02.sql | TPC-H query 02 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q03.sql | TPC-H query 03 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q04.sql | TPC-H query 04 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q05.sql | TPC-H query 05 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q06.sql | TPC-H query 06 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q07.sql | TPC-H query 07 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q08.sql | TPC-H query 08 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q09.sql | TPC-H query 09 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q10.sql | TPC-H query 10 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q11.sql | TPC-H query 11 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q12.sql | TPC-H query 12 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q13.sql | TPC-H query 13 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q14.sql | TPC-H query 14 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q15.sql | TPC-H query 15 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q16.sql | TPC-H query 16 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q17.sql | TPC-H query 17 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q18.sql | TPC-H query 18 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q19.sql | TPC-H query 19 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q20.sql | TPC-H query 20 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q21.sql | TPC-H query 21 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q22.sql | TPC-H query 22 |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer.sql | Create MOR customer table (unique key, MOW disabled) |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_dup.sql | Create DUP customer table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_load.sql | Load customer into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_dup_load.sql | Load customer into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem.sql | Create MOR lineitem table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_dup.sql | Create DUP lineitem table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_load.sql | Load lineitem into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_dup_load.sql | Load lineitem into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation.sql | Create MOR nation table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_dup.sql | Create DUP nation table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_load.sql | Load nation into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_dup_load.sql | Load nation into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders.sql | Create MOR orders table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_dup.sql | Create DUP orders table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_load.sql | Load orders into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_dup_load.sql | Load orders into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part.sql | Create MOR part table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_dup.sql | Create DUP part table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_load.sql | Load part into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_dup_load.sql | Load part into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp.sql | Create MOR partsupp table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_dup.sql | Create DUP partsupp table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_load.sql | Load partsupp into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_dup_load.sql | Load partsupp into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region.sql | Create MOR region table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_dup.sql | Create DUP region table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_load.sql | Load region into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_dup_load.sql | Load region into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier.sql | Create MOR supplier table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_dup.sql | Create DUP supplier table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_load.sql | Load supplier into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_dup_load.sql | Load supplier into DUP table |
Comments suppressed due to low confidence (7)
regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy:1
- This compares result lists for exact equality, which will fail for queries without a deterministic ORDER BY (row order can legitimately differ between MOR and DUP plans). To make this reliable, enforce deterministic ordering (e.g., wrap each query in an outer SELECT with an ORDER BY on all output columns / a stable key), or normalize before comparison (e.g., sort both result sets consistently) so the assertion checks content rather than row ordering.
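
The "normalize before comparison" option in the comment above can be sketched as follows. This is a minimal illustration in Python rather than the suite's Groovy, and `same_rows_ignoring_order` is a hypothetical helper name, not part of the regression framework. It treats each result set as a multiset of rows, so duplicates still count but row order does not.

```python
from collections import Counter


def same_rows_ignoring_order(left, right):
    """Return True when both result sets contain the same rows
    with the same multiplicities, regardless of row order."""
    # Rows are converted to tuples so they are hashable.
    return Counter(map(tuple, left)) == Counter(map(tuple, right))


# Same content, different order: considered equal.
mor_as_dup = [[1, "a"], [2, "b"], [2, "b"]]
dup_table = [[2, "b"], [1, "a"], [2, "b"]]
print(same_rows_ignoring_order(mor_as_dup, dup_table))
```

A multiset comparison deliberately ignores ordering, so it would not catch a query whose `ORDER BY` is itself wrong; for those queries an exact comparison with a fully deterministic `ORDER BY` remains the stricter check.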
regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy:1
- `listFiles()` can return null (e.g., if the directory does not exist or is not readable), which will throw a NullPointerException on `findAll`. Add an explicit existence/readability check for `sqlDir` and a clearer error (or fallback) when no SQL files are found.
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1
- `show load where Label = '${loadLabel}'` may temporarily return an empty result right after issuing the load (or due to FE visibility delays). In that case, `stateResult[stateResult.size() - 1]` will throw. Handle the empty result by sleeping/retrying until at least one row is returned, and also add a max-wait timeout to avoid an infinite loop if the load gets stuck in an intermediate state.
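
A bounded polling loop of the kind this comment asks for might look like the sketch below. It is written in Python for illustration only; `fetch_states` is a hypothetical callable standing in for the `show load` query and is assumed to return a list of state strings, and the terminal state names `FINISHED` and `CANCELLED` are an assumption about what the load reports.

```python
import time

# Assumed terminal load states; adjust to what the cluster actually reports.
TERMINAL_STATES = ("FINISHED", "CANCELLED")


def wait_for_load(fetch_states, timeout_s=600.0, interval_s=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the newest load state is terminal or the timeout expires.

    An empty result is treated as "not visible yet" and retried
    instead of being indexed.
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        if states and states[-1] in TERMINAL_STATES:
            return states[-1]
        if clock() >= deadline:
            raise TimeoutError(
                f"load not finished after {timeout_s}s; last states: {states}")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; the same shape also replaces a fixed sleep wherever the suite waits on a condition it can query.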
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1
- This changes a global FE config for the cluster and is not reverted at the end of the suite, which can leak into other tests and affect stability. Prefer scoping this change (restore the previous value in a `finally` block), or avoid global mutation if there is a session-level alternative in this framework.
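
The save-and-restore pattern suggested here can be sketched as a small context manager. This is a Python illustration under stated assumptions: `get_value` and `set_value` are hypothetical accessors for whatever config mechanism the framework exposes, and `example_fe_config` is a placeholder name, since the comment does not say which config is changed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(get_value, set_value, name, new_value):
    """Set a config value for the duration of a block, then restore it."""
    previous = get_value(name)
    set_value(name, new_value)
    try:
        yield previous
    finally:
        # Runs on normal exit and on exceptions alike.
        set_value(name, previous)


# Usage with a plain dict standing in for the cluster config.
config = {"example_fe_config": "false"}
with temporary_setting(config.get, config.__setitem__,
                       "example_fe_config", "true"):
    pass  # suite body would run here with the override active
print(config["example_fe_config"])
```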
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1
- A fixed sleep is prone to flakiness (too short on slow environments, unnecessarily long on fast ones). Replace this with a polling wait that checks the condition you actually need (e.g., row count visibility / stats readiness), with an overall timeout to keep the suite bounded.
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q20.sql:1
- This query uses `date('1994-01-01')`, while the rest of the suite consistently uses the standard SQL literal form `DATE 'YYYY-MM-DD'`. If `date(<string>)` is not supported (or behaves differently) in the target engine, q20 will fail or produce inconsistent results. Align this to `DATE '1994-01-01'` (and keep interval syntax consistent with other queries in the suite).
regression-test/suites/tpch_sf10_unique_mor_p2/test_mor_value_predicate_pushdown.groovy:1
- The suite enables a session setting but never resets it. If the test runner reuses the same session across suites, this can leak into subsequent tests. Consider resetting `enable_mor_value_predicate_pushdown_tables` back to `''` (ideally in a `finally`/cleanup block) once the loop completes.
… results

Update test_mor_value_predicate_pushdown to run each TPC-H query twice: once without pushdown (standard MOR merge = MOW equivalent) and once with pushdown enabled, then assertEquals to verify identical results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
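
The run-twice-and-compare flow described in this commit could be sketched roughly as below. It is a Python illustration, not the Groovy test itself: `run_query` and `set_session_var` are hypothetical callables standing in for the framework's SQL helpers, and the values `''` (off) and `'*'` (all tables) for `enable_mor_value_predicate_pushdown_tables` are taken from elsewhere in this PR.

```python
PUSHDOWN_VAR = "enable_mor_value_predicate_pushdown_tables"


def compare_with_and_without_pushdown(run_query, set_session_var, queries):
    """Run each query with pushdown off, then on, and collect the
    names of queries whose results differ."""
    mismatches = []
    try:
        for name, sql in queries.items():
            set_session_var(PUSHDOWN_VAR, "")    # baseline: standard MOR merge
            baseline = run_query(sql)
            set_session_var(PUSHDOWN_VAR, "*")   # pushdown on for all tables
            pushed = run_query(sql)
            if baseline != pushed:
                mismatches.append(name)
    finally:
        # Leave the session as it was found, even if a query throws.
        set_session_var(PUSHDOWN_VAR, "")
    return mismatches
```

Collecting mismatches instead of failing on the first one is a design choice of this sketch; it reports every affected query in a single run, whereas a plain `assertEquals` stops at the first difference.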
dataroaring (Contributor, Author) commented:
run buildall
Summary
- Add `tpch_sf10_unique_mor_p2` test suite for Merge-On-Read (MOR) unique key tables, filling the MOR coverage gap in the existing `tpch_sf100_unique_p2`, which only tests MOW
- Creates MOR tables (`enable_unique_key_merge_on_write=false`) and loads data twice to create overlapping rowsets that trigger merge-on-read dedup semantics
- `test_read_mor_as_dup`: runs all 22 TPC-H queries with `read_mor_as_dup_tables='*'` and compares results against actual DUPLICATE KEY tables to verify behavioral equivalence
- `test_mor_value_predicate_pushdown`: runs all 22 TPC-H queries with `enable_mor_value_predicate_pushdown_tables='*'` to validate value predicate pushdown on MOR tables at scale

Test plan
- Run `tpch_sf10_unique_mor_p2/load` to verify double-load creates correct row counts (MOR dedup = 1x, DUP = 2x)
- Run `test_read_mor_as_dup` to verify MOR-as-DUP results match actual DUP table results for all 22 TPC-H queries
- Run `test_mor_value_predicate_pushdown` to verify predicate pushdown produces correct results on all 22 TPC-H queries

🤖 Generated with Claude Code
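
The row-count expectation in the test plan (MOR dedup = 1x, DUP = 2x) can be illustrated with a toy model. The Python sketch below does not reflect how the storage engine works internally; it only models the visible result, with a dict standing in for a unique key table and a list for a duplicate key table. The 25-row batch mirrors the size of the TPC-H `nation` table, and the helper names are hypothetical.

```python
def load_unique_key(table, rows, key):
    """Unique key model: a later row with the same key replaces the earlier one."""
    for row in rows:
        table[row[key]] = row


def load_duplicate(table, rows):
    """Duplicate key model: every loaded row is kept."""
    table.extend(rows)


batch = [{"n_nationkey": k, "n_name": f"NATION_{k}"} for k in range(25)]
mor_table = {}
dup_table = []

# Load the same batch twice, as the suite's loader does.
for _ in range(2):
    load_unique_key(mor_table, batch, "n_nationkey")
    load_duplicate(dup_table, batch)

print(len(mor_table), len(dup_table))
```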