[Enhancement](test) Add TPC-H SF10 MOR unique key regression tests by dataroaring · Pull Request #61762 · apache/doris

dataroaring · 2026-03-26T07:56:20Z

Summary

Add tpch_sf10_unique_mor_p2 test suite for Merge-On-Read (MOR) unique key tables, filling the MOR coverage gap left by the existing tpch_sf100_unique_p2 suite, which only tests Merge-On-Write (MOW)
Creates MOR tables (enable_unique_key_merge_on_write=false) and loads data twice to create overlapping rowsets that trigger merge-on-read dedup semantics
test_read_mor_as_dup: runs all 22 TPC-H queries with read_mor_as_dup_tables='*' and compares results against actual DUPLICATE KEY tables to verify behavioral equivalence
test_mor_value_predicate_pushdown: runs all 22 TPC-H queries with enable_mor_value_predicate_pushdown_tables='*' to validate value predicate pushdown on MOR tables at scale
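To make the intended semantics concrete, here is a toy Python model (not Doris code) of what the double load is expected to produce. It assumes a simplified MOR read in which the newest rowset wins for each key, and it ignores sequence columns and delete markers. `load_twice`, `read_mor` and `read_as_dup` are hypothetical names. Reading MOR as DUP skips the key merge, so it should return the same rows as a real DUPLICATE KEY table loaded the same way.

```python
"""Toy model of MOR unique-key reads vs DUPLICATE KEY reads (illustrative only)."""
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (key, value)


def load_twice(rows: List[Row]) -> List[List[Row]]:
    """Simulate loading the same data twice: two rowsets with overlapping keys."""
    return [list(rows), list(rows)]


def read_mor(rowsets: List[List[Row]]) -> List[Row]:
    """Simplified MOR read: merge at query time, keeping the newest version per key."""
    latest: Dict[int, Row] = {}
    for rowset in rowsets:  # rowsets are ordered oldest -> newest
        for key, value in rowset:
            latest[key] = (key, value)  # a later rowset overwrites an earlier one
    return sorted(latest.values())


def read_as_dup(rowsets: List[List[Row]]) -> List[Row]:
    """DUP-style read (what read_mor_as_dup_tables emulates): every row, no merge."""
    return sorted(row for rowset in rowsets for row in rowset)
```

With three source rows, `read_mor` returns 3 rows (1x) and `read_as_dup` returns 6 (2x), which mirrors the row-count expectations in the test plan.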

Test plan

Run tpch_sf10_unique_mor_p2/load to verify that the double load produces the expected row counts (MOR tables deduplicate to 1x the source rows; DUP tables hold 2x)
Run test_read_mor_as_dup to verify MOR-as-DUP results match actual DUP table results for all 22 TPC-H queries
Run test_mor_value_predicate_pushdown to verify predicate pushdown produces correct results on all 22 TPC-H queries
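The row-count check in the first step can be written as a small assertion helper. This is a sketch under stated assumptions: `check_row_counts` is a hypothetical name, and the base counts would come from a single load of the source data. The test uses only nation and region, whose TPC-H sizes (25 and 5 rows) do not depend on the scale factor.

```python
from typing import Dict, List


def check_row_counts(base: Dict[str, int],
                     mor: Dict[str, int],
                     dup: Dict[str, int]) -> List[str]:
    """Compare observed counts after the double load; empty list means all passed.

    base: rows per table from a single load of the source data
    mor:  observed rows in each MOR table (expected 1x, keys deduplicated)
    dup:  observed rows in each DUP table (expected 2x, nothing deduplicated)
    """
    errors: List[str] = []
    for table, n in base.items():
        if mor.get(table) != n:
            errors.append(f"{table}: MOR has {mor.get(table)} rows, expected {n} (1x)")
        if dup.get(table) != 2 * n:
            errors.append(f"{table}: DUP has {dup.get(table)} rows, expected {2 * n} (2x)")
    return errors
```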

🤖 Generated with Claude Code

…ique tables

Add TPC-H SF10 regression tests for Merge-On-Read unique key tables. The existing tpch_sf100_unique_p2 only tests MOW (Merge-On-Write), skipping MOR coverage entirely. This suite creates MOR tables (enable_unique_key_merge_on_write=false), loads data twice to create overlapping rowsets that trigger merge-on-read semantics, and includes:

- test_read_mor_as_dup: validates read_mor_as_dup_tables by comparing MOR-as-DUP query results against actual DUPLICATE KEY tables
- test_mor_value_predicate_pushdown: validates enable_mor_value_predicate_pushdown_tables with all 22 TPC-H queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

hello-stephen · 2026-03-26T07:56:26Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Copilot

Pull request overview

Adds a new TPC-H SF10 regression suite to cover Merge-On-Read (MOR) unique key behavior, including MOR-as-DUP equivalence and MOR value predicate pushdown validation at scale.

Changes:

Add suite loader that creates MOR + DUP table variants and double-loads data to produce overlapping rowsets.
Add test_read_mor_as_dup to compare MOR-as-DUP query results vs actual DUP tables across all 22 TPC-H queries.
Add test_mor_value_predicate_pushdown to run all 22 queries with MOR value predicate pushdown enabled.

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated no comments.


File	Description
regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy	Compares MOR-as-DUP results vs DUP tables for 22 queries
regression-test/suites/tpch_sf10_unique_mor_p2/test_mor_value_predicate_pushdown.groovy	Runs 22 queries with MOR predicate pushdown enabled
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy	Creates tables, double-loads data, validates row counts, analyzes tables
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q01.sql	TPC-H query 01
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q02.sql	TPC-H query 02
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q03.sql	TPC-H query 03
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q04.sql	TPC-H query 04
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q05.sql	TPC-H query 05
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q06.sql	TPC-H query 06
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q07.sql	TPC-H query 07
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q08.sql	TPC-H query 08
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q09.sql	TPC-H query 09
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q10.sql	TPC-H query 10
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q11.sql	TPC-H query 11
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q12.sql	TPC-H query 12
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q13.sql	TPC-H query 13
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q14.sql	TPC-H query 14
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q15.sql	TPC-H query 15
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q16.sql	TPC-H query 16
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q17.sql	TPC-H query 17
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q18.sql	TPC-H query 18
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q19.sql	TPC-H query 19
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q20.sql	TPC-H query 20
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q21.sql	TPC-H query 21
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q22.sql	TPC-H query 22
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer.sql	Create MOR customer table (unique key, MOW disabled)
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_dup.sql	Create DUP customer table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_load.sql	Load customer into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_dup_load.sql	Load customer into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem.sql	Create MOR lineitem table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_dup.sql	Create DUP lineitem table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_load.sql	Load lineitem into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_dup_load.sql	Load lineitem into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation.sql	Create MOR nation table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_dup.sql	Create DUP nation table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_load.sql	Load nation into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_dup_load.sql	Load nation into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders.sql	Create MOR orders table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_dup.sql	Create DUP orders table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_load.sql	Load orders into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_dup_load.sql	Load orders into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part.sql	Create MOR part table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_dup.sql	Create DUP part table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_load.sql	Load part into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_dup_load.sql	Load part into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp.sql	Create MOR partsupp table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_dup.sql	Create DUP partsupp table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_load.sql	Load partsupp into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_dup_load.sql	Load partsupp into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region.sql	Create MOR region table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_dup.sql	Create DUP region table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_load.sql	Load region into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_dup_load.sql	Load region into DUP table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier.sql	Create MOR supplier table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_dup.sql	Create DUP supplier table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_load.sql	Load supplier into MOR table
regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_dup_load.sql	Load supplier into DUP table

Comments suppressed due to low confidence (7)

regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy:1

This compares result lists for exact equality, which will fail for queries without a deterministic ORDER BY (row order can legitimately differ between MOR and DUP plans). To make this reliable, enforce deterministic ordering (e.g., wrap each query in an outer SELECT with an ORDER BY on all output columns / a stable key), or normalize before comparison (e.g., sort both result sets consistently) so the assertion checks content rather than row ordering.
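A minimal sketch of the normalization option, in Python rather than the suite's Groovy: both result sets are sorted with the same key before comparison, so the check compares row multisets instead of row order. `normalize` and `same_result` are hypothetical helpers. Sorting by the `repr` of each value is an assumption chosen to avoid a `TypeError` when a column mixes `None` with numbers; the resulting order is canonical, not numeric. Floating-point columns may additionally need a tolerance.

```python
from typing import Any, List, Sequence


def normalize(rows: Sequence[Sequence[Any]]) -> List[tuple]:
    """Return rows in a canonical order so comparison ignores row order."""
    # repr() keys keep mixed-type columns (e.g. None next to numbers) sortable.
    return sorted((tuple(r) for r in rows),
                  key=lambda r: tuple(repr(v) for v in r))


def same_result(a: Sequence[Sequence[Any]], b: Sequence[Sequence[Any]]) -> bool:
    """True if both result sets contain the same rows with the same multiplicity."""
    return normalize(a) == normalize(b)
```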
regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy:1
listFiles() can return null (e.g., if the directory does not exist or is not readable), which will throw a NullPointerException on findAll. Add an explicit existence/readability check for sqlDir and a clearer error (or fallback) when no SQL files are found.
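The suggested guard, sketched in Python with a hypothetical `list_sql_files` helper. Java's `File.listFiles()`, which Groovy code typically calls, returns `null` when the path is not a directory or an I/O error occurs; Python's `Path.iterdir()` raises instead, so here the explicit checks mainly buy a clearer error message and catch the empty-directory case.

```python
from pathlib import Path
from typing import List


def list_sql_files(sql_dir: str) -> List[Path]:
    """Return the .sql files in sql_dir in name order, failing loudly if none exist."""
    d = Path(sql_dir)
    if not d.is_dir():
        raise FileNotFoundError(f"SQL directory not found or not a directory: {d}")
    files = sorted(p for p in d.iterdir() if p.is_file() and p.suffix == ".sql")
    if not files:
        # An empty suite would otherwise "pass" without running a single query.
        raise FileNotFoundError(f"no .sql files found in {d}")
    return files
```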
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1
show load where Label = '${loadLabel}' may temporarily return an empty result right after issuing the load (or due to FE visibility delays). In that case, stateResult[stateResult.size() - 1] will throw. Handle the empty result by sleeping/retrying until at least one row is returned, and also add a max-wait timeout to avoid an infinite loop if the load gets stuck in an intermediate state.
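A polling loop with both fixes, sketched in Python. `fetch_states` is a hypothetical callable standing in for running `show load where Label = ...` and extracting the State column; treating `FINISHED` and `CANCELLED` as the terminal states is an assumption about the load type. `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Sequence

TERMINAL_STATES = ("FINISHED", "CANCELLED")  # assumed terminal load states


def wait_for_load(fetch_states: Callable[[], Sequence[str]],
                  timeout_s: float = 3600.0,
                  interval_s: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the most recent load state is terminal and return that state.

    fetch_states may return an empty sequence right after the load is submitted.
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        # An empty result means the load is not visible yet: keep polling
        # instead of indexing into it.
        if states and states[-1] in TERMINAL_STATES:
            return states[-1]
        if clock() >= deadline:
            raise TimeoutError(f"load did not reach a terminal state within {timeout_s}s")
        sleep(interval_s)
```

The caller would then assert that the returned state is `FINISHED`, so a cancelled load fails the suite instead of hanging it.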
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1
This changes a global FE config for the cluster and is not reverted at the end of the suite, which can leak into other tests and affect stability. Prefer scoping this change (restore the previous value in a finally block), or avoid global mutation if there is a session-level alternative in this framework.
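One way to scope the mutation, sketched as a Python context manager. `get_config` and `set_config` are hypothetical callables; in Doris they would presumably wrap `ADMIN SHOW FRONTEND CONFIG` and `ADMIN SET FRONTEND CONFIG`, but the config name the suite changes is not shown here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_config(name: str, value: str,
                     get_config: Callable[[str], str],
                     set_config: Callable[[str, str], None]) -> Iterator[None]:
    """Set a global config for the duration of a block, then restore it."""
    previous = get_config(name)  # remember the cluster's value before mutating it
    set_config(name, value)
    try:
        yield
    finally:
        set_config(name, previous)  # runs even if the body raises
```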
regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1
A fixed sleep is prone to flakiness (too short on slow environments, unnecessarily long on fast ones). Replace this with a polling wait that checks the condition you actually need (e.g., row count visibility / stats readiness), with an overall timeout to keep the suite bounded.
regression-test/suites/tpch_sf10_unique_mor_p2/sql/q20.sql:1
This query uses date('1994-01-01'), while the rest of the suite consistently uses the standard SQL literal form DATE 'YYYY-MM-DD'. If date(<string>) is not supported (or behaves differently) in the target engine, q20 will fail or produce inconsistent results. Align this to DATE '1994-01-01' (and keep interval syntax consistent with other queries in the suite).
regression-test/suites/tpch_sf10_unique_mor_p2/test_mor_value_predicate_pushdown.groovy:1
The suite enables a session setting but never resets it. If the test runner reuses the same session across suites, this can leak into subsequent tests. Consider resetting enable_mor_value_predicate_pushdown_tables back to '' (ideally in a finally/cleanup block) once the loop completes.


… results

Update test_mor_value_predicate_pushdown to run each TPC-H query twice: once without pushdown (standard MOR merge, equivalent to MOW) and once with pushdown enabled, then use assertEquals to verify that the results are identical.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
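The updated flow, combined with the cleanup suggested in review, can be sketched in Python. `run_query` and `set_var` are hypothetical callables standing in for the regression framework. The baseline run clears `enable_mor_value_predicate_pushdown_tables` (assumed: `''` means disabled, following the review comment that suggests resetting to `''`), and a `finally` block resets it even if a query fails. The comparison is exact, as with `assertEquals`, so queries without a deterministic ORDER BY would need an order-insensitive comparison.

```python
from typing import Callable, Dict, List, Sequence

PUSHDOWN_VAR = "enable_mor_value_predicate_pushdown_tables"


def compare_pushdown(queries: Dict[str, str],
                     run_query: Callable[[str], List[Sequence]],
                     set_var: Callable[[str, str], None]) -> List[str]:
    """Run each query without and with pushdown; return names whose results differ."""
    mismatched: List[str] = []
    try:
        for name, sql in sorted(queries.items()):
            set_var(PUSHDOWN_VAR, "")   # baseline: regular MOR merge
            baseline = run_query(sql)
            set_var(PUSHDOWN_VAR, "*")  # pushdown enabled for all tables
            pushed = run_query(sql)
            if baseline != pushed:
                mismatched.append(name)
    finally:
        set_var(PUSHDOWN_VAR, "")       # never leak the setting into later suites
    return mismatched
```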

dataroaring · 2026-03-26T14:17:36Z

run buildall

Copilot AI review requested due to automatic review settings March 26, 2026 07:56

Copilot AI reviewed Mar 26, 2026


Copilot started reviewing on behalf of dataroaring March 26, 2026 08:09

dataroaring wants to merge 2 commits into master from feature/tpch_sf10_unique_mor_p2


3 participants
