
[Enhancement](test) Add TPC-H SF10 MOR unique key regression tests #61762

Open
dataroaring wants to merge 2 commits into master from feature/tpch_sf10_unique_mor_p2


@dataroaring
Contributor

Summary

  • Add the tpch_sf10_unique_mor_p2 test suite for Merge-On-Read (MOR) unique key tables, filling the MOR coverage gap left by the existing tpch_sf100_unique_p2 suite, which only tests Merge-On-Write (MOW)
  • Create MOR tables (enable_unique_key_merge_on_write=false) and load the data twice to produce overlapping rowsets that trigger merge-on-read dedup semantics
  • test_read_mor_as_dup: runs all 22 TPC-H queries with read_mor_as_dup_tables='*' and compares results against actual DUPLICATE KEY tables to verify behavioral equivalence
  • test_mor_value_predicate_pushdown: runs all 22 TPC-H queries with enable_mor_value_predicate_pushdown_tables='*' to validate value predicate pushdown on MOR tables at scale

Test plan

  • Run tpch_sf10_unique_mor_p2/load to verify double-load creates correct row counts (MOR dedup = 1x, DUP = 2x)
  • Run test_read_mor_as_dup to verify MOR-as-DUP results match actual DUP table results for all 22 TPC-H queries
  • Run test_mor_value_predicate_pushdown to verify predicate pushdown produces correct results on all 22 TPC-H queries
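
The row counts expected in the first step follow from key semantics alone. As a minimal sketch (plain Python rather than the suite's Groovy; the `load_unique` and `load_duplicate` helpers and the sample rows are hypothetical), loading the same batch twice leaves one visible row per key in a unique-key table, while a duplicate-key table keeps every row:

```python
def load_unique(table, rows):
    """Model the visible state of a unique-key table: a later row replaces an earlier one with the same key."""
    for key, value in rows:
        table[key] = value


def load_duplicate(table, rows):
    """Model a duplicate-key table: every loaded row is kept."""
    table.extend(rows)


# Hypothetical sample batch of (key, value) rows.
batch = [(1, "a"), (2, "b"), (3, "c")]

mor_table = {}
dup_table = []
for _ in range(2):  # the suite loads each data set twice
    load_unique(mor_table, batch)
    load_duplicate(dup_table, batch)

unique_rows = len(mor_table)     # one row per key (1x)
duplicate_rows = len(dup_table)  # every loaded row (2x)
```

The dictionary only models what a query sees. A real MOR table stores both rowsets and resolves duplicates when it is read, which is the code path the double load is meant to exercise.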

🤖 Generated with Claude Code

…ique tables

Add TPC-H SF10 regression tests for Merge-On-Read unique key tables.
The existing tpch_sf100_unique_p2 only tests MOW (Merge-On-Write),
skipping MOR coverage entirely. This suite creates MOR tables
(enable_unique_key_merge_on_write=false), loads data twice to create
overlapping rowsets that trigger merge-on-read semantics, and includes:

- test_read_mor_as_dup: validates read_mor_as_dup_tables by comparing
  MOR-as-DUP query results against actual DUPLICATE KEY tables
- test_mor_value_predicate_pushdown: validates
  enable_mor_value_predicate_pushdown_tables with all 22 TPC-H queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 07:56
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?


Copilot AI left a comment

Pull request overview

Adds a new TPC-H SF10 regression suite to cover Merge-On-Read (MOR) unique key behavior, including MOR-as-DUP equivalence and MOR value predicate pushdown validation at scale.

Changes:

  • Add suite loader that creates MOR + DUP table variants and double-loads data to produce overlapping rowsets.
  • Add test_read_mor_as_dup to compare MOR-as-DUP query results vs actual DUP tables across all 22 TPC-H queries.
  • Add test_mor_value_predicate_pushdown to run all 22 queries with MOR value predicate pushdown enabled.

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy | Compares MOR-as-DUP results vs DUP tables for 22 queries |
| regression-test/suites/tpch_sf10_unique_mor_p2/test_mor_value_predicate_pushdown.groovy | Runs 22 queries with MOR predicate pushdown enabled |
| regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy | Creates tables, double-loads data, validates row counts, analyzes tables |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q01.sql | TPC-H query 01 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q02.sql | TPC-H query 02 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q03.sql | TPC-H query 03 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q04.sql | TPC-H query 04 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q05.sql | TPC-H query 05 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q06.sql | TPC-H query 06 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q07.sql | TPC-H query 07 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q08.sql | TPC-H query 08 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q09.sql | TPC-H query 09 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q10.sql | TPC-H query 10 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q11.sql | TPC-H query 11 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q12.sql | TPC-H query 12 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q13.sql | TPC-H query 13 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q14.sql | TPC-H query 14 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q15.sql | TPC-H query 15 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q16.sql | TPC-H query 16 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q17.sql | TPC-H query 17 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q18.sql | TPC-H query 18 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q19.sql | TPC-H query 19 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q20.sql | TPC-H query 20 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q21.sql | TPC-H query 21 |
| regression-test/suites/tpch_sf10_unique_mor_p2/sql/q22.sql | TPC-H query 22 |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer.sql | Create MOR customer table (unique key, MOW disabled) |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_dup.sql | Create DUP customer table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_load.sql | Load customer into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/customer_dup_load.sql | Load customer into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem.sql | Create MOR lineitem table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_dup.sql | Create DUP lineitem table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_load.sql | Load lineitem into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/lineitem_dup_load.sql | Load lineitem into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation.sql | Create MOR nation table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_dup.sql | Create DUP nation table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_load.sql | Load nation into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/nation_dup_load.sql | Load nation into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders.sql | Create MOR orders table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_dup.sql | Create DUP orders table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_load.sql | Load orders into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/orders_dup_load.sql | Load orders into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part.sql | Create MOR part table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_dup.sql | Create DUP part table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_load.sql | Load part into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/part_dup_load.sql | Load part into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp.sql | Create MOR partsupp table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_dup.sql | Create DUP partsupp table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_load.sql | Load partsupp into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/partsupp_dup_load.sql | Load partsupp into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region.sql | Create MOR region table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_dup.sql | Create DUP region table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_load.sql | Load region into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/region_dup_load.sql | Load region into DUP table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier.sql | Create MOR supplier table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_dup.sql | Create DUP supplier table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_load.sql | Load supplier into MOR table |
| regression-test/suites/tpch_sf10_unique_mor_p2/ddl/supplier_dup_load.sql | Load supplier into DUP table |

Comments suppressed due to low confidence (7)

regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy:1

  • This compares result lists for exact equality, which will fail for queries without a deterministic ORDER BY (row order can legitimately differ between MOR and DUP plans). To make this reliable, enforce deterministic ordering (e.g., wrap each query in an outer SELECT with an ORDER BY on all output columns / a stable key), or normalize before comparison (e.g., sort both result sets consistently) so the assertion checks content rather than row ordering.

regression-test/suites/tpch_sf10_unique_mor_p2/test_read_mor_as_dup.groovy:1

  • listFiles() can return null (e.g., if the directory does not exist or is not readable), which will throw a NullPointerException on findAll. Add an explicit existence/readability check for sqlDir and a clearer error (or fallback) when no SQL files are found.

regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1

  • show load where Label = '${loadLabel}' may temporarily return an empty result right after issuing the load (or due to FE visibility delays). In that case, stateResult[stateResult.size() - 1] will throw. Handle the empty result by sleeping/retrying until at least one row is returned, and also add a max-wait timeout to avoid an infinite loop if the load gets stuck in an intermediate state.

regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1

  • This changes a global FE config for the cluster and is not reverted at the end of the suite, which can leak into other tests and affect stability. Prefer scoping this change (restore the previous value in a finally block), or avoid global mutation if there is a session-level alternative in this framework.

regression-test/suites/tpch_sf10_unique_mor_p2/load.groovy:1

  • A fixed sleep is prone to flakiness (too short on slow environments, unnecessarily long on fast ones). Replace this with a polling wait that checks the condition you actually need (e.g., row count visibility / stats readiness), with an overall timeout to keep the suite bounded.

regression-test/suites/tpch_sf10_unique_mor_p2/sql/q20.sql:1

  • This query uses date('1994-01-01'), while the rest of the suite consistently uses the standard SQL literal form DATE 'YYYY-MM-DD'. If date(<string>) is not supported (or behaves differently) in the target engine, q20 will fail or produce inconsistent results. Align this to DATE '1994-01-01' (and keep interval syntax consistent with other queries in the suite).

regression-test/suites/tpch_sf10_unique_mor_p2/test_mor_value_predicate_pushdown.groovy:1

  • The suite enables a session setting but never resets it. If the test runner reuses the same session across suites, this can leak into subsequent tests. Consider resetting enable_mor_value_predicate_pushdown_tables back to '' (ideally in a finally/cleanup block) once the loop completes.
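
Two of the suggestions above, order-insensitive comparison and polling with a timeout, can be illustrated independently of the regression framework. The following is a rough sketch in plain Python, not the suite's Groovy code; `same_rows` and `wait_until` are hypothetical helper names, and the `clock` and `sleep` parameters exist only so the behavior can be exercised without real waiting:

```python
import time
from collections import Counter


def same_rows(left, right):
    """Compare two result sets as multisets, so row order is ignored but duplicates still count."""
    return Counter(map(tuple, left)) == Counter(map(tuple, right))


def wait_until(check, timeout_s=60.0, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns a truthy value; raise TimeoutError once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        sleep(interval_s)
```

An empty `show load` result is falsy, so passing a function that returns the result rows to `wait_until` would cover both the "no rows yet" case and the bounded wait. Exact multiset equality may still be too strict for queries that aggregate floating-point values, which this sketch does not address.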


… results

Update test_mor_value_predicate_pushdown to run each TPC-H query twice:
once without pushdown (standard MOR merge = MOW equivalent) and once
with pushdown enabled, then assertEquals to verify identical results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
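
The run-twice check described in this commit can be sketched as follows. This is an illustrative outline in plain Python, not the Groovy test itself: `run_query` is a hypothetical callable that executes SQL and returns rows, `session` is a plain dict standing in for session variables, and the sorting before comparison and the restore-on-exit step are additions of this sketch rather than confirmed behavior of the suite.

```python
from contextlib import contextmanager

PUSHDOWN_VAR = "enable_mor_value_predicate_pushdown_tables"


@contextmanager
def session_setting(session, name, value):
    """Temporarily set a session variable; restore the previous value on exit, even on error."""
    previous = session.get(name, "")
    session[name] = value
    try:
        yield
    finally:
        session[name] = previous


def check_pushdown_equivalence(run_query, session, queries):
    """Run each query without and with pushdown; return the names of queries whose results differ."""
    mismatches = []
    for name, sql in queries.items():
        baseline = run_query(sql)  # standard MOR merge, pushdown off
        with session_setting(session, PUSHDOWN_VAR, "*"):
            pushed = run_query(sql)  # pushdown enabled for all tables
        # Sort both sides so row order does not affect the comparison.
        if sorted(baseline) != sorted(pushed):
            mismatches.append(name)
    return mismatches
```

Returning the list of mismatching query names, rather than failing on the first difference, would let one run report every affected query.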
@dataroaring
Contributor Author

run buildall
