[Fix](MergeIterator) Use actual block column count in VMergeIteratorContext::copy_rows by uchenily · Pull Request #60363 · apache/doris

uchenily · 2026-01-29T10:26:16Z

This PR ensures that VMergeIteratorContext::copy_rows iterates over all columns present in the input block by using block->columns() instead of an unsafe _num_columns value. This prevents column count mismatches when the read schema is changed. The data-copying logic stays synchronized with the actual structure of the block at runtime, regardless of whether the schema has been expanded for delete predicates.

Consider the following table:

CREATE TABLE tbl (
  k INT NOT NULL,
  v1 INT NOT NULL,
  v2 INT NOT NULL
) DUPLICATE KEY(k) ...;

And a delete predicate applied to a non-key column:

DELETE FROM tbl WHERE v1 = 1;

When executing ORDER BY k LIMIT n, Doris applies a Top-N optimization. Even though the query is SELECT *, the engine initially avoids scanning all columns. It constructs a minimal intermediate schema containing only the sort key (k) and the internal __DORIS_ROWID_COL__ to perform the merge and sort efficiently (_col_ids = {0, 3}, so _num_columns = 2). However, because a delete predicate exists on column v1, the BetaRowsetReader adds v1 to this intermediate schema to evaluate the predicate and filter out deleted rows during the scan (_col_ids = {0, 3, 1}; column v1 (index = 1) is appended to the schema, so _num_columns = 3).

The previous implementation of VMergeIteratorContext::copy_rows used the incorrect _num_columns value, resulting in an out-of-bounds array access that crashed the BE with a core dump.
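The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the Doris code: `Column`, `Block`, and this free-standing `copy_rows` are hypothetical stand-ins built on `std::vector`, and the sketch assumes the failure arises when a cached column count is larger than the number of columns in the block being written to. The loop bound is taken from the block itself, which is what using block->columns() achieves in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the Doris types.
using Column = std::vector<int>;
using Block = std::vector<Column>;  // one entry per column

// Copies rows [start, start + count) from src into dst, column by column.
// The loop bound is dst.size(), the block's actual column count, rather than
// a count cached elsewhere that may no longer match after the read schema
// has been expanded (for example, for a delete predicate).
inline void copy_rows(const Block& src, Block& dst, std::size_t start, std::size_t count) {
    // Assumption of this sketch: src has at least as many columns as dst.
    assert(src.size() >= dst.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        assert(start + count <= src[i].size());
        auto first = src[i].begin() + static_cast<std::ptrdiff_t>(start);
        auto last = first + static_cast<std::ptrdiff_t>(count);
        dst[i].insert(dst[i].end(), first, last);
    }
}
```

With a three-column source (k, row id, v1) and a two-column destination, a loop bounded by a cached count of 3 would index `dst[2]`, which does not exist. Bounding the loop by `dst.size()` copies only the two columns the destination actually has.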

Detailed reproduction steps are as follows:

1. Modify conf/be.conf:

write_buffer_size = 8

2. Execute the following SQL:

CREATE TABLE tbl1
(
    k INT NOT NULL,
    v1 INT NOT NULL,
    v2 INT NOT NULL
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 5
PROPERTIES(
    "replication_num" = "1"
);
CREATE TABLE tbl2
(
    k INT NOT NULL,
    v1 INT NOT NULL,
    v2 INT NOT NULL
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES(
    "replication_num" = "1"
);

INSERT INTO tbl1 VALUES (1, 1, 1),(2, 2, 2),(3, 3, 3),(4, 4, 4),(5, 5, 5);
INSERT INTO tbl2 SELECT * FROM tbl1;
SELECT * FROM tbl2 ORDER BY k limit 100; -- ok

DELETE FROM tbl2 WHERE v1 = 100;
SELECT * FROM tbl2 ORDER BY k limit 100; -- coredump

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label


Thearas · 2026-01-29T10:26:23Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

uchenily · 2026-01-29T10:26:48Z

run buildall

doris-robot · 2026-01-29T11:52:41Z

BE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.72% (19264/36540)
Line Coverage	36.11% (179029/495758)
Region Coverage	32.55% (138771/426373)
Branch Coverage	33.49% (60057/179337)

doris-robot · 2026-01-29T15:24:58Z

TPC-H: Total hot run time: 31878 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5b8202ed8a3e70ef7e82e16c02b8a08bb30b9bd6, data reload: false

------ Round 1 ----------------------------------
q1	17625	5310	5087	5087
q2	2008	309	197	197
q3	10215	1337	764	764
q4	10181	786	316	316
q5	7494	2217	1897	1897
q6	201	183	152	152
q7	884	755	610	610
q8	9273	1441	1092	1092
q9	5364	4805	4955	4805
q10	6837	1960	1570	1570
q11	504	303	289	289
q12	345	386	232	232
q13	17787	4060	3316	3316
q14	245	257	223	223
q15	945	855	845	845
q16	694	699	654	654
q17	667	914	501	501
q18	7028	6510	6274	6274
q19	1254	995	638	638
q20	405	352	243	243
q21	2779	2179	1898	1898
q22	359	314	275	275
Total cold run time: 103094 ms
Total hot run time: 31878 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5336	5330	5295	5295
q2	268	346	258	258
q3	2226	2736	2291	2291
q4	1362	1756	1333	1333
q5	4200	4139	4217	4139
q6	228	189	142	142
q7	2475	2157	1831	1831
q8	2572	2442	2456	2442
q9	7644	7481	7616	7481
q10	2969	3137	2691	2691
q11	540	488	465	465
q12	698	726	642	642
q13	3868	4495	3554	3554
q14	316	312	309	309
q15	915	821	884	821
q16	702	758	739	739
q17	1248	1374	1344	1344
q18	8177	8122	8066	8066
q19	966	903	873	873
q20	2108	2234	1934	1934
q21	4584	4205	4128	4128
q22	588	544	503	503
Total cold run time: 53990 ms
Total hot run time: 51281 ms

uchenily · 2026-01-29T15:39:37Z

run nonConcurrent

doris-robot · 2026-01-29T15:41:37Z

ClickBench: Total hot run time: 28.42 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5b8202ed8a3e70ef7e82e16c02b8a08bb30b9bd6, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.26	0.08	0.08
query4	1.60	0.11	0.12
query5	0.27	0.24	0.25
query6	1.16	0.67	0.68
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.57	0.52	0.50
query10	0.54	0.54	0.54
query11	0.14	0.10	0.09
query12	0.14	0.11	0.10
query13	0.63	0.63	0.63
query14	1.08	1.07	1.05
query15	0.88	0.86	0.88
query16	0.39	0.40	0.40
query17	1.14	1.18	1.16
query18	0.23	0.21	0.21
query19	2.03	1.99	2.05
query20	0.02	0.01	0.01
query21	15.38	0.26	0.14
query22	4.74	0.06	0.04
query23	15.71	0.29	0.10
query24	1.23	0.74	0.29
query25	0.10	0.11	0.06
query26	0.15	0.13	0.14
query27	0.06	0.06	0.06
query28	3.47	1.16	0.97
query29	12.57	3.89	3.19
query30	0.27	0.14	0.11
query31	2.81	0.62	0.41
query32	3.25	0.59	0.49
query33	3.20	3.26	3.29
query34	16.38	5.41	4.71
query35	4.81	4.85	4.79
query36	0.66	0.49	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.19	0.16	0.17
query41	0.08	0.03	0.03
query42	0.05	0.03	0.03
query43	0.05	0.05	0.04
Total cold run time: 96.69 s
Total hot run time: 28.42 s

hello-stephen · 2026-01-29T15:45:42Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.44% (25587/35818)
Line Coverage	54.01% (267116/494525)
Region Coverage	51.54% (221996/430691)
Branch Coverage	53.00% (95428/180053)

hello-stephen · 2026-01-29T17:35:05Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.42% (25583/35818)
Line Coverage	54.01% (267069/494525)
Region Coverage	51.55% (222002/430691)
Branch Coverage	52.98% (95392/180053)

uchenily · 2026-01-30T01:13:03Z

run nonConcurrent

hello-stephen · 2026-01-30T03:13:49Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.43% (25586/35818)
Line Coverage	54.01% (267108/494525)
Region Coverage	51.56% (222044/430691)
Branch Coverage	53.00% (95420/180053)

uchenily · 2026-01-30T04:46:06Z

run buildall

doris-robot · 2026-01-30T05:54:00Z

TPC-H: Total hot run time: 31979 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ee50e210953bed719f4942bdc98e7c56e07b9c48, data reload: false

------ Round 1 ----------------------------------
q1	17640	5304	5094	5094
q2	2057	324	203	203
q3	10172	1350	769	769
q4	10209	874	323	323
q5	7550	2146	1948	1948
q6	199	184	150	150
q7	890	731	604	604
q8	9252	1374	1179	1179
q9	5357	4879	4828	4828
q10	6812	1955	1572	1572
q11	540	293	267	267
q12	350	380	220	220
q13	17805	4041	3207	3207
q14	233	240	226	226
q15	912	820	814	814
q16	659	663	623	623
q17	634	840	447	447
q18	6870	6509	6477	6477
q19	1134	985	612	612
q20	385	342	228	228
q21	2644	1910	1935	1910
q22	355	311	278	278
Total cold run time: 102659 ms
Total hot run time: 31979 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5342	5313	5321	5313
q2	253	368	250	250
q3	2156	2658	2275	2275
q4	1377	1750	1322	1322
q5	4232	4218	4239	4218
q6	216	178	137	137
q7	2177	2195	1867	1867
q8	2633	2403	2383	2383
q9	7513	7568	7543	7543
q10	2820	2987	2681	2681
q11	581	478	473	473
q12	884	732	589	589
q13	3866	4446	3642	3642
q14	306	351	289	289
q15	867	840	826	826
q16	667	749	677	677
q17	1127	1322	1313	1313
q18	8196	7722	8033	7722
q19	879	860	865	860
q20	2084	2189	2041	2041
q21	4873	4608	4207	4207
q22	579	575	491	491
Total cold run time: 53628 ms
Total hot run time: 51119 ms

doris-robot · 2026-01-30T06:10:46Z

ClickBench: Total hot run time: 28.78 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ee50e210953bed719f4942bdc98e7c56e07b9c48, data reload: false

query1	0.06	0.05	0.04
query2	0.10	0.05	0.04
query3	0.26	0.08	0.08
query4	1.61	0.11	0.11
query5	0.28	0.23	0.24
query6	1.17	0.67	0.67
query7	0.03	0.02	0.02
query8	0.05	0.04	0.04
query9	0.57	0.51	0.50
query10	0.56	0.54	0.54
query11	0.14	0.09	0.09
query12	0.14	0.11	0.12
query13	0.64	0.61	0.62
query14	1.05	1.06	1.05
query15	0.88	0.88	0.87
query16	0.39	0.38	0.40
query17	1.16	1.13	1.13
query18	0.23	0.21	0.21
query19	2.07	2.01	2.01
query20	0.02	0.02	0.02
query21	15.39	0.28	0.15
query22	5.19	0.06	0.05
query23	16.00	0.28	0.10
query24	1.48	0.66	0.65
query25	0.09	0.12	0.07
query26	0.15	0.13	0.13
query27	0.05	0.06	0.05
query28	5.13	1.14	0.96
query29	12.55	3.96	3.18
query30	0.28	0.12	0.11
query31	2.81	0.66	0.39
query32	3.27	0.62	0.50
query33	3.23	3.35	3.32
query34	16.52	5.38	4.74
query35	4.79	4.76	4.75
query36	0.66	0.50	0.50
query37	0.13	0.07	0.07
query38	0.08	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.17	0.15
query41	0.09	0.03	0.04
query42	0.04	0.04	0.03
query43	0.05	0.03	0.03
Total cold run time: 99.62 s
Total hot run time: 28.78 s

uchenily · 2026-01-30T07:53:03Z

run cloud_p0

uchenily · 2026-01-30T07:53:09Z

run p0

uchenily · 2026-01-30T07:53:17Z

run vault_p0

doris-robot · 2026-01-30T07:59:30Z

BE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.49% (19272/36716)
Line Coverage	35.97% (179069/497780)
Region Coverage	32.37% (138808/428829)
Branch Coverage	33.33% (60085/180271)

uchenily · 2026-01-30T08:49:22Z

run p0

uchenily · 2026-01-30T09:52:39Z

run p0

uchenily · 2026-01-30T11:35:38Z

run p0

hello-stephen · 2026-01-30T13:19:55Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	57.20% (20588/35991)
Line Coverage	40.12% (199237/496603)
Region Coverage	36.82% (159530/433239)
Branch Coverage	37.52% (67921/181003)

uchenily · 2026-01-30T14:02:53Z

run p0

uchenily · 2026-01-30T15:00:36Z

run p0

hello-stephen · 2026-01-30T16:55:09Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	57.16% (20574/35991)
Line Coverage	40.10% (199144/496603)
Region Coverage	36.85% (159628/433239)
Branch Coverage	37.52% (67910/181003)

uchenily changed the title ~~[Fix](MergeIterator) Use actual block column count in VMergeIteratorC…~~ [Fix](MergeIterator) Use actual block column count in VMergeIteratorContext::copy_rows Jan 29, 2026
