…ontext::copy_rows
This PR ensures that `VMergeIteratorContext::copy_rows` iterates over all columns present in the input block by using `block->columns()` instead of the unsafe `_num_columns` value.
This prevents column-count mismatches when the read schema changes: the data copying logic stays synchronized with the actual structure of the block at runtime,
regardless of whether the schema has been expanded for delete predicates.
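A minimal sketch of the change, under simplifying assumptions: `Column` and `Block` below are hypothetical stand-ins (a column is a `std::vector<int>`, a block is a list of columns), not the Doris `vectorized::Block` API, and `copy_rows_before`/`copy_rows_after`/`append_rows` are illustrative names. It only shows where the loop bound comes from: a separately cached count versus the number of columns in the block passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model: NOT the Doris vectorized::Block/IColumn types.
using Column = std::vector<int>;

struct Block {
    std::vector<Column> cols;

    std::size_t columns() const { return cols.size(); }

    // Uses .at(), so an out-of-range position throws std::out_of_range in this
    // model; an unchecked access would read past the end of the array instead.
    Column& get_by_position(std::size_t i) { return cols.at(i); }
    const Column& get_by_position(std::size_t i) const { return cols.at(i); }
};

// Appends rows [start, start + count) of column s to the end of column d.
void append_rows(const Column& s, Column& d, std::size_t start, std::size_t count) {
    assert(start + count <= s.size());
    auto first = s.begin() + static_cast<std::ptrdiff_t>(start);
    d.insert(d.end(), first, first + static_cast<std::ptrdiff_t>(count));
}

// Before the fix (sketch): the loop bound is a count cached elsewhere, which
// may exceed the number of columns the block actually holds.
void copy_rows_before(const Block& src, Block& dst, std::size_t num_columns,
                      std::size_t start, std::size_t count) {
    for (std::size_t i = 0; i < num_columns; ++i) {
        append_rows(src.get_by_position(i), dst.get_by_position(i), start, count);
    }
}

// After the fix (sketch): the loop bound is taken from the block itself.
void copy_rows_after(const Block& src, Block& dst, std::size_t start,
                     std::size_t count) {
    // Sketch assumption: src holds at least every column dst expects.
    assert(src.columns() >= dst.columns());
    for (std::size_t i = 0; i < dst.columns(); ++i) {
        append_rows(src.get_by_position(i), dst.get_by_position(i), start, count);
    }
}
```

In this model, calling `copy_rows_before` with a cached count of 3 on two-column blocks throws `std::out_of_range` at position 2, which stands in for the out-of-bounds access; `copy_rows_after` never indexes past `dst.columns()`.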
Consider the following table:
```sql
CREATE TABLE tbl (
k INT NOT NULL,
v1 INT NOT NULL,
v2 INT NOT NULL
) DUPLICATE KEY(k) ...;
```
And a delete predicate applied to a non-key column:
```sql
DELETE FROM tbl WHERE v1 = 1;
```
When executing `ORDER BY k LIMIT n`, Doris applies a Top-N optimization. Even though the query is `SELECT *`, the engine initially avoids scanning all columns.
It constructs a minimal intermediate schema containing only the sort key (`k`) and the internal `__DORIS_ROWID_COL__` to perform the merge and sort efficiently (`_col_ids = {0, 3}`, so `_num_columns = 2`).
However, because a delete predicate exists on column `v1`, the `BetaRowsetReader` adds `v1` to this intermediate schema so that it can evaluate the predicate and filter out deleted rows during the scan
(`_col_ids = {0, 3, 1}`; note that column `v1` (index 1) is appended to the end of the schema, so `_num_columns = 3`).
The previous implementation of `VMergeIteratorContext::copy_rows` relied on this `_num_columns` value, which no longer matched the block being copied, resulting in an out-of-bounds array access that caused the BE to core dump.
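The column-id bookkeeping described above can be modeled as follows. This is an illustrative sketch, not the `BetaRowsetReader` code: `extend_for_delete_predicates` is a hypothetical helper, and the ids follow the example (`k` = 0, `v1` = 1, `__DORIS_ROWID_COL__` = 3). It shows how a column count taken from the extended schema ends up larger than the two-column Top-N layout.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not the BetaRowsetReader implementation): returns the
// read schema extended with every column a delete predicate references that
// is not already present. New ids are appended at the end, preserving order.
std::vector<int> extend_for_delete_predicates(std::vector<int> col_ids,
                                              const std::vector<int>& delete_pred_cols) {
    for (int id : delete_pred_cols) {
        assert(id >= 0);  // column ids are non-negative in this model
        if (std::find(col_ids.begin(), col_ids.end(), id) == col_ids.end()) {
            col_ids.push_back(id);
        }
    }
    return col_ids;
}
```

For example, `extend_for_delete_predicates({0, 3}, {1})` returns `{0, 3, 1}` (three ids), while a predicate on `k` (id 0) would leave `{0, 3}` unchanged. In this model, a count derived from the extended schema is 3 even though a block still laid out for the Top-N schema holds only 2 columns, which is the kind of mismatch that reading the count from `block->columns()` avoids.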
Detailed reproduction steps are as follows:
1. Modify `conf/be.conf`:
```
write_buffer_size = 8
```
2. Run the following SQL:
```sql
CREATE TABLE tbl1
(
k INT NOT NULL,
v1 INT NOT NULL,
v2 INT NOT NULL
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 5
PROPERTIES(
"replication_num" = "1"
);
CREATE TABLE tbl2
(
k INT NOT NULL,
v1 INT NOT NULL,
v2 INT NOT NULL
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES(
"replication_num" = "1"
);
INSERT INTO tbl1 VALUES (1, 1, 1),(2, 2, 2),(3, 3, 3),(4, 4, 4),(5, 5, 5);
INSERT INTO tbl2 SELECT * FROM tbl1;
SELECT * FROM tbl2 ORDER BY k limit 100; -- ok
DELETE FROM tbl2 WHERE v1 = 100;
SELECT * FROM tbl2 ORDER BY k limit 100; -- coredump
```
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None