[fix](cloud) Fill schema change version holes before running #63443

Open
liaoxin01 wants to merge 1 commit into apache:master from liaoxin01:fix/cloud-schema-change-fill-holes

liaoxin01 (Contributor) commented May 20, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: During cloud schema change, the empty rowsets created while calculating the delete bitmap were added only to a temporary tablet. As a result, the real new tablet could become RUNNING with a hole in its local version graph after the alter version. A subsequent delete bitmap sync on the RUNNING merge-on-write (MOW) tablet captures the old rowset ids before filling local holes, so it can fail to find a continuous version path.
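
The failure mode can be illustrated with a deliberately simplified C++ model. This is not Doris's actual `capture_consistent_versions`. Here a rowset is an inclusive version range, and a capture succeeds only if the ranges chain from version 0 to the target with no gap. `Version` and `find_consistent_path` are hypothetical names. The model also assumes rowsets do not overlap (no compaction outputs).

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical model: a rowset covers the inclusive version range [first, second].
struct Version {
    int64_t first;
    int64_t second;
};

// Simplified stand-in for a consistent-version capture (NOT Doris's implementation).
// Assumes rowsets do not overlap, so at most one rowset starts at each version.
// Returns the chain of rowsets covering [0, target], or std::nullopt if there is a hole.
std::optional<std::vector<Version>> find_consistent_path(const std::vector<Version>& rowsets,
                                                         int64_t target) {
    std::map<int64_t, int64_t> end_by_start;  // start version -> end version
    for (const Version& v : rowsets) end_by_start[v.first] = v.second;

    std::vector<Version> path;
    int64_t next = 0;  // the version the next rowset in the chain must start at
    while (next <= target) {
        auto it = end_by_start.find(next);
        if (it == end_by_start.end() || it->second > target) {
            return std::nullopt;  // hole at `next`, or the chain cannot end exactly at target
        }
        path.push_back({it->first, it->second});
        next = it->second + 1;
    }
    return path;
}
```

Take rowsets [0,1], [2,2], [4,4]. Capturing up to version 4 fails because version 3 is missing. Adding an empty rowset [3,3] makes the chain continuous.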

This change fills version holes on the real new tablet after adding the schema change output rowsets and before switching the tablet to RUNNING. It preserves the existing schema change rule that skips holes at or below alter_version. A unit test covers the local graph repair and verifies that capture_consistent_versions can traverse the filled empty rowset.
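
The hole-filling rule in this paragraph can be sketched as an illustrative C++ model. It is not the code in this PR. `versions_to_fill` is a hypothetical helper, and `max_version` stands for the version the new tablet must be continuous up to (an assumption). The sketch also assumes each hole is filled with one empty single-version rowset `[v, v]`. Holes at or below `alter_version` are left alone, matching the rule stated above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model: a rowset covers the inclusive version range [first, second].
struct Version {
    int64_t first;
    int64_t second;
};

// Hypothetical helper (not Doris's API): return every version in
// (alter_version, max_version] that no rowset covers. Each one would be
// filled with an empty rowset [v, v]. Holes at or below alter_version are skipped.
std::vector<int64_t> versions_to_fill(std::vector<Version> rowsets, int64_t alter_version,
                                      int64_t max_version) {
    std::sort(rowsets.begin(), rowsets.end(),
              [](const Version& a, const Version& b) { return a.first < b.first; });
    std::vector<int64_t> missing;
    int64_t next = alter_version + 1;  // first version that still needs coverage
    for (const Version& v : rowsets) {
        if (next > max_version) break;
        if (v.second < next) continue;  // rowset ends before `next`: nothing new to cover
        // Every version between `next` and the start of this rowset is a hole.
        for (int64_t x = next; x < v.first && x <= max_version; ++x) missing.push_back(x);
        next = v.second + 1;  // v.second >= next here, so this only moves forward
    }
    // Versions after the last rowset, up to max_version, are holes too.
    for (int64_t x = next; x <= max_version; ++x) missing.push_back(x);
    return missing;
}
```

Take rowsets [0,2], [4,5], [7,7] with alter_version = 5 and max_version = 8. The helper returns {6, 8}. The hole at version 3 is not filled because it is at or below alter_version.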

Release note

None

Check List (For Author)

  • Test: Unit Test
  • Does this need documentation: No

Copilot AI review requested due to automatic review settings May 20, 2026 09:30
hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

liaoxin01 force-pushed the fix/cloud-schema-change-fill-holes branch 5 times, most recently from 5a3915d to 78af3fb (May 20, 2026 15:02)
liaoxin01 (Contributor, Author)

run buildall

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: During cloud schema change, the empty rowsets created while calculating the delete bitmap were added only to a temporary tablet, so the real new tablet could become RUNNING with a hole in its local version graph after the alter version. A subsequent delete bitmap sync on the RUNNING tablet captures the old rowset ids before filling local holes, so it can fail to find a continuous version path.

This change fills version holes on the real new tablet after adding the schema change output rowsets and before switching it to RUNNING. It preserves the existing schema change rule that skips holes at or below alter_version. Unit tests cover both the local graph repair helper and the schema change finalization path, verifying that capture_consistent_versions can traverse the filled empty rowset.
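
The finalization order described here (add output rowsets, fill holes, then switch to RUNNING) can be illustrated with a toy C++ model. `NewTabletModel`, `TabletState`, and `try_set_running` are hypothetical names. For brevity, versions are tracked as a set of single versions. The rule that the tablet only becomes RUNNING once versions are continuous is an invariant of this model, not necessarily an explicit check in Doris.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical states for the new tablet in this toy model.
enum class TabletState { NOTREADY, RUNNING };

// Toy model of the real new tablet during cloud schema change finalization.
class NewTabletModel {
public:
    explicit NewTabletModel(int64_t alter_version) : alter_version_(alter_version) {}

    // Schema change output rowsets: treat every version up to alter_version as present.
    void add_schema_change_output() {
        for (int64_t v = 0; v <= alter_version_; ++v) versions_.insert(v);
    }

    // A rowset committed after alter_version (single version for simplicity).
    void add_rowset(int64_t version) { versions_.insert(version); }

    // Fill every missing version in (alter_version, max_version] with an empty rowset.
    // Returns how many empty rowsets were added.
    int fill_version_holes(int64_t max_version) {
        assert(max_version >= alter_version_);
        int filled = 0;
        for (int64_t v = alter_version_ + 1; v <= max_version; ++v) {
            if (versions_.insert(v).second) ++filled;  // .second is true only if v was missing
        }
        return filled;
    }

    // Model invariant: only become RUNNING when versions [0, max_version] are continuous.
    bool try_set_running(int64_t max_version) {
        for (int64_t v = 0; v <= max_version; ++v) {
            if (versions_.count(v) == 0) return false;  // hole: stay NOTREADY
        }
        state_ = TabletState::RUNNING;
        return true;
    }

    TabletState state() const { return state_; }

private:
    int64_t alter_version_;
    std::set<int64_t> versions_;
    TabletState state_ = TabletState::NOTREADY;
};
```

With alter_version 5 and a later rowset at version 7, switching to RUNNING fails until `fill_version_holes(7)` adds the single empty rowset for version 6.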

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - `./run-be-ut.sh --run --filter=CloudSchemaChangeJobTest.FillVersionHolesBeforeNewTabletRunning:CloudTabletDeleteRowsetsForSchemaChangeTest.TestFillVersionHolesBeforeSchemaChangeRunning -j 8`
- Behavior changed: Yes. Cloud schema change now repairs local version holes on the real new tablet before it becomes RUNNING.
- Does this need documentation: No

liaoxin01 force-pushed the fix/cloud-schema-change-fill-holes branch from 78af3fb to db71d76 (May 20, 2026 15:24)
liaoxin01 (Contributor, Author)

run buildall

hello-stephen (Contributor)

TPC-H: Total hot run time: 31200 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit db71d76875589c45c8b8526b7e264e41379427e4, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17690	4141	3967	3967
q2	q3	10749	1352	795	795
q4	4686	475	351	351
q5	7604	2228	2159	2159
q6	251	177	139	139
q7	972	792	656	656
q8	9348	1643	1623	1623
q9	6687	4951	4939	4939
q10	6440	2136	1863	1863
q11	439	284	247	247
q12	695	427	306	306
q13	18210	3303	2811	2811
q14	263	255	227	227
q15	q16	817	770	708	708
q17	975	911	958	911
q18	7132	5738	5561	5561
q19	1168	1272	1013	1013
q20	519	412	262	262
q21	5653	2679	2365	2365
q22	436	360	297	297
Total cold run time: 100734 ms
Total hot run time: 31200 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4318	4287	4346	4287
q2	q3	4545	4908	4311	4311
q4	2120	2261	1396	1396
q5	4406	4303	4363	4303
q6	336	205	137	137
q7	2098	1827	1697	1697
q8	2540	2158	2174	2158
q9	7846	7869	7826	7826
q10	4553	4479	4170	4170
q11	627	437	393	393
q12	772	748	537	537
q13	3553	3738	3139	3139
q14	323	328	293	293
q15	q16	755	757	685	685
q17	1378	1393	1393	1393
q18	8144	7338	6915	6915
q19	1111	1066	1112	1066
q20	2205	2223	1935	1935
q21	5427	4827	4691	4691
q22	510	459	402	402
Total cold run time: 57567 ms
Total hot run time: 51734 ms
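
Each row in these benchmark tables lists a cold run, two hot runs, and the best hot run. The numbers themselves confirm that the best hot run is the smaller of the two hot runs. They also confirm that "Total cold run time" and "Total hot run time" are the sums of the first and last value columns. A small C++ sketch of that arithmetic follows; the struct and function names are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One benchmark row: query name, cold run, and two hot runs, all in milliseconds.
struct QueryTiming {
    std::string name;
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// The last column of each row: the faster of the two hot runs.
long best_hot_ms(const QueryTiming& q) { return std::min(q.hot1_ms, q.hot2_ms); }

struct Totals {
    long cold_ms = 0;  // "Total cold run time"
    long hot_ms = 0;   // "Total hot run time" (sum of best hot runs)
};

Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals t;
    for (const QueryTiming& q : rows) {
        t.cold_ms += q.cold_ms;
        t.hot_ms += best_hot_ms(q);
    }
    return t;
}
```

Feeding in the 20 Round 1 rows reproduces the reported totals of 100734 ms (cold) and 31200 ms (hot).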

hello-stephen (Contributor)

TPC-DS: Total hot run time: 170019 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit db71d76875589c45c8b8526b7e264e41379427e4, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4313	657	518	518
query6	359	223	202	202
query7	4299	588	309	309
query8	337	226	227	226
query9	8815	3995	4027	3995
query10	438	331	294	294
query11	5762	2382	2257	2257
query12	180	132	131	131
query13	1267	625	450	450
query14	6087	5420	5095	5095
query14_1	4394	4382	4394	4382
query15	213	208	184	184
query16	1021	451	426	426
query17	1131	714	572	572
query18	2494	479	349	349
query19	212	212	158	158
query20	139	136	130	130
query21	215	139	117	117
query22	13591	13578	13340	13340
query23	17159	16500	16024	16024
query23_1	16174	16351	16234	16234
query24	7727	1747	1283	1283
query24_1	1322	1282	1319	1282
query25	561	476	413	413
query26	1345	319	166	166
query27	2697	541	337	337
query28	4459	1959	1959	1959
query29	1022	630	488	488
query30	319	236	205	205
query31	1108	1042	937	937
query32	94	78	70	70
query33	534	345	293	293
query34	1177	1152	644	644
query35	756	796	692	692
query36	1361	1367	1177	1177
query37	157	105	90	90
query38	3185	3136	3071	3071
query39	937	934	896	896
query39_1	869	879	894	879
query40	236	154	131	131
query41	71	68	74	68
query42	114	116	120	116
query43	323	327	289	289
query44	
query45	220	207	199	199
query46	1090	1212	716	716
query47	2304	2356	2217	2217
query48	409	423	306	306
query49	664	513	406	406
query50	1070	349	267	267
query51	4346	4369	4260	4260
query52	109	113	97	97
query53	255	297	220	220
query54	330	287	268	268
query55	96	92	86	86
query56	307	333	333	333
query57	1457	1416	1339	1339
query58	313	285	279	279
query59	1530	1647	1363	1363
query60	347	334	325	325
query61	188	187	183	183
query62	674	609	565	565
query63	249	208	205	205
query64	2458	876	728	728
query65	
query66	1717	473	353	353
query67	30010	29877	29820	29820
query68	
query69	476	349	294	294
query70	966	977	959	959
query71	306	271	271	271
query72	3007	2661	2414	2414
query73	863	764	412	412
query74	5073	4882	4719	4719
query75	2696	2621	2243	2243
query76	2305	1150	766	766
query77	392	405	329	329
query78	12158	12246	11606	11606
query79	1513	1057	760	760
query80	1299	556	465	465
query81	526	284	242	242
query82	985	156	120	120
query83	321	273	250	250
query84	250	137	110	110
query85	920	548	456	456
query86	444	347	330	330
query87	3438	3336	3223	3223
query88	3540	2665	2630	2630
query89	453	382	335	335
query90	1913	187	190	187
query91	186	169	142	142
query92	82	78	69	69
query93	1604	1439	867	867
query94	722	369	322	322
query95	681	394	464	394
query96	1058	809	309	309
query97	2697	2711	2569	2569
query98	239	242	233	233
query99	1104	1143	999	999
Total cold run time: 254457 ms
Total hot run time: 170019 ms

hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.66% (27947/37938)
Line Coverage 57.66% (303472/526305)
Region Coverage 54.87% (254200/463294)
Branch Coverage 56.38% (109816/194788)
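
Each percentage in this coverage report is covered / total, rounded to two decimal places. The following C++ sketch checks that arithmetic; `coverage_percent` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Coverage as a percentage rounded to two decimal places, e.g. 303472 / 526305 -> 57.66.
double coverage_percent(long covered, long total) {
    assert(total > 0);
    return std::round(10000.0 * static_cast<double>(covered) / static_cast<double>(total)) / 100.0;
}
```

Applied to the four rows above, it reproduces 73.66, 57.66, 54.87, and 56.38.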
