Skip to content

[fix](filecache) clean empty v3 cache dirs#63344

Open
liaoxin01 wants to merge 1 commit into
apache:masterfrom
liaoxin01:fix-file-cache-empty-dirs-63117
Open

[fix](filecache) clean empty v3 cache dirs#63344
liaoxin01 wants to merge 1 commit into
apache:masterfrom
liaoxin01:fix-file-cache-empty-dirs-63117

Conversation

@liaoxin01
Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 commented May 18, 2026

What problem does this PR solve?

Issue Number: #63117

Fixes file cache empty directory leakage for v3 cache layout.

In v3 layout, cache files live under <hash_prefix>/<hash>_0/. FSFileCacheStorage::remove() deletes the v3 cache file first, then switches the local directory variable to the legacy v2 path for compatibility cleanup. The final empty-directory cleanup therefore checks the v2 directory and can leave the actual v3 <hash>_0 directory behind.

This patch keeps v2 and v3 directory paths separately, preserves the existing v2 compatibility cleanup, and additionally removes the v3 key directory when it is empty. The v3 directory cleanup uses non-recursive std::filesystem::remove() to avoid deleting concurrently created files.

Check List

  • Added unit test
  • Regression test

Copilot AI review requested due to automatic review settings May 18, 2026 07:03
@liaoxin01 liaoxin01 force-pushed the fix-file-cache-empty-dirs-63117 branch from af219e4 to 4e6b403 Compare May 18, 2026 07:08
@liaoxin01
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes v3 file cache cleanup so removing a cache block also attempts to remove the now-empty v3 key directory, addressing inode leakage from stale empty cache directories.

Changes:

  • Tracks v2 and v3 cache paths separately in FSFileCacheStorage::remove().
  • Adds v3 key-directory cleanup after file removal.
  • Adds a unit test covering removal of an empty v3 key directory.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
be/src/io/cache/fs_file_cache_storage.cpp Updates removal logic to clean v2 and v3 cache directories separately.
be/test/io/cache/fs_file_cache_storage_leak_cleaner_test.cpp Adds coverage for deleting an empty v3 key directory after cache removal.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread be/src/io/cache/fs_file_cache_storage.cpp Outdated
Comment thread be/src/io/cache/fs_file_cache_storage.cpp Outdated
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the PR diff and the existing inline review context. I am not adding duplicate inline comments because the blocking issues are already covered by the existing threads on be/src/io/cache/fs_file_cache_storage.cpp:310.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to clean empty v3 cache directories and adds a unit test for deleting <hash>_0, but it does not fully accomplish the stated behavior because the empty <hash_prefix> parent cleanup described in the PR body is not implemented or tested.
  • Scope/focus: The code change is small and focused, but the cleanup primitive is too broad for the concurrent cache directory lifecycle.
  • Concurrency: FSFileCacheStorage::remove() can run concurrently with cache writes. The new v3 empty-dir cleanup lists the directory and then calls FileSystem::delete_directory(), which is recursive for local FS, so a file created after the emptiness check can be deleted. This is the same blocking concern already raised in the existing inline thread; it should be changed to non-recursive rmdir-style cleanup that tolerates ENOTEMPTY/NOT_FOUND.
  • Lifecycle/static initialization: No new special lifecycle or static initialization issue found.
  • Config/compatibility/protocol: No new config or FE/BE protocol compatibility issue found.
  • Parallel paths: v2 compatibility cleanup remains; v3 cleanup is added, but the v3 prefix parent path is a parallel part of the v3 layout and remains unhandled despite the PR description.
  • Tests: The added unit test covers key-directory deletion only. It does not cover prefix-directory cleanup or the concurrent-create race introduced by recursive deletion.
  • Observability/transactions/data visibility: Not applicable beyond cache file cleanup; no transaction/version visibility path is modified.
  • Performance: No separate performance issue found; the concern is correctness/safety of recursive deletion.

User focus: no additional user-provided review focus was supplied.

@liaoxin01 liaoxin01 force-pushed the fix-file-cache-empty-dirs-63117 branch 3 times, most recently from 67bd338 to f1437cd Compare May 18, 2026 08:37
@liaoxin01
Copy link
Copy Markdown
Contributor Author

run buildall

@liaoxin01 liaoxin01 force-pushed the fix-file-cache-empty-dirs-63117 branch from f1437cd to 102a46f Compare May 18, 2026 08:53
@liaoxin01 liaoxin01 force-pushed the fix-file-cache-empty-dirs-63117 branch from 102a46f to 2a1cb8c Compare May 18, 2026 09:06
@liaoxin01
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (31/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.50% (20652/38600)
Line Coverage 37.16% (195132/525117)
Region Coverage 33.51% (152625/455465)
Branch Coverage 34.57% (66589/192629)

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31288 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2a1cb8c07377ca61839216eb47c62f3e3e975d0e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17660	3907	3866	3866
q2	q3	10793	1434	811	811
q4	4686	471	341	341
q5	7552	2295	2157	2157
q6	250	179	140	140
q7	922	772	639	639
q8	9452	1681	1511	1511
q9	5238	4933	4923	4923
q10	6424	2076	1832	1832
q11	443	270	247	247
q12	636	421	291	291
q13	18115	3517	2804	2804
q14	255	263	229	229
q15	q16	821	780	718	718
q17	943	991	956	956
q18	6786	5753	5462	5462
q19	1170	1216	1051	1051
q20	524	402	301	301
q21	6074	2928	2685	2685
q22	467	374	324	324
Total cold run time: 99211 ms
Total hot run time: 31288 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4788	4540	4589	4540
q2	q3	4987	5262	4633	4633
q4	2149	2220	1403	1403
q5	4739	4839	4742	4742
q6	276	181	131	131
q7	1831	1772	1528	1528
q8	2394	2086	2037	2037
q9	7772	7490	7210	7210
q10	4487	4422	4021	4021
q11	545	379	355	355
q12	714	726	499	499
q13	3109	3390	2806	2806
q14	276	272	250	250
q15	q16	687	693	609	609
q17	1263	1247	1222	1222
q18	7266	6801	6740	6740
q19	1099	1116	1073	1073
q20	2200	2179	1943	1943
q21	5358	4654	4960	4654
q22	520	466	449	449
Total cold run time: 56460 ms
Total hot run time: 50845 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169209 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2a1cb8c07377ca61839216eb47c62f3e3e975d0e, data reload: false

query5	4313	665	521	521
query6	342	226	200	200
query7	4256	582	301	301
query8	323	231	213	213
query9	8817	3977	3981	3977
query10	445	339	285	285
query11	5791	2375	2203	2203
query12	193	134	129	129
query13	1285	635	442	442
query14	5988	5341	5070	5070
query14_1	4362	4332	4364	4332
query15	216	205	186	186
query16	1001	452	433	433
query17	1095	734	617	617
query18	2501	514	376	376
query19	223	209	171	171
query20	141	134	131	131
query21	220	138	118	118
query22	13623	13440	13396	13396
query23	17343	16368	16012	16012
query23_1	16258	16225	16119	16119
query24	7472	1797	1294	1294
query24_1	1307	1301	1295	1295
query25	580	514	451	451
query26	1325	335	176	176
query27	2671	577	342	342
query28	4431	2005	1974	1974
query29	1007	646	534	534
query30	315	257	207	207
query31	1130	1070	944	944
query32	100	80	76	76
query33	555	365	307	307
query34	1221	1167	634	634
query35	761	773	683	683
query36	1320	1395	1182	1182
query37	149	104	88	88
query38	3214	3150	3061	3061
query39	932	923	916	916
query39_1	870	886	867	867
query40	232	145	124	124
query41	66	63	62	62
query42	110	111	110	110
query43	325	328	280	280
query44	
query45	212	199	196	196
query46	1080	1220	692	692
query47	2322	2351	2186	2186
query48	393	432	295	295
query49	637	496	388	388
query50	982	351	252	252
query51	4273	4311	4278	4278
query52	115	107	94	94
query53	261	280	205	205
query54	306	266	260	260
query55	97	92	90	90
query56	308	305	291	291
query57	1412	1419	1294	1294
query58	330	277	268	268
query59	1554	1608	1439	1439
query60	321	335	328	328
query61	159	157	156	156
query62	678	615	535	535
query63	241	198	202	198
query64	2478	836	627	627
query65	
query66	1697	474	357	357
query67	30103	29983	29805	29805
query68	
query69	464	344	301	301
query70	1023	1018	950	950
query71	310	276	264	264
query72	3032	2661	2429	2429
query73	859	791	385	385
query74	5069	4869	4726	4726
query75	2687	2589	2272	2272
query76	2277	1147	764	764
query77	417	411	341	341
query78	12230	12145	11732	11732
query79	1403	991	743	743
query80	668	547	457	457
query81	455	276	246	246
query82	1392	161	120	120
query83	337	284	245	245
query84	291	138	113	113
query85	892	533	479	479
query86	391	329	333	329
query87	3424	3362	3221	3221
query88	3525	2666	2636	2636
query89	460	388	335	335
query90	1886	185	176	176
query91	179	163	144	144
query92	82	79	74	74
query93	1427	1398	877	877
query94	538	338	322	322
query95	700	466	359	359
query96	1034	799	329	329
query97	2725	2705	2571	2571
query98	235	227	224	224
query99	1116	1079	981	981
Total cold run time: 253274 ms
Total hot run time: 169209 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (31/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.56% (27806/37800)
Line Coverage 57.53% (301298/523734)
Region Coverage 54.71% (251623/459880)
Branch Coverage 56.30% (108867/193359)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants