Skip to content

[fix](azure) Fix incorrect modification timestamps in AzureObjStorage#61790

Open
dataroaring wants to merge 1 commit intoapache:masterfrom
dataroaring:fix/azure_timestamp_epoch_milli
Open

[fix](azure) Fix incorrect modification timestamps in AzureObjStorage#61790
dataroaring wants to merge 1 commit intoapache:masterfrom
dataroaring:fix/azure_timestamp_epoch_milli

Conversation

@dataroaring
Copy link
Contributor

Summary

Fix incorrect modification timestamps in AzureObjStorage across three code paths:

  • globList LIST path: Used getLastModified().getSecond() which returns the second-of-minute (0–59), not an epoch timestamp
  • globListByGetProperties HEAD path: Used toEpochSecond() which returns epoch seconds — off by 1000× from S3's toEpochMilli()
  • listFiles path: Same getSecond() bug as the glob path

All three are fixed to use .toInstant().toEpochMilli(), consistent with S3ObjStorage which uses Instant.toEpochMilli().

Found via review of #61775 (cherry-pick of #60414).

Test plan

  • Verify Azure blob listing returns correct modification timestamps in epoch milliseconds
  • Confirm consistency between S3 and Azure storage backends for RemoteFile.modifiedTime

🤖 Generated with Claude Code

The modification time for Azure blobs was computed incorrectly in three
places:

- globList LIST path (line 428): used getLastModified().getSecond()
  which returns the second-of-minute (0-59), not an epoch timestamp.
- globListByGetProperties HEAD path (line 503): used toEpochSecond()
  which returns epoch seconds, off by 1000x from S3's toEpochMilli().
- listFiles path (line 564): same getSecond() bug as the glob path.

All three are now fixed to use .toInstant().toEpochMilli(), consistent
with S3ObjStorage which uses Instant.toEpochMilli().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dataroaring dataroaring requested a review from CalvinKirs as a code owner March 27, 2026 02:02
Copilot AI review requested due to automatic review settings March 27, 2026 02:02
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect RemoteFile.modificationTime values produced by the Azure object storage backend by normalizing Azure “last modified” timestamps to epoch milliseconds, aligning behavior with S3 and other file systems.

Changes:

  • Fix globList LIST-path timestamp bug (previously used getSecond() which is second-of-minute).
  • Fix deterministic-path globListByGetProperties (HEAD/getProperties) timestamp unit from epoch seconds to epoch milliseconds.
  • Fix listFiles timestamp bug (same getSecond() issue).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +428 to +429
isPrefix ? 0 : blobItem.getProperties().getLastModified()
.toInstant().toEpochMilli());
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add/extend unit tests (e.g., AzureObjStorageTest’s mocked paged listing) to assert RemoteFile.modificationTime is an epoch-millisecond value derived from BlobItemProperties.getLastModified().toInstant().toEpochMilli(), so regressions back to second-of-minute/epoch-seconds are caught.

Copilot uses AI. Check for mistakes.
Comment on lines 503 to 505
props.getLastModified() != null
? props.getLastModified().toEpochSecond() : 0
? props.getLastModified().toInstant().toEpochMilli() : 0
);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test coverage for the deterministic-path getProperties/HEAD flow to validate the modification time unit is epoch milliseconds (not epoch seconds). Using a fixed OffsetDateTime in mocks will make the assertion deterministic.

Copilot uses AI. Check for mistakes.
Comment on lines 563 to 566
props.getContentLength(),
props.getContentLength(),
props.getLastModified().getSecond(),
props.getLastModified().toInstant().toEpochMilli(),
null);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a unit test around listFiles() (or a small isolated helper) to assert BlobItemProperties.getLastModified is converted to epoch milliseconds, since this path previously returned an incorrect value and is easy to regress.

Copilot uses AI. Check for mistakes.
@dataroaring
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 28215 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ebb7ca45c66b753583a6c74bc31f3e65a143f201, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17616	5268	5104	5104
q2	q3	10644	847	532	532
q4	4678	392	268	268
q5	7577	1250	1018	1018
q6	186	187	157	157
q7	890	924	702	702
q8	9670	1624	1535	1535
q9	5775	5051	5028	5028
q10	6330	1973	1678	1678
q11	482	268	241	241
q12	788	659	502	502
q13	18031	2830	2026	2026
q14	241	248	215	215
q15	q16	754	743	683	683
q17	741	881	432	432
q18	6428	5363	5275	5275
q19	1128	979	618	618
q20	569	505	382	382
q21	4869	2166	1532	1532
q22	397	346	287	287
Total cold run time: 97794 ms
Total hot run time: 28215 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5813	5631	5688	5631
q2	q3	4081	4447	3930	3930
q4	1091	1460	875	875
q5	4274	4607	4568	4568
q6	222	198	153	153
q7	1916	1665	1541	1541
q8	2896	3098	2966	2966
q9	7912	7772	7605	7605
q10	3964	4186	3689	3689
q11	578	521	440	440
q12	546	644	470	470
q13	2647	3155	2175	2175
q14	296	379	283	283
q15	q16	763	872	722	722
q17	1434	1569	1529	1529
q18	7328	6913	6700	6700
q19	1006	944	949	944
q20	2123	2224	1979	1979
q21	4560	3970	3838	3838
q22	480	449	395	395
Total cold run time: 53930 ms
Total hot run time: 50433 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 168707 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ebb7ca45c66b753583a6c74bc31f3e65a143f201, data reload: false

query5	4341	654	525	525
query6	342	234	203	203
query7	4215	469	274	274
query8	334	244	220	220
query9	8698	2680	2673	2673
query10	531	398	339	339
query11	6995	5110	4846	4846
query12	183	129	124	124
query13	1281	481	361	361
query14	5785	3679	3444	3444
query14_1	2909	2780	2729	2729
query15	202	189	179	179
query16	974	470	478	470
query17	885	732	602	602
query18	2428	431	334	334
query19	216	202	172	172
query20	134	124	125	124
query21	212	133	105	105
query22	13244	13863	14847	13863
query23	16882	16887	16104	16104
query23_1	16083	16018	15701	15701
query24	7145	1630	1226	1226
query24_1	1224	1226	1239	1226
query25	574	476	437	437
query26	1248	269	153	153
query27	2767	480	297	297
query28	4537	1832	1864	1832
query29	871	597	496	496
query30	300	225	187	187
query31	1017	943	867	867
query32	87	73	76	73
query33	538	349	303	303
query34	920	875	525	525
query35	664	702	597	597
query36	1095	1130	993	993
query37	144	98	88	88
query38	2924	2957	2935	2935
query39	872	842	814	814
query39_1	797	815	798	798
query40	240	156	140	140
query41	71	66	66	66
query42	264	269	256	256
query43	247	246	223	223
query44	
query45	203	186	185	185
query46	881	987	608	608
query47	2144	2199	2084	2084
query48	311	319	232	232
query49	671	509	387	387
query50	707	274	212	212
query51	4111	4071	3999	3999
query52	260	265	259	259
query53	289	338	280	280
query54	295	270	258	258
query55	89	90	82	82
query56	300	306	310	306
query57	1896	1779	1715	1715
query58	282	268	268	268
query59	2775	2976	2737	2737
query60	327	336	316	316
query61	156	151	157	151
query62	614	586	547	547
query63	307	268	276	268
query64	5095	1276	1020	1020
query65	
query66	1452	450	431	431
query67	24266	24316	24208	24208
query68	
query69	404	318	288	288
query70	980	926	961	926
query71	335	306	288	288
query72	2846	2628	2400	2400
query73	547	541	314	314
query74	9624	9604	9380	9380
query75	2880	2748	2458	2458
query76	2298	1032	704	704
query77	369	400	302	302
query78	10964	11133	10461	10461
query79	2709	774	581	581
query80	1714	636	551	551
query81	546	267	227	227
query82	989	154	117	117
query83	330	267	246	246
query84	294	121	101	101
query85	904	495	451	451
query86	429	302	328	302
query87	3167	3140	2997	2997
query88	3558	2664	2687	2664
query89	432	368	333	333
query90	2003	183	164	164
query91	164	162	136	136
query92	78	77	71	71
query93	1164	831	495	495
query94	643	328	298	298
query95	594	415	332	332
query96	634	532	230	230
query97	2458	2485	2372	2372
query98	244	226	219	219
query99	1029	991	941	941
Total cold run time: 252473 ms
Total hot run time: 168707 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 50.00% (2/4) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 27, 2026
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants