Skip to content

[chore](be) Document lightweight JSONB validation#63359

Draft
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:codex/fix-jsonb-document-validation
Draft

[chore](be) Document lightweight JSONB validation#63359
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:codex/fix-jsonb-document-validation

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented May 18, 2026

What problem does this PR solve?

Issue Number: DORIS-25577

Problem Summary: After evaluating the JSONB INSERT/Stream Load path and JSONB query call sites, recursive full validation in JsonbDocument::checkAndCreateDocument is not appropriate for the default helper. INSERT and load inputs are JSON text that BE re-encodes through JsonBinaryValue/JsonbWriter, while checkAndCreateDocument and createValue are also used by JSONB scalar/table functions in per-row hot paths.

This PR documents why these helpers intentionally remain lightweight and why any future deep validation should be added only at a clearly untrusted raw JSONB binary boundary.

Release note

None

Check List (For Author)

  • Test:
    • Manual test: git diff --check
  • Behavior changed: No
  • Does this need documentation: No

@mrhhsg mrhhsg force-pushed the codex/fix-jsonb-document-validation branch from 14699fa to e28deaf Compare May 18, 2026 09:20
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

### What problem does this PR solve?

Issue Number: DORIS-25577

Related PR: None

Problem Summary: Avoid making JsonbDocument::checkAndCreateDocument perform recursive full validation by default. The helper is used in JSONB query hot paths, while INSERT and load paths build JSONB through JsonBinaryValue/JsonbWriter before storage.

### Release note

None

### Check List (For Author)

- Test:
    - Manual test: git diff --check
- Behavior changed: No
- Does this need documentation: No
@mrhhsg mrhhsg force-pushed the codex/fix-jsonb-document-validation branch from e28deaf to 2b51e87 Compare May 18, 2026 09:47
@mrhhsg mrhhsg changed the title [fix](be) Validate JSONB document body [chore](be) Document lightweight JSONB validation May 18, 2026
@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 18, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. I found no blocking issues in the actual PR diff, which only documents that JsonbDocument::checkAndCreateDocument and createValue intentionally perform lightweight structural validation.

Critical checkpoint conclusions:

  • Goal and tests: The PR documents the intended validation boundary/hot-path tradeoff and does not change behavior. git diff --check passes for the touched files.
  • Scope: The change is small and focused on comments in the two JSONB helper locations.
  • Concurrency/lifecycle/config/incompatibility: Not applicable; no runtime behavior, shared state, config, storage format, or protocol changes.
  • Parallel paths: Both checkAndCreateDocument and createValue now carry matching guidance.
  • Error handling/data correctness/memory/performance: No code-path behavior changes; the comments preserve the existing lightweight hot-path validation contract.
  • Test coverage/results: No new functional tests are required for comment-only changes; no generated results were modified.
  • Observability/transaction/persistence/FE-BE passing: Not applicable.

User focus: No additional user-provided review focus was present.

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 19, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31902 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b51e876ed1c98933621e3c9754f0f4b3bdf4d37, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17647	4068	4054	4054
q2	q3	10791	1381	807	807
q4	4680	475	344	344
q5	7782	2253	2146	2146
q6	400	176	149	149
q7	950	789	653	653
q8	9655	1643	1615	1615
q9	6998	4923	4931	4923
q10	6437	2102	1833	1833
q11	452	277	262	262
q12	647	432	306	306
q13	18216	3459	2822	2822
q14	267	257	240	240
q15	q16	824	783	710	710
q17	905	963	1037	963
q18	6944	5695	5588	5588
q19	1188	1449	1180	1180
q20	529	422	289	289
q21	5930	2790	2698	2698
q22	458	375	320	320
Total cold run time: 101700 ms
Total hot run time: 31902 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4723	4559	4959	4559
q2	q3	4986	5250	4673	4673
q4	2145	2230	1414	1414
q5	4734	4771	4640	4640
q6	235	194	131	131
q7	1821	1755	1540	1540
q8	2294	1954	1937	1937
q9	7247	7296	7169	7169
q10	4494	4452	4044	4044
q11	538	428	368	368
q12	715	720	524	524
q13	3057	3431	2806	2806
q14	268	278	256	256
q15	q16	694	713	619	619
q17	1260	1254	1242	1242
q18	7453	6859	6736	6736
q19	1083	1090	1086	1086
q20	2220	2219	1909	1909
q21	5355	4735	4620	4620
q22	521	456	416	416
Total cold run time: 55843 ms
Total hot run time: 50689 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 170404 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b51e876ed1c98933621e3c9754f0f4b3bdf4d37, data reload: false

query5	4332	674	520	520
query6	345	218	197	197
query7	4253	583	298	298
query8	324	229	217	217
query9	8815	4089	4072	4072
query10	460	343	297	297
query11	5726	2355	2171	2171
query12	184	133	126	126
query13	1301	571	408	408
query14	6084	5412	5116	5116
query14_1	4455	4447	4415	4415
query15	216	204	196	196
query16	1012	487	463	463
query17	1150	745	618	618
query18	2501	499	377	377
query19	233	213	179	179
query20	142	134	131	131
query21	216	151	127	127
query22	13624	13565	13514	13514
query23	17328	16446	16058	16058
query23_1	16152	16183	16184	16183
query24	7405	1739	1321	1321
query24_1	1342	1315	1323	1315
query25	592	498	464	464
query26	1314	322	178	178
query27	2692	606	345	345
query28	4444	1982	1966	1966
query29	1030	647	501	501
query30	294	242	205	205
query31	1126	1061	944	944
query32	96	74	76	74
query33	521	356	296	296
query34	1145	1132	631	631
query35	768	779	670	670
query36	1306	1312	1157	1157
query37	150	115	94	94
query38	3258	3178	3052	3052
query39	940	937	897	897
query39_1	866	865	874	865
query40	226	151	129	129
query41	70	63	61	61
query42	110	115	109	109
query43	334	336	294	294
query44	
query45	210	209	202	202
query46	1083	1245	707	707
query47	2283	2285	2216	2216
query48	407	422	309	309
query49	654	485	400	400
query50	972	349	252	252
query51	4370	4248	4229	4229
query52	111	106	96	96
query53	257	289	212	212
query54	343	282	267	267
query55	97	91	85	85
query56	303	323	334	323
query57	1389	1405	1291	1291
query58	303	292	271	271
query59	1579	1666	1450	1450
query60	345	330	300	300
query61	158	155	155	155
query62	676	634	553	553
query63	243	205	217	205
query64	2451	808	641	641
query65	
query66	1841	479	360	360
query67	30409	30248	29843	29843
query68	
query69	475	334	306	306
query70	1050	1024	999	999
query71	315	281	276	276
query72	3112	2726	2442	2442
query73	857	728	408	408
query74	5089	4951	4771	4771
query75	2666	2625	2251	2251
query76	2318	1153	803	803
query77	411	423	338	338
query78	12275	12153	11710	11710
query79	1390	1018	725	725
query80	674	543	472	472
query81	479	283	245	245
query82	850	158	121	121
query83	352	288	250	250
query84	264	137	116	116
query85	910	541	451	451
query86	408	324	327	324
query87	3452	3365	3187	3187
query88	3594	2688	2681	2681
query89	429	398	339	339
query90	1951	184	190	184
query91	179	170	138	138
query92	81	74	74	74
query93	1481	1447	878	878
query94	549	362	321	321
query95	683	481	355	355
query96	1111	757	328	328
query97	2683	2652	2576	2576
query98	240	235	231	231
query99	1121	1152	1007	1007
Total cold run time: 253797 ms
Total hot run time: 170404 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.62% (20753/38707)
Line Coverage 37.23% (196224/526996)
Region Coverage 33.58% (153859/458166)
Branch Coverage 34.61% (67004/193607)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.66% (27923/37907)
Line Coverage 57.62% (302869/525653)
Region Coverage 54.70% (253023/462576)
Branch Coverage 56.23% (109284/194342)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants