fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking by ygree · Pull Request #10644 · DataDog/dd-trace-java

ygree · 2026-02-19T21:45:14Z

What Does This Do

Aligns OpenAI Java LLMObs span payloads with the expected intake/system-test schema by:

- Adding/filling missing LLMObs tags:
  - _ml_obs_tag.integration
  - _ml_obs_tag.source
  - _ml_obs_tag.ddtrace.version
  - _ml_obs_tag.error
  - _ml_obs_tag.error_type
- Ensuring model_name (and stable placeholder output where applicable) is set on error paths for chat/completions/embeddings/responses.
- Expanding Responses instrumentation:
  - prompt tracking (input.prompt, variables, chat_template)
  - tool definition extraction (tool_definitions)
  - tool call/result extraction across function/custom/MCP outputs
  - metadata normalization (stream, tool_choice, text.verbosity, etc.)
- Refactoring JSON conversion via shared JsonValueUtils.
- Updating LLMObs mapper payload shape:
  - writes _dd map with span/trace ids
  - nests error fields under meta.error
  - supports map-based LLM input serialization (messages + prompt)
  - remaps tool_definitions into meta
- Updating tests to add value-level assertions for the above behavior.
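
The mapper payload shape listed above can be pictured with a minimal sketch. It uses plain `Map` structures rather than the tracer's real writer, so it is an illustration under stated assumptions, not the PR's code: the class `LlmObsPayloadSketch`, the method `toSpanEvent`, and the inner key names `span_id`, `trace_id`, `message`, and `type` are hypothetical. Only `_dd`, `meta.error`, and `tool_definitions` come from the description.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LlmObsPayloadSketch {

  // Builds a span-event map in the shape the description outlines.
  // Parameter names and inner key names are hypothetical.
  static Map<String, Object> toSpanEvent(
      String spanId,
      String traceId,
      String errorMessage,
      String errorType,
      List<Map<String, Object>> toolDefinitions) {
    Map<String, Object> event = new LinkedHashMap<>();

    // "_dd" map carrying the span and trace ids.
    Map<String, Object> dd = new LinkedHashMap<>();
    dd.put("span_id", spanId);
    dd.put("trace_id", traceId);
    event.put("_dd", dd);

    Map<String, Object> meta = new LinkedHashMap<>();
    // Error fields are nested under meta.error, not placed at the event root.
    if (errorMessage != null) {
      Map<String, Object> error = new LinkedHashMap<>();
      error.put("message", errorMessage);
      error.put("type", errorType);
      meta.put("error", error);
    }
    // tool_definitions are remapped into meta.
    if (toolDefinitions != null && !toolDefinitions.isEmpty()) {
      meta.put("tool_definitions", toolDefinitions);
    }
    event.put("meta", meta);
    return event;
  }

  public static void main(String[] args) {
    Map<String, Object> tool = new LinkedHashMap<>();
    tool.put("name", "get_weather"); // made-up tool name for the demo
    System.out.println(
        toSpanEvent("123", "456", "boom", "java.io.IOException",
            Collections.singletonList(tool)));
  }
}
```

Keeping error data under `meta` means a consumer reads error and tool metadata from one place, which is the point of the "nests error fields under meta.error" item.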

Motivation

OpenAI/LLMObs system tests exposed schema and tag mismatches in Java payloads (especially response spans, tool metadata, error mapping, and prompt tracking structure). This change brings Java output in line with the expected LLMObs intake contract and behavior.

Additional Notes

openai-java-3.0 min version updated from 3.0.0 to 3.0.1.

DataDog/dd-apm-test-agent#280
DataDog/system-tests#6364

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, migration, or deletion
Update public documentation with any new configuration flags or behaviors

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

pr-commenter · 2026-02-19T22:33:17Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1772749357	1773798342
git_commit_sha	`4fd66d4`	`028d64f`
release_version	1.61.0-SNAPSHOT~4fd66d45a9	1.60.0-SNAPSHOT~028d64f1f8

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1773800247	1773800247
ci_job_id	1515829475	1515829475
ci_pipeline_id	103193634	103193634
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-33ghjyic 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-33ghjyic 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 65 metrics, 6 unstable metrics.

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.07 s) : 0, 1070094
Total [baseline] (11.13 s) : 0, 11129669
Agent [candidate] (1.059 s) : 0, 1059294
Total [candidate] (11.09 s) : 0, 11089855
section appsec
Agent [baseline] (1.25 s) : 0, 1250383
Total [baseline] (11.128 s) : 0, 11127977
Agent [candidate] (1.247 s) : 0, 1247120
Total [candidate] (11.082 s) : 0, 11082107
section iast
Agent [baseline] (1.23 s) : 0, 1229848
Total [baseline] (11.345 s) : 0, 11344740
Agent [candidate] (1.227 s) : 0, 1226710
Total [candidate] (11.47 s) : 0, 11470333
section profiling
Agent [baseline] (1.191 s) : 0, 1190616
Total [baseline] (11.035 s) : 0, 11034797
Agent [candidate] (1.213 s) : 0, 1212884
Total [candidate] (11.201 s) : 0, 11200529

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.07 s	-
Agent	appsec	1.25 s	180.29 ms (16.8%)
Agent	iast	1.23 s	159.754 ms (14.9%)
Agent	profiling	1.191 s	120.523 ms (11.3%)
Total	tracing	11.13 s	-
Total	appsec	11.128 s	-1.692 ms (-0.0%)
Total	iast	11.345 s	215.071 ms (1.9%)
Total	profiling	11.035 s	-94.872 ms (-0.9%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.059 s	-
Agent	appsec	1.247 s	187.826 ms (17.7%)
Agent	iast	1.227 s	167.416 ms (15.8%)
Agent	profiling	1.213 s	153.59 ms (14.5%)
Total	tracing	11.09 s	-
Total	appsec	11.082 s	-7.748 ms (-0.1%)
Total	iast	11.47 s	380.478 ms (3.4%)
Total	profiling	11.201 s	110.675 ms (1.0%)

gantt
    title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.234 ms) : 0, 1234
crashtracking [candidate] (1.197 ms) : 0, 1197
BytebuddyAgent [baseline] (633.528 ms) : 0, 633528
BytebuddyAgent [candidate] (628.248 ms) : 0, 628248
AgentMeter [baseline] (29.312 ms) : 0, 29312
AgentMeter [candidate] (29.105 ms) : 0, 29105
GlobalTracer [baseline] (258.554 ms) : 0, 258554
GlobalTracer [candidate] (256.843 ms) : 0, 256843
AppSec [baseline] (31.764 ms) : 0, 31764
AppSec [candidate] (31.438 ms) : 0, 31438
Debugger [baseline] (59.857 ms) : 0, 59857
Debugger [candidate] (59.265 ms) : 0, 59265
Remote Config [baseline] (603.679 µs) : 0, 604
Remote Config [candidate] (582.774 µs) : 0, 583
Telemetry [baseline] (8.69 ms) : 0, 8690
Telemetry [candidate] (8.612 ms) : 0, 8612
Flare Poller [baseline] (10.35 ms) : 0, 10350
Flare Poller [candidate] (7.965 ms) : 0, 7965
section appsec
crashtracking [baseline] (1.193 ms) : 0, 1193
crashtracking [candidate] (1.195 ms) : 0, 1195
BytebuddyAgent [baseline] (658.629 ms) : 0, 658629
BytebuddyAgent [candidate] (658.654 ms) : 0, 658654
AgentMeter [baseline] (12.057 ms) : 0, 12057
AgentMeter [candidate] (11.984 ms) : 0, 11984
GlobalTracer [baseline] (260.062 ms) : 0, 260062
GlobalTracer [candidate] (258.463 ms) : 0, 258463
IAST [baseline] (24.205 ms) : 0, 24205
IAST [candidate] (23.926 ms) : 0, 23926
AppSec [baseline] (178.486 ms) : 0, 178486
AppSec [candidate] (178.042 ms) : 0, 178042
Debugger [baseline] (66.119 ms) : 0, 66119
Debugger [candidate] (65.568 ms) : 0, 65568
Remote Config [baseline] (578.02 µs) : 0, 578
Remote Config [candidate] (566.936 µs) : 0, 567
Telemetry [baseline] (9.072 ms) : 0, 9072
Telemetry [candidate] (8.838 ms) : 0, 8838
Flare Poller [baseline] (3.628 ms) : 0, 3628
Flare Poller [candidate] (3.587 ms) : 0, 3587
section iast
crashtracking [baseline] (1.211 ms) : 0, 1211
crashtracking [candidate] (1.194 ms) : 0, 1194
BytebuddyAgent [baseline] (797.988 ms) : 0, 797988
BytebuddyAgent [candidate] (795.727 ms) : 0, 795727
AgentMeter [baseline] (11.341 ms) : 0, 11341
AgentMeter [candidate] (11.36 ms) : 0, 11360
GlobalTracer [baseline] (247.835 ms) : 0, 247835
GlobalTracer [candidate] (247.262 ms) : 0, 247262
IAST [baseline] (25.152 ms) : 0, 25152
IAST [candidate] (25.18 ms) : 0, 25180
AppSec [baseline] (27.271 ms) : 0, 27271
AppSec [candidate] (26.421 ms) : 0, 26421
Debugger [baseline] (62.564 ms) : 0, 62564
Debugger [candidate] (63.291 ms) : 0, 63291
Remote Config [baseline] (532.802 µs) : 0, 533
Remote Config [candidate] (528.949 µs) : 0, 529
Telemetry [baseline] (14.988 ms) : 0, 14988
Telemetry [candidate] (14.817 ms) : 0, 14817
Flare Poller [baseline] (4.87 ms) : 0, 4870
Flare Poller [candidate] (4.886 ms) : 0, 4886
section profiling
crashtracking [baseline] (1.179 ms) : 0, 1179
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (688.129 ms) : 0, 688129
BytebuddyAgent [candidate] (701.396 ms) : 0, 701396
AgentMeter [baseline] (8.639 ms) : 0, 8639
AgentMeter [candidate] (8.81 ms) : 0, 8810
GlobalTracer [baseline] (216.584 ms) : 0, 216584
GlobalTracer [candidate] (220.458 ms) : 0, 220458
AppSec [baseline] (32.186 ms) : 0, 32186
AppSec [candidate] (33.141 ms) : 0, 33141
Debugger [baseline] (64.153 ms) : 0, 64153
Debugger [candidate] (64.358 ms) : 0, 64358
Remote Config [baseline] (588.212 µs) : 0, 588
Remote Config [candidate] (597.44 µs) : 0, 597
Telemetry [baseline] (9.887 ms) : 0, 9887
Telemetry [candidate] (10.767 ms) : 0, 10767
Flare Poller [baseline] (3.56 ms) : 0, 3560
Flare Poller [candidate] (3.636 ms) : 0, 3636
ProfilingAgent [baseline] (94.369 ms) : 0, 94369
ProfilingAgent [candidate] (96.093 ms) : 0, 96093
Profiling [baseline] (94.938 ms) : 0, 94938
Profiling [candidate] (96.669 ms) : 0, 96669

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.065 s) : 0, 1064559
Total [baseline] (8.919 s) : 0, 8919250
Agent [candidate] (1.074 s) : 0, 1074408
Total [candidate] (8.847 s) : 0, 8846986
section iast
Agent [baseline] (1.237 s) : 0, 1236675
Total [baseline] (9.572 s) : 0, 9571802
Agent [candidate] (1.23 s) : 0, 1229624
Total [candidate] (9.562 s) : 0, 9561541

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.065 s	-
Agent	iast	1.237 s	172.116 ms (16.2%)
Total	tracing	8.919 s	-
Total	iast	9.572 s	652.553 ms (7.3%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.074 s	-
Agent	iast	1.23 s	155.217 ms (14.4%)
Total	tracing	8.847 s	-
Total	iast	9.562 s	714.555 ms (8.1%)

gantt
    title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.236 ms) : 0, 1236
crashtracking [candidate] (1.219 ms) : 0, 1219
BytebuddyAgent [baseline] (633.366 ms) : 0, 633366
BytebuddyAgent [candidate] (638.9 ms) : 0, 638900
AgentMeter [baseline] (29.466 ms) : 0, 29466
AgentMeter [candidate] (29.523 ms) : 0, 29523
GlobalTracer [baseline] (257.544 ms) : 0, 257544
GlobalTracer [candidate] (260.767 ms) : 0, 260767
AppSec [baseline] (31.591 ms) : 0, 31591
AppSec [candidate] (32.146 ms) : 0, 32146
Debugger [baseline] (58.629 ms) : 0, 58629
Debugger [candidate] (59.498 ms) : 0, 59498
Remote Config [baseline] (591.132 µs) : 0, 591
Remote Config [candidate] (603.755 µs) : 0, 604
Telemetry [baseline] (8.59 ms) : 0, 8590
Telemetry [candidate] (8.844 ms) : 0, 8844
Flare Poller [baseline] (7.223 ms) : 0, 7223
Flare Poller [candidate] (6.511 ms) : 0, 6511
section iast
crashtracking [baseline] (1.209 ms) : 0, 1209
crashtracking [candidate] (1.211 ms) : 0, 1211
BytebuddyAgent [baseline] (804.025 ms) : 0, 804025
BytebuddyAgent [candidate] (797.843 ms) : 0, 797843
AgentMeter [baseline] (11.572 ms) : 0, 11572
AgentMeter [candidate] (11.351 ms) : 0, 11351
GlobalTracer [baseline] (248.731 ms) : 0, 248731
GlobalTracer [candidate] (248.545 ms) : 0, 248545
IAST [baseline] (25.375 ms) : 0, 25375
IAST [candidate] (25.182 ms) : 0, 25182
AppSec [baseline] (26.597 ms) : 0, 26597
AppSec [candidate] (26.4 ms) : 0, 26400
Debugger [baseline] (62.67 ms) : 0, 62670
Debugger [candidate] (62.536 ms) : 0, 62536
Remote Config [baseline] (531.597 µs) : 0, 532
Remote Config [candidate] (524.816 µs) : 0, 525
Telemetry [baseline] (14.824 ms) : 0, 14824
Telemetry [candidate] (14.886 ms) : 0, 14886
Flare Poller [baseline] (4.853 ms) : 0, 4853
Flare Poller [candidate] (5.028 ms) : 0, 5028

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1772749357	1773798342
git_commit_sha	`4fd66d4`	`028d64f`
release_version	1.61.0-SNAPSHOT~4fd66d45a9	1.60.0-SNAPSHOT~028d64f1f8

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1773800721	1773800721
ci_job_id	1515829476	1515829476
ci_pipeline_id	103193634	103193634
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-8lw21etw 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-8lw21etw 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 5 performance improvements and 3 performance regressions! Performance is the same for 13 metrics, 15 unstable metrics.

scenario	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p95	Δ mean throughput	candidate mean agg_http_req_duration_p50	candidate mean agg_http_req_duration_p95	candidate mean throughput	baseline mean agg_http_req_duration_p50	baseline mean agg_http_req_duration_p95	baseline mean throughput
scenario:load:insecure-bank:iast_FULL:high_load	better [-438.455µs; -219.273µs] or [-8.156%; -4.079%]	unsure [-846.832µs; -216.303µs] or [-6.755%; -1.725%]	unstable [-49.711op/s; +124.149op/s] or [-6.414%; +16.019%]	5.047ms	12.004ms	812.250op/s	5.376ms	12.536ms	775.031op/s
scenario:load:petclinic:iast:high_load	worse [+355.999µs; +1264.128µs] or [+2.020%; +7.173%]	unsure [+28.425µs; +1554.803µs] or [+0.099%; +5.391%]	unstable [-36.853op/s; +18.103op/s] or [-14.226%; +6.988%]	18.434ms	29.634ms	249.688op/s	17.624ms	28.842ms	259.062op/s
scenario:load:petclinic:profiling:high_load	better [-2.130ms; -0.835ms] or [-10.587%; -4.149%]	better [-2.625ms; -1.047ms] or [-8.222%; -3.279%]	unstable [-11.413op/s; +40.601op/s] or [-4.952%; +17.614%]	18.639ms	30.087ms	245.094op/s	20.121ms	31.923ms	230.500op/s
scenario:load:petclinic:code_origins:high_load	better [-1.857ms; -0.877ms] or [-9.967%; -4.709%]	better [-2.037ms; -0.613ms] or [-6.748%; -2.032%]	unstable [-11.252op/s; +44.252op/s] or [-4.556%; +17.918%]	17.267ms	28.857ms	263.469op/s	18.634ms	30.182ms	246.969op/s
scenario:load:petclinic:appsec:high_load	worse [+0.983ms; +1.721ms] or [+5.423%; +9.491%]	worse [+1.230ms; +2.721ms] or [+4.164%; +9.217%]	unstable [-41.340op/s; +10.402op/s] or [-16.437%; +4.136%]	19.484ms	31.500ms	236.031op/s	18.132ms	29.525ms	251.500op/s

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.165 ms) : 1154, 1176
.   : milestone, 1165,
iast (3.174 ms) : 3131, 3217
.   : milestone, 3174,
iast_FULL (5.968 ms) : 5908, 6028
.   : milestone, 5968,
iast_GLOBAL (3.665 ms) : 3608, 3722
.   : milestone, 3665,
profiling (2.084 ms) : 2064, 2105
.   : milestone, 2084,
tracing (1.821 ms) : 1804, 1837
.   : milestone, 1821,
section candidate
no_agent (1.207 ms) : 1194, 1219
.   : milestone, 1207,
iast (3.191 ms) : 3146, 3236
.   : milestone, 3191,
iast_FULL (5.691 ms) : 5635, 5747
.   : milestone, 5691,
iast_GLOBAL (3.717 ms) : 3645, 3790
.   : milestone, 3717,
profiling (2.04 ms) : 2020, 2060
.   : milestone, 2040,
tracing (1.762 ms) : 1748, 1777
.   : milestone, 1762,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.165 ms [1.154 ms, 1.176 ms]	-
iast	3.174 ms [3.131 ms, 3.217 ms]	2.009 ms (172.4%)
iast_FULL	5.968 ms [5.908 ms, 6.028 ms]	4.803 ms (412.2%)
iast_GLOBAL	3.665 ms [3.608 ms, 3.722 ms]	2.5 ms (214.5%)
profiling	2.084 ms [2.064 ms, 2.105 ms]	919.16 µs (78.9%)
tracing	1.821 ms [1.804 ms, 1.837 ms]	655.334 µs (56.2%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.207 ms [1.194 ms, 1.219 ms]	-
iast	3.191 ms [3.146 ms, 3.236 ms]	1.985 ms (164.5%)
iast_FULL	5.691 ms [5.635 ms, 5.747 ms]	4.485 ms (371.7%)
iast_GLOBAL	3.717 ms [3.645 ms, 3.79 ms]	2.511 ms (208.1%)
profiling	2.04 ms [2.02 ms, 2.06 ms]	833.709 µs (69.1%)
tracing	1.762 ms [1.748 ms, 1.777 ms]	555.847 µs (46.1%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9
    dateFormat X
    axisFormat %s
section baseline
no_agent (18.479 ms) : 18294, 18664
.   : milestone, 18479,
appsec (18.554 ms) : 18368, 18741
.   : milestone, 18554,
code_origins (18.899 ms) : 18706, 19092
.   : milestone, 18899,
iast (18.011 ms) : 17833, 18190
.   : milestone, 18011,
profiling (20.26 ms) : 20049, 20470
.   : milestone, 20260,
tracing (17.672 ms) : 17494, 17850
.   : milestone, 17672,
section candidate
no_agent (18.132 ms) : 17949, 18315
.   : milestone, 18132,
appsec (19.782 ms) : 19578, 19987
.   : milestone, 19782,
code_origins (17.706 ms) : 17526, 17885
.   : milestone, 17706,
iast (18.686 ms) : 18499, 18873
.   : milestone, 18686,
profiling (19.044 ms) : 18855, 19233
.   : milestone, 19044,
tracing (17.722 ms) : 17547, 17897
.   : milestone, 17722,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	18.479 ms [18.294 ms, 18.664 ms]	-
appsec	18.554 ms [18.368 ms, 18.741 ms]	75.454 µs (0.4%)
code_origins	18.899 ms [18.706 ms, 19.092 ms]	419.626 µs (2.3%)
iast	18.011 ms [17.833 ms, 18.19 ms]	-467.714 µs (-2.5%)
profiling	20.26 ms [20.049 ms, 20.47 ms]	1.781 ms (9.6%)
tracing	17.672 ms [17.494 ms, 17.85 ms]	-807.132 µs (-4.4%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	18.132 ms [17.949 ms, 18.315 ms]	-
appsec	19.782 ms [19.578 ms, 19.987 ms]	1.651 ms (9.1%)
code_origins	17.706 ms [17.526 ms, 17.885 ms]	-425.973 µs (-2.3%)
iast	18.686 ms [18.499 ms, 18.873 ms]	554.264 µs (3.1%)
profiling	19.044 ms [18.855 ms, 19.233 ms]	912.239 µs (5.0%)
tracing	17.722 ms [17.547 ms, 17.897 ms]	-409.567 µs (-2.3%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1772749357	1773798342
git_commit_sha	`4fd66d4`	`028d64f`
release_version	1.61.0-SNAPSHOT~4fd66d45a9	1.60.0-SNAPSHOT~028d64f1f8

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1773800489	1773800489
ci_job_id	1515829477	1515829477
ci_pipeline_id	103193634	103193634
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-2-jqfovg88 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-2-jqfovg88 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.569 s) : 15569000, 15569000
.   : milestone, 15569000,
appsec (14.612 s) : 14612000, 14612000
.   : milestone, 14612000,
iast (17.88 s) : 17880000, 17880000
.   : milestone, 17880000,
iast_GLOBAL (17.695 s) : 17695000, 17695000
.   : milestone, 17695000,
profiling (14.983 s) : 14983000, 14983000
.   : milestone, 14983000,
tracing (15.028 s) : 15028000, 15028000
.   : milestone, 15028000,
section candidate
no_agent (15.244 s) : 15244000, 15244000
.   : milestone, 15244000,
appsec (15.003 s) : 15003000, 15003000
.   : milestone, 15003000,
iast (17.915 s) : 17915000, 17915000
.   : milestone, 17915000,
iast_GLOBAL (17.946 s) : 17946000, 17946000
.   : milestone, 17946000,
profiling (15.284 s) : 15284000, 15284000
.   : milestone, 15284000,
tracing (15.065 s) : 15065000, 15065000
.   : milestone, 15065000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.569 s [15.569 s, 15.569 s]	-
appsec	14.612 s [14.612 s, 14.612 s]	-957.0 ms (-6.1%)
iast	17.88 s [17.88 s, 17.88 s]	2.311 s (14.8%)
iast_GLOBAL	17.695 s [17.695 s, 17.695 s]	2.126 s (13.7%)
profiling	14.983 s [14.983 s, 14.983 s]	-586.0 ms (-3.8%)
tracing	15.028 s [15.028 s, 15.028 s]	-541.0 ms (-3.5%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.244 s [15.244 s, 15.244 s]	-
appsec	15.003 s [15.003 s, 15.003 s]	-241.0 ms (-1.6%)
iast	17.915 s [17.915 s, 17.915 s]	2.671 s (17.5%)
iast_GLOBAL	17.946 s [17.946 s, 17.946 s]	2.702 s (17.7%)
profiling	15.284 s [15.284 s, 15.284 s]	40.0 ms (0.3%)
tracing	15.065 s [15.065 s, 15.065 s]	-179.0 ms (-1.2%)

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~028d64f1f8, baseline=1.61.0-SNAPSHOT~4fd66d45a9
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.478 ms) : 1467, 1490
.   : milestone, 1478,
appsec (3.781 ms) : 3560, 4003
.   : milestone, 3781,
iast (2.255 ms) : 2186, 2324
.   : milestone, 2255,
iast_GLOBAL (2.305 ms) : 2235, 2374
.   : milestone, 2305,
profiling (2.108 ms) : 2052, 2164
.   : milestone, 2108,
tracing (2.061 ms) : 2007, 2114
.   : milestone, 2061,
section candidate
no_agent (1.477 ms) : 1465, 1489
.   : milestone, 1477,
appsec (3.797 ms) : 3576, 4017
.   : milestone, 3797,
iast (2.261 ms) : 2192, 2330
.   : milestone, 2261,
iast_GLOBAL (2.299 ms) : 2230, 2368
.   : milestone, 2299,
profiling (2.093 ms) : 2037, 2149
.   : milestone, 2093,
tracing (2.068 ms) : 2014, 2121
.   : milestone, 2068,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.478 ms [1.467 ms, 1.49 ms]	-
appsec	3.781 ms [3.56 ms, 4.003 ms]	2.303 ms (155.8%)
iast	2.255 ms [2.186 ms, 2.324 ms]	776.665 µs (52.5%)
iast_GLOBAL	2.305 ms [2.235 ms, 2.374 ms]	826.634 µs (55.9%)
profiling	2.108 ms [2.052 ms, 2.164 ms]	629.399 µs (42.6%)
tracing	2.061 ms [2.007 ms, 2.114 ms]	582.578 µs (39.4%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.477 ms [1.465 ms, 1.489 ms]	-
appsec	3.797 ms [3.576 ms, 4.017 ms]	2.32 ms (157.1%)
iast	2.261 ms [2.192 ms, 2.33 ms]	784.278 µs (53.1%)
iast_GLOBAL	2.299 ms [2.23 ms, 2.368 ms]	822.06 µs (55.7%)
profiling	2.093 ms [2.037 ms, 2.149 ms]	615.699 µs (41.7%)
tracing	2.068 ms [2.014 ms, 2.121 ms]	590.848 µs (40.0%)

… with variables + chat_template, longest-first overlap handling) and support map-based LLM input serialization (messages + prompt) in LLMObs mapper. Also filter empty instruction messages to match system-test expectations.
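
One possible reading of "longest-first overlap handling" is sketched below. When a chat template is rebuilt from rendered text by swapping variable values back to placeholders, substituting the longest values first keeps a shorter value that is a substring of a longer one from clobbering it. This is an assumption about the intent, not the PR's implementation; the class `ChatTemplateSketch`, the method `toTemplate`, and the `{{name}}` placeholder syntax are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChatTemplateSketch {

  // Replaces each variable value found in the rendered text with a
  // {{name}} placeholder, processing the longest values first.
  // Assumes variable values are non-null.
  static String toTemplate(String rendered, Map<String, String> variables) {
    List<Map.Entry<String, String>> entries = new ArrayList<>(variables.entrySet());
    entries.sort(
        Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length())
            .reversed());
    String template = rendered;
    for (Map.Entry<String, String> e : entries) {
      if (e.getValue().isEmpty()) {
        continue; // an empty value would match everywhere
      }
      template = template.replace(e.getValue(), "{{" + e.getKey() + "}}");
    }
    return template;
  }

  public static void main(String[] args) {
    Map<String, String> vars = new LinkedHashMap<>();
    vars.put("state", "New York");
    vars.put("city", "New York City");
    // prints "Weather in {{city}} for {{state}}"
    System.out.println(toTemplate("Weather in New York City for New York", vars));
  }
}
```

With shortest-first ordering the same input would yield `Weather in {{state}} City for {{state}}`, which is the overlap problem the ordering avoids. The sketch does not guard against a value that also occurs inside an already inserted placeholder.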

ygree · 2026-03-17T22:11:55Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c879ba692

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java

ygree self-assigned this Feb 19, 2026

ygree added the comp: mlobs and type: bug labels Feb 19, 2026

llmobs: set model tag even when llmobs disabled

cbd6226

ygree force-pushed the ygree/llmobs-systest-fixes branch from 5cd257e to cbd6226 Compare February 24, 2026 09:31

ygree changed the title ~~llmobs: set model tag even when llmobs disabled~~ fix(llmobs): set model tag even when llmobs disabled Mar 2, 2026

ygree added 23 commits March 2, 2026 13:30

Set metadata.stream tag whether it is true or false

4f27673

Set chat/completion CACHE_READ_INPUT_TOKENS tag

d128d6b

Set error and error_type tags

3fc5ceb

Use "" instead of null for the role in CompletionDecorator to comply with TestOpenAiLlmInteractions::test_completion

021a9d1

Use "" instead of null for the content to comply with TestOpenAiLlmInteractions::test_chat_completion_tool_call

0637931

Add missing metadata.tool_choice

0cb41e1

Add missing tool_definitions

a42f8aa

Add source:integration tag

6e10255

Add missing _dd attribute to the llmobs span event

34f3a07

Add missing error tags

a0c1139

Remove error from the llmobs span event. It must be part of meta block

effc343

Add missing meta.text.verbosity

c0e3876

Add summaryText and encrypted_content

b000770

Add missing tool_calls and tool_results for responses

53471a2

Always set stream param to produce the same request body to be aligned with python openai instrumentation and system-tests

2207c46

Fix OpenAI Responses prompt tracking to use response instructions first and return [image] (not empty) when stripped input_image URLs are missing, aligning mixed-input chat_template output with expected behavior.

7d683b6

Set LLMObs error-path defaults in Java to always emit model_name and output.messages from request params so existing error-span tests pass.

2c17ddc

Add OpenAI Responses tool definition extraction to populate LLMObs tool_definitions tags

ad3b782

Fix ChatCompletionServiceTest

1810327

Extract JsonValueUtils

46221e4

Refactor OpenAI responses instrumentation to reuse ToolCallExtractor JSON argument parsing and remove duplicate manual parsing logic from ResponseDecorator.

61ad667

Fix test assertions

f0957b7

ygree added 5 commits March 6, 2026 10:35

Add integration tag

f3f1f75

Add ddtrace.version

668e955

Improve test assertions

d57402e

Merge branch 'master' into ygree/llmobs-systest-fixes

a3051e3

Fix format

0c879ba

ygree changed the title ~~fix(llmobs): set model tag even when llmobs disabled~~ fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking Mar 6, 2026

ygree added the tag: ai generated and tag: no release notes labels Mar 6, 2026

ygree marked this pull request as ready for review March 6, 2026 13:46

ygree requested review from a team as code owners March 6, 2026 13:46

chatgpt-codex-connector bot reviewed Mar 17, 2026

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java (outdated)

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java (outdated)

ygree added 2 commits March 17, 2026 17:35

Include input messages when instructions are present in prompt tracking

f4e3a8b

Fix instructions role to system in prompt tracking

028d64f

ygree wants to merge 31 commits into master from ygree/llmobs-systest-fixes
