[DRAFT] NOT MERGING POC: Send JVM runtime metrics via OTLP using OTel-native naming #10985
Draft
Benchmarks
Startup
Summary
Found 7 performance improvements and 2 performance regressions! Performance is the same for 48 metrics, 14 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.067 s) : 0, 1066578
Total [baseline] (8.875 s) : 0, 8874523
Agent [candidate] (1.056 s) : 0, 1055849
Total [candidate] (8.858 s) : 0, 8858408
section iast
Agent [baseline] (1.251 s) : 0, 1251067
Total [baseline] (9.531 s) : 0, 9531313
Agent [candidate] (1.227 s) : 0, 1226568
Total [candidate] (9.585 s) : 0, 9585066
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.253 ms) : 0, 1253
crashtracking [candidate] (1.238 ms) : 0, 1238
BytebuddyAgent [baseline] (636.521 ms) : 0, 636521
BytebuddyAgent [candidate] (632.716 ms) : 0, 632716
AgentMeter [baseline] (29.78 ms) : 0, 29780
AgentMeter [candidate] (29.334 ms) : 0, 29334
GlobalTracer [baseline] (247.612 ms) : 0, 247612
GlobalTracer [candidate] (248.568 ms) : 0, 248568
AppSec [baseline] (32.725 ms) : 0, 32725
AppSec [candidate] (32.059 ms) : 0, 32059
Debugger [baseline] (62.309 ms) : 0, 62309
Debugger [candidate] (59.114 ms) : 0, 59114
Remote Config [baseline] (604.935 µs) : 0, 605
Remote Config [candidate] (600.503 µs) : 0, 601
Telemetry [baseline] (8.302 ms) : 0, 8302
Telemetry [candidate] (8.007 ms) : 0, 8007
Flare Poller [baseline] (10.658 ms) : 0, 10658
Flare Poller [candidate] (8.144 ms) : 0, 8144
section iast
crashtracking [baseline] (1.255 ms) : 0, 1255
crashtracking [candidate] (1.242 ms) : 0, 1242
BytebuddyAgent [baseline] (828.567 ms) : 0, 828567
BytebuddyAgent [candidate] (803.32 ms) : 0, 803320
AgentMeter [baseline] (11.463 ms) : 0, 11463
AgentMeter [candidate] (11.421 ms) : 0, 11421
GlobalTracer [baseline] (237.275 ms) : 0, 237275
GlobalTracer [candidate] (239.62 ms) : 0, 239620
AppSec [baseline] (29.982 ms) : 0, 29982
AppSec [candidate] (32.795 ms) : 0, 32795
Debugger [baseline] (64.785 ms) : 0, 64785
Debugger [candidate] (58.882 ms) : 0, 58882
Remote Config [baseline] (538.249 µs) : 0, 538
Remote Config [candidate] (1.133 ms) : 0, 1133
Telemetry [baseline] (7.996 ms) : 0, 7996
Telemetry [candidate] (12.474 ms) : 0, 12474
Flare Poller [baseline] (3.425 ms) : 0, 3425
Flare Poller [candidate] (3.466 ms) : 0, 3466
IAST [baseline] (28.95 ms) : 0, 28950
IAST [candidate] (25.961 ms) : 0, 25961
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.076 s) : 0, 1076254
Total [baseline] (10.983 s) : 0, 10983203
Agent [candidate] (1.066 s) : 0, 1066244
Total [candidate] (11.096 s) : 0, 11095952
section appsec
Agent [baseline] (1.285 s) : 0, 1285037
Total [baseline] (11.104 s) : 0, 11103645
Agent [candidate] (1.251 s) : 0, 1250934
Total [candidate] (11.204 s) : 0, 11203995
section iast
Agent [baseline] (1.261 s) : 0, 1260761
Total [baseline] (11.176 s) : 0, 11175836
Agent [candidate] (1.223 s) : 0, 1222886
Total [candidate] (11.265 s) : 0, 11265051
section profiling
Agent [baseline] (1.318 s) : 0, 1317790
Total [baseline] (11.092 s) : 0, 11091747
Agent [candidate] (1.186 s) : 0, 1185949
Total [candidate] (11.043 s) : 0, 11042812
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.247 ms) : 0, 1247
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (641.764 ms) : 0, 641764
BytebuddyAgent [candidate] (639.308 ms) : 0, 639308
AgentMeter [baseline] (30.101 ms) : 0, 30101
AgentMeter [candidate] (29.708 ms) : 0, 29708
GlobalTracer [baseline] (250.016 ms) : 0, 250016
GlobalTracer [candidate] (250.424 ms) : 0, 250424
AppSec [baseline] (33.005 ms) : 0, 33005
AppSec [candidate] (32.239 ms) : 0, 32239
Debugger [baseline] (63.326 ms) : 0, 63326
Debugger [candidate] (60.389 ms) : 0, 60389
Remote Config [baseline] (605.784 µs) : 0, 606
Remote Config [candidate] (595.033 µs) : 0, 595
Telemetry [baseline] (9.132 ms) : 0, 9132
Telemetry [candidate] (8.095 ms) : 0, 8095
Flare Poller [baseline] (10.009 ms) : 0, 10009
Flare Poller [candidate] (8.013 ms) : 0, 8013
section appsec
crashtracking [baseline] (1.254 ms) : 0, 1254
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (687.184 ms) : 0, 687184
BytebuddyAgent [candidate] (663.494 ms) : 0, 663494
AgentMeter [baseline] (12.262 ms) : 0, 12262
AgentMeter [candidate] (12.032 ms) : 0, 12032
GlobalTracer [baseline] (251.411 ms) : 0, 251411
GlobalTracer [candidate] (249.139 ms) : 0, 249139
AppSec [baseline] (186.504 ms) : 0, 186504
AppSec [candidate] (185.206 ms) : 0, 185206
Debugger [baseline] (66.511 ms) : 0, 66511
Debugger [candidate] (65.957 ms) : 0, 65957
Remote Config [baseline] (589.473 µs) : 0, 589
Remote Config [candidate] (603.173 µs) : 0, 603
Telemetry [baseline] (7.819 ms) : 0, 7819
Telemetry [candidate] (8.676 ms) : 0, 8676
Flare Poller [baseline] (9.11 ms) : 0, 9110
Flare Poller [candidate] (3.619 ms) : 0, 3619
IAST [baseline] (25.139 ms) : 0, 25139
IAST [candidate] (24.625 ms) : 0, 24625
section iast
crashtracking [baseline] (1.254 ms) : 0, 1254
crashtracking [candidate] (1.229 ms) : 0, 1229
BytebuddyAgent [baseline] (833.972 ms) : 0, 833972
BytebuddyAgent [candidate] (799.854 ms) : 0, 799854
AgentMeter [baseline] (11.559 ms) : 0, 11559
AgentMeter [candidate] (11.385 ms) : 0, 11385
GlobalTracer [baseline] (238.339 ms) : 0, 238339
GlobalTracer [candidate] (239.104 ms) : 0, 239104
AppSec [baseline] (33.14 ms) : 0, 33140
AppSec [candidate] (30.286 ms) : 0, 30286
Debugger [baseline] (65.791 ms) : 0, 65791
Debugger [candidate] (62.172 ms) : 0, 62172
Remote Config [baseline] (553.365 µs) : 0, 553
Remote Config [candidate] (1.685 ms) : 0, 1685
Telemetry [baseline] (8.016 ms) : 0, 8016
Telemetry [candidate] (11.843 ms) : 0, 11843
Flare Poller [baseline] (3.421 ms) : 0, 3421
Flare Poller [candidate] (3.465 ms) : 0, 3465
IAST [baseline] (27.661 ms) : 0, 27661
IAST [candidate] (25.792 ms) : 0, 25792
section profiling
ProfilingAgent [baseline] (93.904 ms) : 0, 93904
ProfilingAgent [candidate] (94.268 ms) : 0, 94268
crashtracking [baseline] (541.999 µs) : 0, 542
crashtracking [candidate] (1.18 ms) : 0, 1180
BytebuddyAgent [baseline] (692.968 ms) : 0, 692968
BytebuddyAgent [candidate] (692.148 ms) : 0, 692148
AgentMeter [baseline] (9.253 ms) : 0, 9253
AgentMeter [candidate] (9.154 ms) : 0, 9154
GlobalTracer [baseline] (209.945 ms) : 0, 209945
GlobalTracer [candidate] (207.41 ms) : 0, 207410
AppSec [baseline] (32.698 ms) : 0, 32698
AppSec [candidate] (32.663 ms) : 0, 32663
Debugger [baseline] (68.175 ms) : 0, 68175
Debugger [candidate] (64.874 ms) : 0, 64874
Remote Config [baseline] (580.504 µs) : 0, 581
Remote Config [candidate] (577.755 µs) : 0, 578
Telemetry [baseline] (8.2 ms) : 0, 8200
Telemetry [candidate] (8.673 ms) : 0, 8673
Flare Poller [baseline] (3.696 ms) : 0, 3696
Flare Poller [candidate] (3.568 ms) : 0, 3568
Profiling [baseline] (94.478 ms) : 0, 94478
Profiling [candidate] (94.847 ms) : 0, 94847
Load
Summary
Found 3 performance improvements and 4 performance regressions! Performance is the same for 12 metrics, 17 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section baseline
no_agent (1.285 ms) : 1272, 1298
. : milestone, 1285,
iast (3.371 ms) : 3323, 3419
. : milestone, 3371,
iast_FULL (5.928 ms) : 5868, 5989
. : milestone, 5928,
iast_GLOBAL (3.69 ms) : 3628, 3753
. : milestone, 3690,
profiling (2.544 ms) : 2515, 2572
. : milestone, 2544,
tracing (1.937 ms) : 1921, 1953
. : milestone, 1937,
section candidate
no_agent (1.288 ms) : 1275, 1300
. : milestone, 1288,
iast (3.354 ms) : 3309, 3399
. : milestone, 3354,
iast_FULL (6.338 ms) : 6271, 6405
. : milestone, 6338,
iast_GLOBAL (3.588 ms) : 3540, 3637
. : milestone, 3588,
profiling (2.355 ms) : 2332, 2377
. : milestone, 2355,
tracing (1.917 ms) : 1902, 1933
. : milestone, 1917,
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section baseline
no_agent (19.593 ms) : 19396, 19790
. : milestone, 19593,
appsec (19.167 ms) : 18967, 19368
. : milestone, 19167,
code_origins (18.111 ms) : 17932, 18291
. : milestone, 18111,
iast (18.345 ms) : 18163, 18528
. : milestone, 18345,
profiling (18.554 ms) : 18373, 18735
. : milestone, 18554,
tracing (17.749 ms) : 17574, 17924
. : milestone, 17749,
section candidate
no_agent (18.249 ms) : 18064, 18435
. : milestone, 18249,
appsec (18.674 ms) : 18488, 18860
. : milestone, 18674,
code_origins (17.675 ms) : 17502, 17847
. : milestone, 17675,
iast (18.069 ms) : 17892, 18247
. : milestone, 18069,
profiling (19.652 ms) : 19456, 19848
. : milestone, 19652,
tracing (18.229 ms) : 18050, 18407
. : milestone, 18229,
DaCapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section baseline
no_agent (15.307 s) : 15307000, 15307000
. : milestone, 15307000,
appsec (14.977 s) : 14977000, 14977000
. : milestone, 14977000,
iast (18.453 s) : 18453000, 18453000
. : milestone, 18453000,
iast_GLOBAL (17.953 s) : 17953000, 17953000
. : milestone, 17953000,
profiling (14.948 s) : 14948000, 14948000
. : milestone, 14948000,
tracing (15.056 s) : 15056000, 15056000
. : milestone, 15056000,
section candidate
no_agent (15.07 s) : 15070000, 15070000
. : milestone, 15070000,
appsec (14.834 s) : 14834000, 14834000
. : milestone, 14834000,
iast (18.182 s) : 18182000, 18182000
. : milestone, 18182000,
iast_GLOBAL (18.1 s) : 18100000, 18100000
. : milestone, 18100000,
profiling (14.864 s) : 14864000, 14864000
. : milestone, 14864000,
tracing (14.92 s) : 14920000, 14920000
. : milestone, 14920000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~5a31372eb2, baseline=1.63.0-SNAPSHOT~2e6ce6ce0c
dateFormat X
axisFormat %s
section baseline
no_agent (1.499 ms) : 1487, 1511
. : milestone, 1499,
appsec (3.866 ms) : 3644, 4088
. : milestone, 3866,
iast (2.302 ms) : 2232, 2372
. : milestone, 2302,
iast_GLOBAL (2.337 ms) : 2267, 2407
. : milestone, 2337,
profiling (2.116 ms) : 2062, 2171
. : milestone, 2116,
tracing (2.098 ms) : 2044, 2151
. : milestone, 2098,
section candidate
no_agent (1.501 ms) : 1489, 1513
. : milestone, 1501,
appsec (3.854 ms) : 3630, 4078
. : milestone, 3854,
iast (2.297 ms) : 2228, 2366
. : milestone, 2297,
iast_GLOBAL (2.327 ms) : 2258, 2396
. : milestone, 2327,
profiling (2.114 ms) : 2059, 2169
. : milestone, 2114,
tracing (2.113 ms) : 2059, 2167
. : milestone, 2113,
e819041 to 4d69327
link04 added a commit to DataDog/system-tests that referenced this pull request Apr 9, 2026
New scenario OTLP_RUNTIME_METRICS that sets DD_METRICS_OTEL_ENABLED=true alongside DD_RUNTIME_METRICS_ENABLED=true. Tests verify that OTel-native metric names (dotnet.*, jvm.*, go.*, v8js.*) appear in OTLP payloads and that DD-proprietary names (runtime.dotnet.*, runtime.go.*) do not.
All languages marked as missing_feature in manifests until POC PRs are merged:
- .NET: DataDog/dd-trace-dotnet#8299
- Go: DataDog/dd-trace-go#4611
- Node.js: DataDog/dd-trace-js#7869
- Java: DataDog/dd-trace-java#10985
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
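The name check this scenario describes can be sketched in Java as below. This is only an illustration of the assertion logic, not the system-tests code itself: the class name `MetricNameCheck` and its methods are hypothetical, and the prefixes are the ones quoted in the commit message.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: checks a list of exported metric names against the
// OTel-native and DD-proprietary prefixes quoted in the commit message.
final class MetricNameCheck {
    static final List<String> OTEL_PREFIXES = List.of("dotnet.", "jvm.", "go.", "v8js.");
    static final List<String> DD_PREFIXES = List.of("runtime.dotnet.", "runtime.go.");

    // True if at least one name uses an OTel-native prefix.
    static boolean hasOtelNativeName(List<String> names) {
        return names.stream()
            .anyMatch(n -> OTEL_PREFIXES.stream().anyMatch(n::startsWith));
    }

    // Returns the names that still use a DD-proprietary prefix (expected: none).
    static List<String> ddProprietaryNames(List<String> names) {
        return names.stream()
            .filter(n -> DD_PREFIXES.stream().anyMatch(n::startsWith))
            .collect(Collectors.toList());
    }
}
```

A payload passes when `hasOtelNativeName` is true and `ddProprietaryNames` returns an empty list.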
Adds jvm.memory.used, jvm.memory.committed, jvm.memory.limit, jvm.gc.duration, jvm.gc.count, jvm.thread.count, jvm.class.loaded, jvm.class.unloaded, jvm.cpu.recent_utilization, jvm.cpu.count as OTel instruments on the existing OTLP metrics pipeline. Includes jvm.memory.type attribute for heap/non_heap breakdown required by semantic-core equivalence mappings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
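As a rough illustration of the heap/non_heap breakdown, the sketch below sums used bytes per memory type from the standard `java.lang.management` beans. It assumes the `jvm.memory.type` attribute takes the values `heap` and `non_heap`; the class `JvmMemoryByType` is hypothetical and does not reproduce how the PR registers its OTel instruments.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: groups memory pool usage by the value that a
// jvm.memory.type attribute would carry.
final class JvmMemoryByType {
    // Maps the JDK memory type to the attribute value (assumed spelling).
    static String typeAttribute(MemoryType type) {
        return type == MemoryType.HEAP ? "heap" : "non_heap";
    }

    // Sums used bytes across all memory pools, keyed by attribute value.
    static Map<String, Long> usedBytesByType() {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null when the pool is no longer valid
            if (usage == null) {
                continue;
            }
            totals.merge(typeAttribute(pool.getType()), usage.getUsed(), Long::sum);
        }
        return totals;
    }
}
```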
…u.utilization, jvm.class.count Aligns with OTel JVM semantic conventions spreadsheet. Updates test to verify all 16 metrics are registered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Call JvmOtlpRuntimeMetrics.start() from OpenTelemetryMetricsInstrumentation when both DD_METRICS_OTEL_ENABLED and DD_RUNTIME_METRICS_ENABLED are true. Matches .NET/Go/NodeJS config gating pattern. Also updates metrics to match OTel spec exactly (Option B): removes jvm.gc.duration/count, fixes types, adds missing metrics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
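The gating condition can be pictured with the minimal sketch below. It reads the two flags from a plain map and treats an absent flag as false; that default is an assumption made for the example and is not taken from the tracer's real configuration handling. `RuntimeMetricsGate` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical sketch of the "both flags must be true" gate.
final class RuntimeMetricsGate {
    // Treats a missing key as false (assumption made for this example only).
    static boolean isTrue(Map<String, String> env, String key) {
        return Boolean.parseBoolean(env.getOrDefault(key, "false").trim());
    }

    // Runtime metrics over OTLP start only when both flags are enabled.
    static boolean shouldStart(Map<String, String> env) {
        return isTrue(env, "DD_METRICS_OTEL_ENABLED")
            && isTrue(env, "DD_RUNTIME_METRICS_ENABLED");
    }
}
```

Passing `System.getenv()` as the map would evaluate the gate against the process environment.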
OtlpDataPoint, OtlpLongPoint, OtlpDoublePoint, OtlpMetricVisitor, OtlpMetricsVisitor, OtlpScopedMetricsVisitor moved from datadog.trace.bootstrap.otel.metrics.data/export to datadog.trace.bootstrap.otlp.metrics after PR #11055 merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Assert exactly 18 metrics (was missing jvm.file_descriptor.count/limit)
- Assert no DD-proprietary names present
- Matches .NET PR #8457 test pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per OTel semconv, jvm.system.cpu.utilization, jvm.system.cpu.load_1m, jvm.file_descriptor.count and jvm.file_descriptor.limit are Opt-In: they MUST only be reported when the instrumentation is explicitly configured to do so. This POC currently has no DD_METRICS_OTEL_OPTIN_ENABLED gating, so emitting them would violate the spec. Removing the three we shipped (jvm.system.cpu.utilization, jvm.file_descriptor.count, jvm.file_descriptor.limit) brings the metric set down to the 15 Recommended JVM metrics. Adding opt-in support is tracked separately and will reintroduce these behind the env var. Reference: https://opentelemetry.io/docs/specs/semconv/general/metric-requirement-level/#opt-in
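A possible shape for the future opt-in gating is sketched below. It is hypothetical: the POC has no such switch yet, so the `optInEnabled` parameter merely stands in for whatever a DD_METRICS_OTEL_OPTIN_ENABLED setting would provide, and `OptInFilter` is an invented name. The Opt-In metric names are the four listed in the commit message.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: drops Opt-In metrics unless explicitly enabled.
final class OptInFilter {
    // The four Opt-In JVM metrics named in the commit message above.
    static final Set<String> OPT_IN = Set.of(
        "jvm.system.cpu.utilization",
        "jvm.system.cpu.load_1m",
        "jvm.file_descriptor.count",
        "jvm.file_descriptor.limit");

    // Keeps every candidate when opt-in is enabled; otherwise removes the
    // Opt-In names and preserves the order of the rest.
    static List<String> enabledMetrics(List<String> candidates, boolean optInEnabled) {
        return candidates.stream()
            .filter(name -> optInEnabled || !OPT_IN.contains(name))
            .collect(Collectors.toList());
    }
}
```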
The previous wiring only fired when the application called OpenTelemetry.getMeterProvider() — which most JVM apps never do unless they themselves use the OTel API. Spring Boot, Akka, Vert.x and the system-tests weblogs don't, so JvmOtlpRuntimeMetrics.start() was never reached and zero jvm.* metrics were exported. Match the pattern Node and .NET use: start runtime metrics from the tracer's own init path (here, Agent.startJmx, the same place startJmxFetch lives). Reflective load through AGENT_CLASSLOADER mirrors startJmxFetch exactly. Gated on DD_RUNTIME_METRICS_ENABLED && OTel metrics enabled, both already set when the OTLP_RUNTIME_METRICS scenario runs. The OTLP exporter pipeline (OtlpMetricsService) is already started by CoreTracer when DD_METRICS_OTEL_ENABLED=true; this change only registers the JVM metric callbacks with OtelMeterProvider so the periodic export has something to collect.
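The reflective start described here can be sketched roughly as follows. This is a generic illustration rather than the code in Agent: `ReflectiveStarter` and `DemoRuntimeMetrics` are hypothetical stand-ins, and the class loader is passed in rather than taken from AGENT_CLASSLOADER.

```java
import java.lang.reflect.Method;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a class that exposes a public static start().
final class DemoRuntimeMetrics {
    static final AtomicInteger startCalls = new AtomicInteger();

    public static void start() {
        startCalls.incrementAndGet();
    }
}

// Hypothetical sketch: loads a class through a given class loader and
// invokes its static no-arg start() method via reflection.
final class ReflectiveStarter {
    // Returns true if start() was found and invoked, false otherwise.
    static boolean startViaReflection(ClassLoader loader, String className) {
        try {
            Class<?> clazz = Class.forName(className, true, loader);
            Method start = clazz.getMethod("start");
            start.invoke(null); // static method, so no receiver instance
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            // Missing class or method, or a failure inside start():
            // report it and carry on rather than abort startup.
            return false;
        }
    }
}
```

Returning false instead of rethrowing keeps a missing class from breaking the rest of startup. Whether the real agent logs or swallows such failures is not stated in the commit message.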
Two changes:
1. Strip 4 more JVM metrics that aren't Recommended per OTel semconv:
jvm.memory.init, jvm.buffer.memory.used, jvm.buffer.memory.limit,
jvm.buffer.count. They live in jvm-metrics-experimental.yaml with
stability: development. Same partitioning OTel Java upstream applies
in JmxRuntimeMetricsFactory's emitExperimentalTelemetry branch:
non-experimental subset is exactly the 11 we now ship. Will return
behind a future DD_METRICS_OTEL_OPTIN_ENABLED flag together with
the previously-dropped Opt-In metrics (jvm.system.cpu.*,
jvm.file_descriptor.*).
Reference:
https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/runtime-telemetry/library/src/main/java/io/opentelemetry/instrumentation/runtimetelemetry/internal/JmxRuntimeMetricsFactory.java
2. Migrate JvmOtlpRuntimeMetricsTest from Spock/Groovy to JUnit 5 Java
per the new-Groovy-file enforcement workflow. Same 4 test cases,
identical assertion semantics. Note: the opentelemetry-1.47 module
still has 2 other pre-existing .groovy test files, so the module is
not added to .github/g2j-migrated-modules.txt yet.
Calling startOtlpRuntimeMetrics() from startJmx() inherits its 15-second jitter delay, which races with short-lived test scenarios — for slower weblogs (ratpack hit this), OtelMetricRegistry callbacks are registered after the OtlpMetricsService has already done its first scheduled flush, so the agent receives 0 jvm.* metrics for the test's wait window. The runtime metric callbacks don't actually need JMX initialized — they read ManagementFactory directly. Move the call into installDatadogTracer right after CoreTracer (and OtlpMetricsService) are up. Same effective gating (DD_RUNTIME_METRICS_ENABLED + DD_METRICS_OTEL_ENABLED), no delay.
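To illustrate why these callbacks do not depend on JMX being initialized, the sketch below reads a few values straight from `ManagementFactory` and `Runtime`. The class `DirectMxBeanCallbacks` is hypothetical, and pairing these particular metric names with these particular beans is an assumption made for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch: metric callbacks that read platform MXBeans directly,
// so they can be registered as soon as the exporter pipeline is up.
final class DirectMxBeanCallbacks {
    static Map<String, LongSupplier> callbacks() {
        Map<String, LongSupplier> callbacks = new LinkedHashMap<>();
        // Live threads, daemon and non-daemon.
        callbacks.put("jvm.thread.count",
            () -> ManagementFactory.getThreadMXBean().getThreadCount());
        // Classes currently loaded in the JVM.
        callbacks.put("jvm.class.count",
            () -> ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
        // Processors available to the JVM.
        callbacks.put("jvm.cpu.count",
            () -> Runtime.getRuntime().availableProcessors());
        return callbacks;
    }
}
```

Each supplier is evaluated only when an export cycle calls it, so registering the callbacks early costs nothing until the first flush.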
5a31372 to 67d91a9
Agent.installDatadogTracer now triggers runtime metric registration on agent boot (the canonical entry point), so the duplicate call from the OpenTelemetry.getMeterProvider advice is dead code. JvmOtlpRuntimeMetrics is already idempotent so removing the second invocation is a no-op behaviorally; this just reduces review surface.
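One common way to make a start method idempotent is a compare-and-set guard, sketched below. The commit message only says that JvmOtlpRuntimeMetrics is idempotent, not how, so this pattern and the name `IdempotentStart` are assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an idempotent start(): only the first caller
// performs the registration; later calls are no-ops.
final class IdempotentStart {
    private static final AtomicBoolean STARTED = new AtomicBoolean(false);
    private static final AtomicInteger REGISTRATIONS = new AtomicInteger();

    // Returns true only for the call that actually performed the registration.
    static boolean start() {
        if (!STARTED.compareAndSet(false, true)) {
            return false; // already started by an earlier call
        }
        REGISTRATIONS.incrementAndGet(); // placeholder for callback registration
        return true;
    }

    // Number of times the registration step ran.
    static int registrations() {
        return REGISTRATIONS.get();
    }
}
```

With such a guard, a second call site is harmless, which is why removing the duplicate invocation would not change behavior.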
| Suite | Status |
|---|---|
| Startup | |
| Load | |
| DaCapo | |
Commit: 67d91a9d · CI Pipeline · Benchmarking Platform UI
Mirrors OTel Java upstream's runtime-telemetry reflect-config.json: GraalVM AOT analysis can't see Class.forName(...) lookups through AGENT_CLASSLOADER, so the class gets dead-stripped from the native binary unless explicitly listed. Without this entry, the Agent.startOtlpRuntimeMetrics() reflection in installDatadogTracer fails on native-image with ClassNotFoundException and zero jvm.* metrics are emitted.
Ship the 4 metrics that are stability:development per OTel semconv but NOT marked Opt-In: jvm.memory.init, jvm.buffer.memory.used, jvm.buffer.memory.limit, jvm.buffer.count. Final set is 15 metrics — everything OTel ships except the 4 explicit Opt-In ones (jvm.system.cpu.*, jvm.file_descriptor.*).
Sends jvm.memory.used, jvm.gc.duration, jvm.thread.count, jvm.class.loaded, jvm.cpu.recent_utilization via OTLP. Related: DataDog/dd-trace-dotnet#8299
🤖 Generated with Claude Code