Avoid ArrayList copying from TraceInterceptors#10828
Avoid ArrayList copying from TraceInterceptors#10828gh-worker-dd-mergequeue-cf854d[bot] merged 13 commits intomasterfrom
Conversation
Introduced TraceList which is just an ArrayList that exposes the internal modCount Being able to access the modCount allows interceptCompleteTrace to check if the list post-interception has been modified If the list is the same & is unmodified, then interceptCompleteTrace is able to bypass making a defensive copy As a further optimization, use of TraceList is pushed up to callers of interceptCompleteTrace and TraceCollector.write That does mean StreamingTraceCollector can no longer used a singletonList, but the savings throughput the pipeline outweigh the cost of the extra Object[] in that case
| // There is still a risk that a TraceInterceptor holds onto the provided | ||
| // List and modifies it later on, but we cannot safeguard against | ||
| // every possible misuse. | ||
| Collection<? extends MutableSpan> interceptedTrace = originalTrace; |
There was a problem hiding this comment.
Previously, there was a defensive copy of the incoming trace; however, I couldn't see a reason why. The incoming List was already copied from the span buffer. And creating a new ArrayList didn't prevent mutation by the TraceInterceptor, since the TraceInterceptor could simply change the incoming list and return it anyway.
| } | ||
| healthMetrics.onFinishSpan(); | ||
| tracer.write(Collections.singletonList(span)); | ||
| tracer.write(TraceList.of(span)); |
There was a problem hiding this comment.
This part is slightly questionable, since SpanList (being an ArrayList) does incur an extra Object[] allocation that isn't needed for singletonList. I think this is offset by the efficiency gains elsewhere, but I should verify more carefully.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.067 s) : 0, 1066934
Total [baseline] (8.878 s) : 0, 8877525
Agent [candidate] (1.063 s) : 0, 1062963
Total [candidate] (8.823 s) : 0, 8823397
section iast
Agent [baseline] (1.237 s) : 0, 1236589
Total [baseline] (9.551 s) : 0, 9551210
Agent [candidate] (1.227 s) : 0, 1226724
Total [candidate] (9.585 s) : 0, 9584561
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.204 ms) : 0, 1204
BytebuddyAgent [baseline] (632.446 ms) : 0, 632446
BytebuddyAgent [candidate] (630.29 ms) : 0, 630290
AgentMeter [baseline] (29.252 ms) : 0, 29252
AgentMeter [candidate] (29.117 ms) : 0, 29117
GlobalTracer [baseline] (258.471 ms) : 0, 258471
GlobalTracer [candidate] (258.03 ms) : 0, 258030
AppSec [baseline] (31.946 ms) : 0, 31946
AppSec [candidate] (31.704 ms) : 0, 31704
Debugger [baseline] (59.884 ms) : 0, 59884
Debugger [candidate] (59.688 ms) : 0, 59688
Remote Config [baseline] (593.335 µs) : 0, 593
Remote Config [candidate] (600.221 µs) : 0, 600
Telemetry [baseline] (8.104 ms) : 0, 8104
Telemetry [candidate] (8.077 ms) : 0, 8077
Flare Poller [baseline] (8.773 ms) : 0, 8773
Flare Poller [candidate] (8.056 ms) : 0, 8056
section iast
crashtracking [baseline] (1.223 ms) : 0, 1223
crashtracking [candidate] (1.2 ms) : 0, 1200
BytebuddyAgent [baseline] (803.219 ms) : 0, 803219
BytebuddyAgent [candidate] (795.94 ms) : 0, 795940
AgentMeter [baseline] (11.5 ms) : 0, 11500
AgentMeter [candidate] (11.32 ms) : 0, 11320
GlobalTracer [baseline] (248.989 ms) : 0, 248989
GlobalTracer [candidate] (247.109 ms) : 0, 247109
IAST [baseline] (25.557 ms) : 0, 25557
IAST [candidate] (25.305 ms) : 0, 25305
AppSec [baseline] (26.8 ms) : 0, 26800
AppSec [candidate] (27.297 ms) : 0, 27297
Debugger [baseline] (69.267 ms) : 0, 69267
Debugger [candidate] (68.099 ms) : 0, 68099
Remote Config [baseline] (524.734 µs) : 0, 525
Remote Config [candidate] (536.057 µs) : 0, 536
Telemetry [baseline] (9.678 ms) : 0, 9678
Telemetry [candidate] (10.218 ms) : 0, 10218
Flare Poller [baseline] (3.501 ms) : 0, 3501
Flare Poller [candidate] (3.626 ms) : 0, 3626
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.063 s) : 0, 1063130
Total [baseline] (11.043 s) : 0, 11042923
Agent [candidate] (1.057 s) : 0, 1057108
Total [candidate] (11.047 s) : 0, 11047284
section appsec
Agent [baseline] (1.251 s) : 0, 1250571
Total [baseline] (11.273 s) : 0, 11272725
Agent [candidate] (1.248 s) : 0, 1247660
Total [candidate] (11.187 s) : 0, 11186765
section iast
Agent [baseline] (1.228 s) : 0, 1227927
Total [baseline] (11.36 s) : 0, 11360038
Agent [candidate] (1.227 s) : 0, 1227230
Total [candidate] (11.303 s) : 0, 11303061
section profiling
Agent [baseline] (1.182 s) : 0, 1181900
Total [baseline] (11.023 s) : 0, 11022902
Agent [candidate] (1.182 s) : 0, 1181799
Total [candidate] (11.067 s) : 0, 11066922
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (629.787 ms) : 0, 629787
BytebuddyAgent [candidate] (627.72 ms) : 0, 627720
AgentMeter [baseline] (29.13 ms) : 0, 29130
AgentMeter [candidate] (29.079 ms) : 0, 29079
GlobalTracer [baseline] (257.769 ms) : 0, 257769
GlobalTracer [candidate] (256.978 ms) : 0, 256978
AppSec [baseline] (31.8 ms) : 0, 31800
AppSec [candidate] (31.651 ms) : 0, 31651
Debugger [baseline] (60.607 ms) : 0, 60607
Debugger [candidate] (60.021 ms) : 0, 60021
Remote Config [baseline] (592.75 µs) : 0, 593
Remote Config [candidate] (585.26 µs) : 0, 585
Telemetry [baseline] (8.149 ms) : 0, 8149
Telemetry [candidate] (7.957 ms) : 0, 7957
Flare Poller [baseline] (8.026 ms) : 0, 8026
Flare Poller [candidate] (5.875 ms) : 0, 5875
section appsec
crashtracking [baseline] (1.197 ms) : 0, 1197
crashtracking [candidate] (1.204 ms) : 0, 1204
BytebuddyAgent [baseline] (658.949 ms) : 0, 658949
BytebuddyAgent [candidate] (658.459 ms) : 0, 658459
AgentMeter [baseline] (12.146 ms) : 0, 12146
AgentMeter [candidate] (12.055 ms) : 0, 12055
GlobalTracer [baseline] (259.679 ms) : 0, 259679
GlobalTracer [candidate] (258.671 ms) : 0, 258671
AppSec [baseline] (178.44 ms) : 0, 178440
AppSec [candidate] (177.813 ms) : 0, 177813
Debugger [baseline] (66.76 ms) : 0, 66760
Debugger [candidate] (65.463 ms) : 0, 65463
Remote Config [baseline] (617.824 µs) : 0, 618
Remote Config [candidate] (625.024 µs) : 0, 625
Telemetry [baseline] (8.382 ms) : 0, 8382
Telemetry [candidate] (9.181 ms) : 0, 9181
Flare Poller [baseline] (3.611 ms) : 0, 3611
Flare Poller [candidate] (3.609 ms) : 0, 3609
IAST [baseline] (24.399 ms) : 0, 24399
IAST [candidate] (24.214 ms) : 0, 24214
section iast
crashtracking [baseline] (1.207 ms) : 0, 1207
crashtracking [candidate] (1.193 ms) : 0, 1193
BytebuddyAgent [baseline] (795.968 ms) : 0, 795968
BytebuddyAgent [candidate] (795.484 ms) : 0, 795484
AgentMeter [baseline] (11.343 ms) : 0, 11343
AgentMeter [candidate] (11.352 ms) : 0, 11352
GlobalTracer [baseline] (247.568 ms) : 0, 247568
GlobalTracer [candidate] (247.23 ms) : 0, 247230
AppSec [baseline] (26.532 ms) : 0, 26532
AppSec [candidate] (26.38 ms) : 0, 26380
Debugger [baseline] (70.786 ms) : 0, 70786
Debugger [candidate] (71.202 ms) : 0, 71202
Remote Config [baseline] (532.726 µs) : 0, 533
Remote Config [candidate] (543.899 µs) : 0, 544
Telemetry [baseline] (9.219 ms) : 0, 9219
Telemetry [candidate] (9.191 ms) : 0, 9191
Flare Poller [baseline] (3.338 ms) : 0, 3338
Flare Poller [candidate] (3.35 ms) : 0, 3350
IAST [baseline] (25.299 ms) : 0, 25299
IAST [candidate] (25.24 ms) : 0, 25240
section profiling
ProfilingAgent [baseline] (93.539 ms) : 0, 93539
ProfilingAgent [candidate] (93.584 ms) : 0, 93584
crashtracking [baseline] (1.188 ms) : 0, 1188
crashtracking [candidate] (1.181 ms) : 0, 1181
BytebuddyAgent [baseline] (682.249 ms) : 0, 682249
BytebuddyAgent [candidate] (682.71 ms) : 0, 682710
AgentMeter [baseline] (8.656 ms) : 0, 8656
AgentMeter [candidate] (8.595 ms) : 0, 8595
GlobalTracer [baseline] (215.458 ms) : 0, 215458
GlobalTracer [candidate] (215.211 ms) : 0, 215211
AppSec [baseline] (32.214 ms) : 0, 32214
AppSec [candidate] (32.168 ms) : 0, 32168
Debugger [baseline] (62.669 ms) : 0, 62669
Debugger [candidate] (64.249 ms) : 0, 64249
Remote Config [baseline] (582.664 µs) : 0, 583
Remote Config [candidate] (577.042 µs) : 0, 577
Telemetry [baseline] (10.021 ms) : 0, 10021
Telemetry [candidate] (9.189 ms) : 0, 9189
Flare Poller [baseline] (4.251 ms) : 0, 4251
Flare Poller [candidate] (3.479 ms) : 0, 3479
Profiling [baseline] (94.114 ms) : 0, 94114
Profiling [candidate] (94.146 ms) : 0, 94146
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 19 metrics, 17 unstable metrics. Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section baseline
no_agent (18.325 ms) : 18134, 18515
. : milestone, 18325,
appsec (18.589 ms) : 18403, 18776
. : milestone, 18589,
code_origins (17.912 ms) : 17731, 18093
. : milestone, 17912,
iast (17.714 ms) : 17536, 17891
. : milestone, 17714,
profiling (18.605 ms) : 18423, 18788
. : milestone, 18605,
tracing (17.458 ms) : 17286, 17630
. : milestone, 17458,
section candidate
no_agent (18.191 ms) : 18003, 18379
. : milestone, 18191,
appsec (18.539 ms) : 18353, 18725
. : milestone, 18539,
code_origins (17.682 ms) : 17508, 17857
. : milestone, 17682,
iast (17.56 ms) : 17384, 17736
. : milestone, 17560,
profiling (18.658 ms) : 18468, 18847
. : milestone, 18658,
tracing (17.547 ms) : 17375, 17719
. : milestone, 17547,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section baseline
no_agent (1.182 ms) : 1170, 1193
. : milestone, 1182,
iast (3.183 ms) : 3143, 3222
. : milestone, 3183,
iast_FULL (5.994 ms) : 5933, 6055
. : milestone, 5994,
iast_GLOBAL (3.522 ms) : 3470, 3573
. : milestone, 3522,
profiling (2.19 ms) : 2168, 2213
. : milestone, 2190,
tracing (1.795 ms) : 1780, 1810
. : milestone, 1795,
section candidate
no_agent (1.183 ms) : 1171, 1194
. : milestone, 1183,
iast (3.119 ms) : 3075, 3162
. : milestone, 3119,
iast_FULL (6.003 ms) : 5942, 6063
. : milestone, 6003,
iast_GLOBAL (3.694 ms) : 3626, 3763
. : milestone, 3694,
profiling (2.151 ms) : 2131, 2171
. : milestone, 2151,
tracing (1.806 ms) : 1792, 1820
. : milestone, 1806,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section baseline
no_agent (1.466 ms) : 1455, 1477
. : milestone, 1466,
appsec (3.785 ms) : 3565, 4005
. : milestone, 3785,
iast (2.245 ms) : 2175, 2314
. : milestone, 2245,
iast_GLOBAL (2.291 ms) : 2221, 2361
. : milestone, 2291,
profiling (2.095 ms) : 2038, 2151
. : milestone, 2095,
tracing (2.053 ms) : 1999, 2107
. : milestone, 2053,
section candidate
no_agent (1.474 ms) : 1462, 1485
. : milestone, 1474,
appsec (3.776 ms) : 3555, 3997
. : milestone, 3776,
iast (2.247 ms) : 2177, 2316
. : milestone, 2247,
iast_GLOBAL (2.281 ms) : 2212, 2351
. : milestone, 2281,
profiling (2.064 ms) : 2009, 2118
. : milestone, 2064,
tracing (2.047 ms) : 1994, 2101
. : milestone, 2047,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~e6a612fa90, baseline=1.61.0-SNAPSHOT~79fbbef465
dateFormat X
axisFormat %s
section baseline
no_agent (15.582 s) : 15582000, 15582000
. : milestone, 15582000,
appsec (14.731 s) : 14731000, 14731000
. : milestone, 14731000,
iast (18.126 s) : 18126000, 18126000
. : milestone, 18126000,
iast_GLOBAL (17.821 s) : 17821000, 17821000
. : milestone, 17821000,
profiling (14.892 s) : 14892000, 14892000
. : milestone, 14892000,
tracing (14.816 s) : 14816000, 14816000
. : milestone, 14816000,
section candidate
no_agent (15.382 s) : 15382000, 15382000
. : milestone, 15382000,
appsec (14.93 s) : 14930000, 14930000
. : milestone, 14930000,
iast (18.326 s) : 18326000, 18326000
. : milestone, 18326000,
iast_GLOBAL (17.88 s) : 17880000, 17880000
. : milestone, 17880000,
profiling (14.875 s) : 14875000, 14875000
. : milestone, 14875000,
tracing (15.159 s) : 15159000, 15159000
. : milestone, 15159000,
|
…Dog/dd-trace-java into dougqh/trace-intercept-optimization
Refactored the code to clean-up the edge case handling, quick return on interceptor empty and trace empty. That reduced the nesting level and made the code easier to read. Also flipped the code that handles unaltered lists, that reads a bit better. Added tests for all these cases to CoreTracerTest2. To make this easier to test in isolation, created a static interceptCompleteTrace that can be called directly by the test methods.
| } | ||
|
|
||
| private final RatelimitedLogger rlLog = new RatelimitedLogger(log, 1, MINUTES); | ||
| private static final RatelimitedLogger rlLog = new RatelimitedLogger(log, 1, MINUTES); |
There was a problem hiding this comment.
Not sure why this wasn't static before. I made it static, so I could call it from the static version of interceptCompleteTrace.
| return interceptCompleteTrace(interceptors, originalTrace); | ||
| } | ||
|
|
||
| static final List<DDSpan> interceptCompleteTrace( |
There was a problem hiding this comment.
Introduced for ease of testing
| trace = new ArrayList<>(interceptedTrace.size()); | ||
| if (interceptedTrace == null || interceptedTrace.isEmpty()) { | ||
| return SpanList.EMPTY; | ||
| } else if (interceptedTrace == originalTrace && originalTrace.modCount() == originalModCount) { |
There was a problem hiding this comment.
Honestly, mod count checking is probably overkill, but I'm erring on the side of caution.
There was a problem hiding this comment.
What would be the risks of removing the size check? Afterall it will be a different object if != 1 so I would have simplified perhaps
There was a problem hiding this comment.
My main thinking was that it reduces the lifecycle of the empty list by substituting it with the canonical empty list. Honestly, I don't think it makes much difference either way.
amarziali
left a comment
There was a problem hiding this comment.
thanks for the improvement. I left a couple of minors.
| trace = new ArrayList<>(interceptedTrace.size()); | ||
| if (interceptedTrace == null || interceptedTrace.isEmpty()) { | ||
| return SpanList.EMPTY; | ||
| } else if (interceptedTrace == originalTrace && originalTrace.modCount() == originalModCount) { |
There was a problem hiding this comment.
What would be the risks of removing the size check? Afterall it will be a different object if != 1 so I would have simplified perhaps
| static final SpanList EMPTY = new SpanList(0); | ||
|
|
||
| static final SpanList of(DDSpan span) { | ||
| SpanList list = new SpanList(1); |
There was a problem hiding this comment.
This factory method appears optimized for handling single-span traces, as reflected by its growth pattern (1 → 2 → 3 → 4 → 6 → 9 → 13 → …). This is generally acceptable since traces are not expected to grow significantly when passing through interceptors, and in cases with multiple spans, the trace size is typically known and used to initialize the list appropriately.
That said, it might be worth documenting this assumption in the method’s Javadoc for future maintainability and clarity.
There was a problem hiding this comment.
Yes, it isn't really intended to be used as a generic factory -- mostly just as an easy substitute for singletonList. That said, I'm happy to add some Javadoc to it.
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
What Does This Do
Avoid ArrayList allocations from TraceInterceptors
CoreTracer.interceptCompleteTraceuses a new classSpanList.SpanListis just anArrayListthat exposes the internalmodCount.By checking if the
Listreturned byTraceInterceptoris the originalSpanListand checking thatmodCountis unchanged,interceptCompleteTracecan determine if the list was altered.If the
Listis unaltered, theninterceptCompleteTracecan skip making a copy into anotherArrayList/SpanList.Motivation
In span creation stress test, 16 threads x 10,000,000 traces x 2 spans, this change saves...
Object[]- 7 GiBArrayList(nowSpanList) - 5 GiBArrayList$Itr- 5 GiBA 5% decrease in allocation and 5-6% decrease in GC time
Additional Notes
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.