
perf: allocation-free Val.Str extractor (~25% faster match) #885

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/val-str-unapply-zero-alloc


He-Pin (Contributor) commented May 31, 2026

Motivation

`Val.Str.unapply` returned `Some((pos, str))`, so every `case Val.Str(p, s)` match (115 sites across the evaluator, stdlib, and materializer) went through an `Option` + `Tuple2` layer. JVM C2 escape analysis already scalar-replaces those short-lived objects in tight loops, so heap allocation is ~0, but the extra `Option`/`Tuple2` indirection still costs instructions on every match.
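For comparison, here is a minimal sketch of the conventional extractor shape described above. `Val`, `Str`, `AsciiSafeStr`, `Num` and `describe` are simplified hypothetical stand-ins, not sjsonnet's actual definitions. For example, `pos` is an `Int` here rather than a source position.

```scala
// Hypothetical, simplified model of the pre-PR extractor shape; not sjsonnet's code.
object OptionExtractorSketch {
  abstract class Val
  final class Num(val n: Double) extends Val
  class Str(val pos: Int, val value: String) extends Val
  final class AsciiSafeStr(pos: Int, value: String) extends Str(pos, value)

  object Str {
    // Conventional extractor: each call builds a Tuple2 and wraps it in Some.
    // Even when escape analysis removes the heap objects, the desugared match
    // still tests the Option and then reads the tuple's fields.
    def unapply(s: Str): Option[(Int, String)] = Some((s.pos, s.value))
  }

  def describe(v: Val): String = v match {
    case Str(p, s) => s"str@$p:$s"
    case _         => "other"
  }
}
```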

Modification

  • Rewrite `Str.unapply` as an allocation-free name-based extractor that returns a value class `StrExtract(self: Str)` with `isEmpty`/`get`, plus `_1`/`_2` accessors on `Str`. The match desugaring consumes the `AnyVal` result without allocating. The `Str` type test keeps the pattern refutable, so non-`Str` values still fall through and the `AsciiSafeStr` subclass matches exactly as before. All 115 call sites are unchanged; `StrExtract` and `unapply` are `private[sjsonnet]`.
  • Add `StrMatchBenchmark`, a JMH microbenchmark that isolates the extractor in a tight loop (mixing in the `AsciiSafeStr` subclass), as a regression guard.
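Below is a minimal sketch of the name-based extractor pattern described above. It uses the same kind of simplified hypothetical types, not sjsonnet's real `Val`, and drops the `private[sjsonnet]` modifiers. Here `unapply` returns a value class exposing `isEmpty`/`get`. Its `get` hands back the `Str` itself, whose `_1`/`_2` members supply the two sub-patterns.

```scala
// Hypothetical, simplified model of the name-based extractor; not sjsonnet's code.
object NameBasedExtractorSketch {
  abstract class Val
  final class Num(val n: Double) extends Val

  class Str(val pos: Int, val value: String) extends Val {
    // Accessors the match desugaring reads for `case Str(p, s)`.
    def _1: Int = pos
    def _2: String = value
  }
  // Subclass that must still match `case Str(p, s)`.
  final class AsciiSafeStr(pos: Int, value: String) extends Str(pos, value)

  object Str {
    // Value class result: isEmpty/get make it a name-based extractor result,
    // and it wraps the existing reference instead of building Some((pos, value)).
    final class StrExtract(val self: Str) extends AnyVal {
      def isEmpty: Boolean = false // the type test before unapply already succeeded
      def get: Str = self          // get returns the Str, which provides _1/_2
    }
    // The parameter type Str makes the pattern emit an isInstanceOf[Str] test,
    // so other Val subclasses fall through to the next case.
    def unapply(s: Str): StrExtract = new StrExtract(s)
  }

  def describe(v: Val): String = v match {
    case Str(p, s) => s"str@$p:$s"
    case _         => "other"
  }
}
```

The constant `isEmpty = false` is safe only because the type test on the parameter runs first. A design that accepted `Val` directly would have to compute `isEmpty` itself.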

Result

Isolated micro (`StrMatchBenchmark`, 1024 matches/op, `-f4 -wi10 -i15 -r2 -prof gc`, 60 samples = 4 forks × 15 measurement iterations):

| | ns/op | gc.alloc.rate.norm |
|---|---|---|
| baseline | 440.7 ± 2.8 | ~0.002 B/op |
| this PR | 331.9 ± 2.9 | ~0.001 B/op |
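The benchmark source is not reproduced here. The following is a hedged sketch of what a JMH micro of this shape could look like, assuming JMH is on the classpath (for example via an sbt or Mill JMH plugin). The class names `StrMatchModel` and `StrMatchSketchBenchmark` are hypothetical. Each invocation performs 1024 matches, mirroring "1024 matches/op", and the input mixes `Str`, the `AsciiSafeStr` subclass, and a non-string value.

```scala
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Hypothetical minimal model; the real benchmark matches sjsonnet's Val.Str.
object StrMatchModel {
  abstract class Val
  final class Num(val n: Double) extends Val
  class Str(val pos: Int, val value: String) extends Val {
    def _1: Int = pos
    def _2: String = value
  }
  final class AsciiSafeStr(pos: Int, value: String) extends Str(pos, value)
  object Str {
    final class StrExtract(val self: Str) extends AnyVal {
      def isEmpty: Boolean = false
      def get: Str = self
    }
    def unapply(s: Str): StrExtract = new StrExtract(s)
  }
}

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
class StrMatchSketchBenchmark {
  import StrMatchModel._

  private val N = 1024
  private var values: Array[Val] = Array.empty[Val]

  @Setup
  def setup(): Unit = {
    // Mix the base class, the AsciiSafeStr subclass and a non-string value so
    // both the subclass path and the failing type test are exercised.
    values = Array.tabulate[Val](N) { i =>
      i % 3 match {
        case 0 => new Str(i, "s" + i)
        case 1 => new AsciiSafeStr(i, "a" + i)
        case _ => new Num(i.toDouble)
      }
    }
  }

  // One invocation = N matches. Returning the accumulator lets JMH consume it,
  // so the JIT cannot discard the loop as dead code.
  @Benchmark
  def matchStr(): Int = {
    var acc = 0
    var i = 0
    while (i < values.length) {
      values(i) match {
        case Str(p, s) => acc += p + s.length
        case _         => // non-string values contribute nothing
      }
      i += 1
    }
    acc
  }
}
```

Because no `@OperationsPerInvocation` is set, the reported ns/op covers the whole 1024-match loop, as in the table. Adding `-prof gc` also reports `gc.alloc.rate.norm`, the per-op allocation figure.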

This is a reproducible speedup on the match operation: ~25% less time per op, i.e. 1.33× the throughput. Both baseline and new allocate ~0 B/op because C2 escape analysis had already removed the heap objects. The win therefore comes from fewer instructions, not fewer allocations.
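The two headline figures describe the same means from the table (error bars ignored). The ratio gives the speedup, and the relative reduction gives the ~25%:

```math
\begin{aligned}
\text{speedup} &= \frac{440.7}{331.9} \approx 1.33\times \\
\text{time saved per op} &= 1 - \frac{331.9}{440.7} \approx 0.247 \;(\approx 25\%) \\
\text{saving per match} &= \frac{440.7 - 331.9}{1024}\ \text{ns} \approx 0.106\ \text{ns}
\end{aligned}
```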

End-to-end results (`MainBenchmark`, `stdlib.jsonnet`) are within noise, since `Val.Str` matching is a small fraction of total parse + eval + materialize work; the per-op win is real but diluted.

Compiles on Scala 3.3.7 / 2.13.18 / 2.12.21; full JVM test suite green; zero behavior change.
