perf: allocation-free ASCII bitmask for std.stripChars (1.65x faster) (#851) #887

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/strip-ascii-bitmask-851

He-Pin commented May 31, 2026

Motivation

std.stripChars / lstripChars / rstripChars built a java.util.BitSet per call for multi-character strip sets — allocating a BitSet object plus its backing long[] on every invocation, and paying an array-load + bounds check per membership test. The vast majority of real strip sets are ASCII (whitespace / punctuation).

Modification

  • In StripUtils.strip, detect an all-ASCII strip set while scanning chars and build a 128-bit membership mask in two longs (zero allocation). Strip via stripAsciiMask / inAsciiMask, which test membership with a shift + mask and no array access. Strip sets containing non-ASCII BMP characters (code units above 127) fall back to the existing BitSet path, and sets containing surrogates fall back to the codepoint set. Behavior is unchanged.
  • Add StripBenchmark (JMH) as a regression guard; relax StripUtils to private[sjsonnet] so the bench module can exercise it in isolation.
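
The two-long mask idea can be sketched as follows. This is a minimal illustration, not the PR's code: the object `AsciiStripSketch` and the methods `contains` and `stripAscii` are hypothetical names, and returning `None` for a non-ASCII strip set merely stands in for the BitSet and codepoint-set fallbacks, which are not shown.

```scala
object AsciiStripSketch {
  // True when `c` is ASCII and its bit is set in the 128-bit mask.
  // `lo` covers codes 0..63, `hi` covers codes 64..127.
  private def contains(c: Char, lo: Long, hi: Long): Boolean = {
    val code = c.toInt
    code < 128 && (((if (code < 64) lo else hi) >>> (code & 63)) & 1L) != 0L
  }

  // Strips characters in `chars` from both ends of `s`.
  // Returns None when `chars` is not all-ASCII, so that a caller can take a
  // general-purpose fallback path (assumption: that path lives elsewhere).
  // Note: the Option wrapper is for clarity only and itself allocates.
  def stripAscii(s: String, chars: String): Option[String] = {
    var lo = 0L
    var hi = 0L
    var allAscii = true
    var i = 0
    // Build the mask while scanning the strip set; stop at the first non-ASCII char.
    while (allAscii && i < chars.length) {
      val code = chars.charAt(i).toInt
      if (code >= 128) allAscii = false
      else if (code < 64) lo |= 1L << code
      else hi |= 1L << (code - 64)
      i += 1
    }
    if (!allAscii) None
    else {
      var start = 0
      var end = s.length
      // Advance past leading members, then back up over trailing members.
      while (start < end && contains(s.charAt(start), lo, hi)) start += 1
      while (end > start && contains(s.charAt(end - 1), lo, hi)) end -= 1
      Some(s.substring(start, end))
    }
  }
}
```

For example, `AsciiStripSketch.stripAscii("  hi\t\n", " \t\n")` yields `Some("hi")`, while a strip set such as `"é"` yields `None`. The mask lives entirely in two local `Long` variables, so building it and testing membership touch no heap objects or arrays.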

Result

Isolated JMH micro (StripBenchmark, all-ASCII set, long leading/trailing runs, -f4, 48 samples):

| Variant | ns/op | gc.alloc.rate.norm |
|---|---|---|
| baseline (BitSet) | 2302.8 ± 54.0 | 104 B/op |
| this PR (mask) | 1394.1 ± 19.7 | 48 B/op |

1.65× faster, −54% allocation. The 56 B/op drop is exactly the removed per-call BitSet (object + long[]); the remaining 48 B is just the result substring.
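
The headline figures follow directly from the reported means. A small check, using only the numbers quoted above (the object and value names are hypothetical):

```scala
object StripBenchArithmetic {
  // Reported JMH means, copied from the results above.
  val baselineNs = 2302.8
  val maskNs = 1394.1
  val baselineBytes = 104
  val maskBytes = 48

  // Ratio of mean times: roughly 1.65.
  val speedup: Double = baselineNs / maskNs
  // Bytes no longer allocated per call: 56.
  val bytesSaved: Int = baselineBytes - maskBytes
  // Fraction of per-call allocation removed: roughly 0.54.
  val allocReduction: Double = bytesSaved.toDouble / baselineBytes
}
```

The speedup is computed from the means alone and does not account for the reported error margins.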

Behavior verified identical to official jsonnet v0.22.0 across ASCII, multi-char, l/r, tab/newline, and non-ASCII fallback cases. Compiles on Scala 3.3.7 / 2.13.18 / 2.12.21; full JVM test suite green.

Addresses #851. (The residual gap vs jrsonnet noted in the issue is structural — UTF-16 vs byte-native strings — and out of scope here.)

…databricks#851)

Motivation:
std.stripChars/lstripChars/rstripChars built a `java.util.BitSet` per call for multi-character
strip sets, allocating a BitSet object plus its backing `long[]` on every invocation and paying an
array-load + bounds check per membership test. The vast majority of real strip sets are ASCII
(whitespace/punctuation).

Modification:
- In StripUtils.strip, detect an all-ASCII strip set while scanning `chars` and build a 128-bit
  membership mask in two `long`s (no allocation). Strip via `stripAsciiMask`/`inAsciiMask`, which
  test membership with a shift+mask and no array access. Falls back to the existing BitSet path for
  BMP>127 sets and to the codepoint set for surrogates. Behavior is unchanged.
- Add `StripBenchmark` (JMH) as a regression guard; relax `StripUtils` to `private[sjsonnet]` so the
  bench module can exercise it in isolation.

Result:
Isolated JMH micro (StripBenchmark, all-ASCII set, long leading/trailing runs, -f4, 48 samples):
2302.8 +/- 54.0 ns/op -> 1394.1 +/- 19.7 ns/op (1.65x faster), and gc.alloc.rate.norm
104 -> 48 B/op (the removed per-call BitSet; the remaining 48 B is just the result substring).
Behavior verified identical to official jsonnet v0.22.0 across ASCII, multi-char, l/r, tab/newline,
and non-ASCII fallback cases. Compiles on Scala 3.3.7 / 2.13.18 / 2.12.21; full JVM suite green.

References: databricks#851