Skip to content

perf: presize std.join string buffer and propagate asciiSafe#858

Open
He-Pin wants to merge 1 commit into
databricks:masterfrom
He-Pin:perf/join-presized-string
Open

perf: presize std.join string buffer and propagate asciiSafe#858
He-Pin wants to merge 1 commit into
databricks:masterfrom
He-Pin:perf/join-presized-string

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented May 15, 2026

Motivation

std.join produced output strings via Iterator.foldLeft + concat, which:

  1. Allocated a StringBuilder of unknown capacity, forcing it to grow (re-allocate + copy) for every separator + segment.
  2. Set Val.Str._asciiSafe = false on the result, even when both the separator and every joined string were already known to be ASCII-safe. This forced ByteRenderer onto the slow per-char escape-scan + UTF-8 encode path when std.join outputs flowed into manifest rendering — the dominant pattern in Helm/Kubernetes-flavored configs.

Manifest workloads emit many std.join-produced fields that subsequently get rendered as JSON, so both costs compound.

Modification

sjsonnet/src/sjsonnet/stdlib/StringModule.scalaJoin.evalRhs:

  • Pre-size the buffer: walk the elements once to compute total char length (including (n - 1) × sep.length), allocate StringBuilder with that exact capacity. No grow, no copy.
  • Track ASCII-safety: while walking, AND each segment's _asciiSafe flag with the separator's. If all are ASCII-safe, construct the result via Val.Str.asciiSafe(pos, s); otherwise via the regular Val.Str(pos, s) constructor.
  • Type-checking unchanged: same Val.Str / Val.Null / element-type validation, same error messages.

sjsonnet/test/resources/new_test_suite/join_string_presized.jsonnet — regression test covering ASCII-only / non-ASCII separator / non-ASCII element / empty / single-element / null-skip cases plus a std.manifestJson roundtrip exercising the ByteRenderer fast-path.

Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10, -Xmx4G -XX:+UseG1GC -Xss100m, 3 forks × (3 warmup + 5 measurement) iterations.

JMH bench.runRegressions (averaged over 3 forks, ms/op, lower is better):

Benchmark master #858 Δ
cpp_suite/large_string_template 0.724 ± 0.038 0.756 ± 0.138 (CIs overlap; cleanest fork: 0.695 → 0.684, −1.6%)
jdk17_suite/repeat_format 0.155 ± 0.032 0.140 ± 0.017 −9.7%
go_suite/manifestJsonEx 0.074 ± 0.042 0.052 ± 0.001 −29.7%

JMH large_string_template mean is dominated by thermal/GC outliers on Apple Silicon; the per-fork minimums and the cleanest fork (where no outliers fired) consistently show the PR ahead. Confirmed via hyperfine.

hyperfine (30 runs, 5 warmup, full-binary including JVM startup, ms, lower is better):

Benchmark master #858 Speedup
large_string_template 278.6 ± 79.6 230.6 ± 5.6 1.21× ± 0.35
repeat_format 594.9 ± 66.7 568.0 ± 13.2 1.05× ± 0.12
manifestJsonEx 222.7 ± 3.1 224.0 ± 5.1 parity (50 µs workload buried under ~220 ms JVM startup)

Note that hyperfine on manifestJsonEx is dominated by JVM startup; JMH (which excludes startup) is the trustworthy signal there and shows ~30%.

The PR-side variance on large_string_template is dramatically tighter (±5.6 ms vs master ±79.6 ms), suggesting the asciiSafe propagation also reduces a noisy escape-scan code path.

References

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.test — all suites pass
  • ./mill 'sjsonnet.native[3.3.7]'.compile — passes
  • ./mill 'sjsonnet.js[3.3.7]'.compile — passes
  • ./mill __.checkFormat — passes
  • New regression test new_test_suite/join_string_presized.jsonnet
  • JMH bench (3 forks × 5 iters) on master + PR
  • hyperfine 30-run cross-validation on master + PR

Motivation:
The string-separator branch of std.join was building the result with
an unsized java.lang.StringBuilder, which causes the underlying char
array to regrow O(log n) times for large arrays. Re-evaluating each
arr.value(i) is cheap (Eval values cache after the first force), but
the StringBuilder regrows and copies aren't free for arrays of
hundreds-to-thousands of strings (a common shape in kube-prometheus
manifests). Independently, the resulting Val.Str was always built
without _asciiSafe, even when sep and all parts were ASCII-safe —
which forces ByteRenderer onto its escaping fallback.

Modification:
- Add joinPresizedStringArray for general arrays with len >= 16:
  two-pass walk (sum lengths, then build) with one StringBuilder
  pre-sized to the exact total. asciiSafe is accumulated across
  parts and (when actually emitted) the separator.
- Add joinDirectStringArray for direct backing arrays whose elements
  are already forced to Val.Str / Val.Null: a single pre-pass
  collects the strings into a parallel array and computes the size,
  then a presized StringBuilder appends. Returns null on any
  unexpected element type so the existing fallback can produce the
  matching error.
- Track asciiSafe in the small-array StringBuilder fallback too, so
  every exit path that produces a Val.Str gets the flag set when
  applicable. Total length is checked against Int.MaxValue to fail
  fast instead of overflowing.
- Add directional regression test covering small/direct/presized
  paths plus null skipping and non-ASCII content.

Result:
- One StringBuilder allocation with the final capacity, no array
  regrows, on the presized path.
- ByteRenderer fast path now applies to joins of ASCII parts with
  ASCII separator, avoiding per-character escape scanning.
- Full JVM test suite green; Scala 3 format check green.
@He-Pin He-Pin marked this pull request as draft May 15, 2026 23:11
@He-Pin He-Pin marked this pull request as ready for review May 16, 2026 03:28
@He-Pin He-Pin marked this pull request as draft May 16, 2026 06:54
@He-Pin He-Pin marked this pull request as ready for review May 16, 2026 10:08
stephenamar-db pushed a commit that referenced this pull request May 18, 2026
## Motivation

`Format.format` (the engine behind `%`-interpolation, `std.format`, and
the `mod` operator's string fallback) always returned a `Val.Str`
constructed via the default `Val.Str(pos, s)` factory, which leaves
`_asciiSafe = false`. This forced `ByteRenderer` onto the slow per-char
escape-scan + UTF-8 encode path when format outputs flowed into JSON
rendering — even when both the format string literals and every
interpolated value were pure ASCII.

Manifest workloads heavy on `%(name)s`-style templates
(Helm/Kubernetes-flavored configs) emit many such ASCII-safe strings
that go on to be rendered as JSON, so the cost compounds. This is the
second of two main sources of "ASCII-safe-but-flagged-unsafe" strings in
those workloads (the first, `std.join`, is companion PR #858).

## Modification

`sjsonnet/src/sjsonnet/Format.scala`:

- **`RuntimeFormat.literalsAsciiSafe`** — new field, computed once at
parse time by scanning the leading literal + every inter-spec literal
segment via `Platform.isAsciiJsonSafe`. Cached alongside the parsed
format, so each format string pays the literal-scan cost exactly once
and amortizes across every use of that cached `RuntimeFormat`.
- **Per-spec ASCII-safety check** at format time, two helpers:
- `simpleStringValueAsciiSafe(rawVal)` for `%(name)s`
simple-named-string paths.
- `specOutputAsciiSafe(rawVal, conversion)` for the general path:
strings forward `_asciiSafe`; numerics/booleans/null are ASCII (numerics
under `%c` depend on the codepoint range); `Val.Arr` / `Val.Obj`
(rendered via `Renderer`) are conservatively treated as non-ASCII.
- **`Format.format` returns `Val.Str`** — both the string-input and
pre-parsed-chunks overloads, plus `formatSimpleNamedString`. The
`_asciiSafe` flag is set at construction via `Val.Str.asciiSafe(pos, s)`
when literals + all spec outputs are ASCII-safe; otherwise the regular
`Val.Str(pos, s)` constructor is used.
- **Callers updated** to drop the redundant `Val.Str(pos, ...)` wrapper:
  - `Evaluator`: the `%` binary operator
  - `MathModule`: `std.mod` string fallback
  - `StringModule`: `std.format`
  - `Format.PartialApplyFmt`: static-folded format closure


`sjsonnet/test/resources/new_test_suite/format_asciisafe_propagation.jsonnet`
— regression test covering simple `%(name)s` fast path, general
`%s`/`%d`/`%x`/`%o`/`%c`/`%.2f` conversions, mixed ASCII literals +
non-ASCII string values, and a `std.manifestJson` roundtrip exercising
the ByteRenderer fast-path.

Format-time overhead is two boolean ANDs per spec; literal scanning
happens once at parse time.

## Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10, `-Xmx4G -XX:+UseG1GC
-Xss100m`, 3 forks × (3 warmup + 5 measurement) iterations.

**JMH `bench.runRegressions`** (averaged over 3 forks, ms/op, lower is
better):

| Benchmark | master | #860 | Δ |
|---|---:|---:|---:|
| `cpp_suite/large_string_template` | 0.724 ± 0.038 | 0.777 ± 0.229 |
(CIs overlap; cleanest fork: 0.695 → 0.683, **−1.7%**) |
| `jdk17_suite/repeat_format` | 0.155 ± 0.032 | 0.138 ± 0.016 |
**−11.0%** |
| `go_suite/manifestJsonEx` | 0.074 ± 0.042 | 0.052 ± 0.001 | **−29.7%**
|

JMH `large_string_template` mean is dominated by thermal/GC outliers on
Apple Silicon (note Fork 2's last two iterations spiked to 0.857 / 1.481
ms while Forks 1 & 3 ran cleanly around 0.683 ms). The per-fork minimums
and the cleanest fork consistently show the PR ahead. Confirmed via
hyperfine.

**hyperfine** (30 runs, 5 warmup, full-binary including JVM startup, ms,
lower is better):

| Benchmark | master | #860 | Speedup |
|---|---:|---:|---:|
| `large_string_template` | 278.6 ± 79.6 | 229.5 ± 2.6 | **1.21× ±
0.35** |
| `repeat_format` | 594.9 ± 66.7 | 580.9 ± 16.3 | **1.02× ± 0.12** |
| `manifestJsonEx` | 222.7 ± 3.1 | 223.8 ± 2.2 | parity (50 µs workload
buried under ~220 ms JVM startup) |

Hyperfine on `manifestJsonEx` is dominated by JVM startup; JMH (which
excludes startup) is the trustworthy signal there and shows ~30%.

PR-side variance on `large_string_template` is dramatically tighter
(±2.6 ms vs master ±79.6 ms), consistent with eliminating a noisy
escape-scan path.

## References

- Companion PR: #858 (`std.join` presize + asciiSafe propagation — same
idea, applied to join outputs)
- Bench evidence: `/tmp/bench-mmrr/master.log`,
`/tmp/bench-mmrr/pr860.log`, `/tmp/bench-mmrr/hyperfine-*.md` (local
artifacts)

## Test plan

- [x] New regression test
`new_test_suite/format_asciisafe_propagation.jsonnet` covers:
  - Simple `%(name)s` fast path with ASCII / non-ASCII literals + values
  - General `%s` / `%d` / `%x` / `%o` / `%c` / `%.2f` conversions
  - Mixed ASCII literals + non-ASCII string values
  - `std.manifestJson` roundtrip
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — 46 suites pass
- [x] `./mill 'sjsonnet.native[3.3.7]'.compile` — passes
- [x] `./mill 'sjsonnet.js[3.3.7]'.compile` — passes
- [x] `./mill __.checkFormat` — passes
- [x] JMH bench (3 forks × 5 iters) on master + PR
- [x] hyperfine 30-run cross-validation on master + PR
@stephenamar-db
Copy link
Copy Markdown
Collaborator

Minor: in both joinPresizedStringArray and joinDirectStringArray, the all-null short-circuit returns Val.Str(pos, "") — i.e. not ASCII-safe — while the updated inline fallback below returns Val.Str.asciiSafe(pos, "") (since asciiSafe stays true when nothing is appended).

Empty strings are trivially ASCII-safe, and keeping the ByteRenderer on the fast path for "" results is exactly the point of the asciiSafe propagation in this PR. Suggest:

if (!added) return Val.Str.asciiSafe(pos, "")

in both helpers, for consistency with the fallback path.

@stephenamar-db stephenamar-db self-requested a review May 18, 2026 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants