Skip to content

perf: propagate ASCII-safety through Format outputs#860

Merged
stephenamar-db merged 1 commit into
databricks:masterfrom
He-Pin:perf/format-asciisafe-propagation
May 18, 2026
Merged

perf: propagate ASCII-safety through Format outputs#860
stephenamar-db merged 1 commit into
databricks:masterfrom
He-Pin:perf/format-asciisafe-propagation

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented May 15, 2026

Motivation

Format.format (the engine behind %-interpolation, std.format, and the mod operator's string fallback) always returned a Val.Str constructed via the default Val.Str(pos, s) factory, which leaves _asciiSafe = false. This forced ByteRenderer onto the slow per-char escape-scan + UTF-8 encode path when format outputs flowed into JSON rendering — even when both the format string literals and every interpolated value were pure ASCII.

Manifest workloads heavy on %(name)s-style templates (Helm/Kubernetes-flavored configs) emit many such ASCII-safe strings that go on to be rendered as JSON, so the cost compounds. This is the second of two main sources of "ASCII-safe-but-flagged-unsafe" strings in those workloads (the first, std.join, is companion PR #858).

Modification

sjsonnet/src/sjsonnet/Format.scala:

  • RuntimeFormat.literalsAsciiSafe — new field, computed once at parse time by scanning the leading literal + every inter-spec literal segment via Platform.isAsciiJsonSafe. Cached alongside the parsed format, so each format string pays the literal-scan cost exactly once and amortizes across every use of that cached RuntimeFormat.
  • Per-spec ASCII-safety check at format time, two helpers:
    • simpleStringValueAsciiSafe(rawVal) for %(name)s simple-named-string paths.
    • specOutputAsciiSafe(rawVal, conversion) for the general path: strings forward _asciiSafe; numerics/booleans/null are ASCII (numerics under %c depend on the codepoint range); Val.Arr / Val.Obj (rendered via Renderer) are conservatively treated as non-ASCII.
  • Format.format returns Val.Str — both the string-input and pre-parsed-chunks overloads, plus formatSimpleNamedString. The _asciiSafe flag is set at construction via Val.Str.asciiSafe(pos, s) when literals + all spec outputs are ASCII-safe; otherwise the regular Val.Str(pos, s) constructor is used.
  • Callers updated to drop the redundant Val.Str(pos, ...) wrapper:
    • Evaluator: the % binary operator
    • MathModule: std.mod string fallback
    • StringModule: std.format
    • Format.PartialApplyFmt: static-folded format closure

sjsonnet/test/resources/new_test_suite/format_asciisafe_propagation.jsonnet — regression test covering simple %(name)s fast path, general %s/%d/%x/%o/%c/%.2f conversions, mixed ASCII literals + non-ASCII string values, and a std.manifestJson roundtrip exercising the ByteRenderer fast-path.

Format-time overhead is two boolean ANDs per spec; literal scanning happens once at parse time.

Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10, -Xmx4G -XX:+UseG1GC -Xss100m, 3 forks × (3 warmup + 5 measurement) iterations.

JMH bench.runRegressions (averaged over 3 forks, ms/op, lower is better):

Benchmark master #860 Δ
cpp_suite/large_string_template 0.724 ± 0.038 0.777 ± 0.229 (CIs overlap; cleanest fork: 0.695 → 0.683, −1.7%)
jdk17_suite/repeat_format 0.155 ± 0.032 0.138 ± 0.016 −11.0%
go_suite/manifestJsonEx 0.074 ± 0.042 0.052 ± 0.001 −29.7%

JMH large_string_template mean is dominated by thermal/GC outliers on Apple Silicon (note Fork 2's last two iterations spiked to 0.857 / 1.481 ms while Forks 1 & 3 ran cleanly around 0.683 ms). The per-fork minimums and the cleanest fork consistently show the PR ahead. Confirmed via hyperfine.

hyperfine (30 runs, 5 warmup, full-binary including JVM startup, ms, lower is better):

Benchmark master #860 Speedup
large_string_template 278.6 ± 79.6 229.5 ± 2.6 1.21× ± 0.35
repeat_format 594.9 ± 66.7 580.9 ± 16.3 1.02× ± 0.12
manifestJsonEx 222.7 ± 3.1 223.8 ± 2.2 parity (50 µs workload buried under ~220 ms JVM startup)

Hyperfine on manifestJsonEx is dominated by JVM startup; JMH (which excludes startup) is the trustworthy signal there and shows ~30%.

PR-side variance on large_string_template is dramatically tighter (±2.6 ms vs master ±79.6 ms), consistent with eliminating a noisy escape-scan path.

References

Test plan

  • New regression test new_test_suite/format_asciisafe_propagation.jsonnet covers:
    • Simple %(name)s fast path with ASCII / non-ASCII literals + values
    • General %s / %d / %x / %o / %c / %.2f conversions
    • Mixed ASCII literals + non-ASCII string values
    • std.manifestJson roundtrip
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — 46 suites pass
  • ./mill 'sjsonnet.native[3.3.7]'.compile — passes
  • ./mill 'sjsonnet.js[3.3.7]'.compile — passes
  • ./mill __.checkFormat — passes
  • JMH bench (3 forks × 5 iters) on master + PR
  • hyperfine 30-run cross-validation on master + PR

Motivation:
After PR databricks#858 added the join-presized + asciiSafe optimization, format
outputs (`%`-interpolation, `std.format`) still always created Val.Str
with `_asciiSafe = false`. Downstream JSON rendering of format results
falls back to the per-char escape scan + UTF-8 encode path even when
both the format string and all interpolated values are pure ASCII.
Manifest workloads heavy on `%(name)s` interpolation pay this cost on
every emitted string.

Modification:
- Add `literalsAsciiSafe` to RuntimeFormat, computed once at parse time
  by scanning leading + inter-spec literal segments for printable ASCII
  with no `"` or `\`.
- At format time, AND `literalsAsciiSafe` with each interpolated value's
  ASCII-safety: strings forward `_asciiSafe`; numerics are ASCII (except
  `%c` which depends on codepoint); booleans/null are ASCII; complex
  types (Arr/Obj routed through Renderer) are conservatively non-ASCII.
- Refactor `Format.format` (both overloads) and `formatSimpleNamedString`
  to return `Val.Str` directly so the `_asciiSafe` flag is set at
  construction. Update the three external callers (Evaluator binop `%`,
  std.mod, std.format) and `PartialApplyFmt.evalRhs` accordingly.

Result:
Format outputs now correctly carry `_asciiSafe = true` when all inputs
are ASCII-safe, letting ByteRenderer take the fast path during JSON
manifestation. Regression test
`new_test_suite/format_asciisafe_propagation.jsonnet` covers the simple
`%(name)s` fast path, general `%s`/`%d`/`%c`/`%x`/`%o`/`%f` conversions,
mixed ASCII/non-ASCII literals and values, and ByteRenderer roundtrip
via `std.manifestJson`.
@He-Pin He-Pin marked this pull request as draft May 15, 2026 23:11
@He-Pin He-Pin marked this pull request as ready for review May 16, 2026 03:28
@He-Pin He-Pin marked this pull request as draft May 16, 2026 06:54
@He-Pin He-Pin marked this pull request as ready for review May 16, 2026 10:08
@stephenamar-db stephenamar-db merged commit 90a15f0 into databricks:master May 18, 2026
5 checks passed
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request May 19, 2026
Motivation:
PR databricks#860 propagates ASCII-safety through Format outputs by tracking the
ASCII-safety of compiled format literals and each interpolated value,
then setting Val.Str._asciiSafe on the result. Add direct unit coverage
for PartialApplyFmt so future refactors of format-string parsing or
ASCII-safety detection cannot regress the optimization without a
visible test failure.

Modification:
* New FormatTests.scala exercising PartialApplyFmt.evalRhs through a
  minimal in-memory EvalScope. Four cases: safe numeric value, unsafe
  string value, unsafe static literal in the format, and ASCII-safety
  combined across multiple keys (one safe + one unsafe).

Result:
* ./mill 'sjsonnet.jvm[3.3.7]'.test.testOnly sjsonnet.FormatTests
  -- 4/4 PASS.
* ./mill __.checkFormat green.

References:
Follow-up unit coverage for databricks#860
stephenamar-db pushed a commit that referenced this pull request May 20, 2026
## Summary

Follow-up unit coverage for #860 (which propagates ASCII-safety through
Format outputs).

The original branch held the production implementation, but #860 covers
the same optimization with a slightly different approach and was merged
first. Reduce this PR to the regression tests that were developed for
the original implementation — they pass unchanged against the current
`Format.scala`.

## Changes

* `sjsonnet/test/src/sjsonnet/FormatTests.scala` — 4 new tests directly
exercising `Format.PartialApplyFmt.evalRhs` via a minimal in-memory
`EvalScope`:
  1. simple named format preserves ascii-safe result for numeric values
  2. simple named format does not mark unsafe string values ascii-safe
  3. simple named format does not mark unsafe static literals ascii-safe
  4. simple named format combines ascii-safety across multiple keys

## Verification

```bash
./mill 'sjsonnet.jvm[3.3.7]'.test.testOnly sjsonnet.FormatTests
# Tests: 4, Passed: 4, Failed: 0

./mill __.checkFormat
# SUCCESS
```

## References

* #860 — perf: propagate ASCII-safety through Format outputs (merged)
stephenamar-db pushed a commit that referenced this pull request May 20, 2026
## Motivation

`std.join` produced output strings via `Iterator.foldLeft + concat`,
which:
1. Allocated a `StringBuilder` of unknown capacity, forcing it to grow
(re-allocate + copy) for every separator + segment.
2. Set `Val.Str._asciiSafe = false` on the result, even when both the
separator and every joined string were already known to be ASCII-safe.
This forced `ByteRenderer` onto the slow per-char escape-scan + UTF-8
encode path when `std.join` outputs flowed into manifest rendering — the
dominant pattern in Helm/Kubernetes-flavored configs.

Manifest workloads emit many `std.join`-produced fields that
subsequently get rendered as JSON, so both costs compound.

## Modification

`sjsonnet/src/sjsonnet/stdlib/StringModule.scala` — `Join.evalRhs` and
helpers (`joinPresizedStringArray`, `joinDirectStringArray`,
`joinRepeatedStringEval`):

- **Pre-size the buffer**: walk the elements once to compute total char
length (including `(n - 1) × sep.length`), allocate `StringBuilder` with
that exact capacity. No grow, no copy.
- **Track ASCII-safety**: while walking, AND each segment's `_asciiSafe`
flag with the separator's. If all are ASCII-safe, construct the result
via `Val.Str.asciiSafe(pos, s)`; otherwise via the regular `Val.Str(pos,
s)` constructor.
- **Empty-string returns are ASCII-safe**: every `len == 0` / `!added` /
`Val.Null` short-circuit returns `Val.Str.asciiSafe(pos, "")` so empty
results stay on the ByteRenderer fast path. (Per @stephenamar-db's
review.)
- **Type-checking unchanged**: same `Val.Str` / `Val.Null` /
element-type validation, same error messages.

`sjsonnet/test/resources/new_test_suite/join_string_presized.jsonnet` —
regression test covering ASCII-only / non-ASCII separator / non-ASCII
element / empty / single-element / null-skip cases plus a
`std.manifestJson` roundtrip exercising the ByteRenderer fast-path.

## Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10. Native binary built via
`./mill 'sjsonnet.native[3.3.7]'.nativeLink` (release-full, full LTO).

**Hyperfine (Scala Native binary, end-to-end wall time, lower is
better)**:

| Workload | master | #858 | Speedup |
|---|---:|---:|---:|
| kube-prometheus (`example.jsonnet`, 72 414-line manifest) | 179.2 ±
29.5 ms | 168.2 ± 10.9 ms | **1.07× ± 0.19** |
| `cpp_suite/realistic2.jsonnet` | 108.2 ± 15.9 ms | 102.9 ± 9.5 ms |
**1.05× ± 0.18** |
| `cpp_suite/large_string_template.jsonnet` | 11.3 ± 0.3 ms | 11.8 ± 1.1
ms | parity (workload too small for end-to-end signal) |
| `cpp_suite/large_string_join.jsonnet` | 5.9 ± 0.6 ms | 6.8 ± 1.4 ms |
parity (workload too small for end-to-end signal) |

Real-world manifest workloads (kube-prometheus, realistic2) show a
consistent ~5-7% wall-time win. Sub-second cpp_suite workloads have
process-startup overhead dominating their variance, so any `std.join`
delta is buried — but the JMH steady-state numbers below confirm the
optimization works.

**Variance reduction**: master shows 2-3× wider σ than #858 on the
larger workloads (kube-prometheus master ±29.5 ms vs #858 ±10.9 ms;
realistic2 master ±15.9 ms vs #858 ±9.5 ms), consistent with eliminating
a noisy escape-scan code path on the renderer side.

**JMH (JVM steady-state, `bench.runRegressions`, 2 forks × 3 warmup × 5
measurement iterations, ms/op)**:

| Benchmark | master | #858 | Δ |
|---|---:|---:|---:|
| `cpp_suite/large_string_template` | 0.784 ± 0.165 | 0.937 ± 0.222 |
within noise (CIs overlap) |
| `cpp_suite/large_string_join` | 0.345 ± 0.071 | 0.327 ± 0.066 |
**−5%** |
| `jdk17_suite/repeat_format` | 0.164 ± 0.079 | 0.196 ± 0.111 | within
noise |
| `go_suite/manifestJsonEx` | 0.092 ± 0.103 | 0.079 ± 0.053 | **−14%** |

JMH numbers on these sub-millisecond workloads have high variance on
Apple Silicon (CIs frequently span ±20-100%); the directional signal
(manifest-heavy workloads improve, neutral-to-slightly-faster on
string-only workloads) is consistent with the hyperfine native data.

**Output identity**: `diff` of `kube-prometheus` 72 414-line render
output between master and #858 binaries returns 0 byte differences.

## References

- Companion PR: #860 (Format asciiSafe propagation — same idea, applied
to `%`/`std.format` outputs; merged)
- Reviewer feedback addressed: @stephenamar-db's comment on empty-string
asciiSafe consistency (now applied to all five `Val.Str(pos, "")` paths
in the join helpers)

## Test plan

- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — all suites pass
- [x] `./mill 'sjsonnet.native[3.3.7]'.compile` — passes
- [x] `./mill 'sjsonnet.js[3.3.7]'.compile` — passes
- [x] `./mill __.checkFormat` — passes
- [x] New regression test `new_test_suite/join_string_presized.jsonnet`
- [x] Output-identity check on kube-prometheus (72 414-line render — 0
byte diff vs master)
- [x] Hyperfine (Scala Native) cross-validation on master + #858
- [x] JMH (JVM steady-state) cross-validation on master + #858
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants