compiler: deduplicate catalog literal slices #8

Merged: waywardmonkeys merged 1 commit into main from dedupe-lits on Apr 29, 2026

@waywardmonkeys waywardmonkeys commented Apr 29, 2026

Add a compiler-side literal pool for the catalog LITS chunk so repeated non-empty text and literal-expression fragments can share one emitted byte range. The bytecode format stays unchanged: OP_OUT_SLICE and OP_OUT_EXPR still carry (offset, len) operands, but duplicate fragments now reuse the first offset when deduplication is enabled.

Expose the behavior through CompileOptions::literal_deduplication with three modes:

  • Enabled: deduplicate repeated literal fragments and emit smaller LITS bytes.
  • Disabled: preserve the previous append-only layout and skip duplicate tracking work.
  • MeasureOnly: preserve the append-only layout while collecting duplicate opportunity stats.
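The difference between the three modes can be sketched as follows. This is an illustrative model only, not the code from this PR: the enum name `LiteralDeduplication`, the `LiteralStats` fields, and the `lower_literals` function are all hypothetical, and the sketch uses a plain map keyed by the fragment bytes for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the three modes named in the description.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LiteralDeduplication {
    Enabled,
    Disabled,
    MeasureOnly,
}

/// Hypothetical stats shape; the real `literal_stats` fields are not
/// shown in the PR text.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct LiteralStats {
    pub duplicate_fragments: usize,
    pub duplicate_bytes: usize,
}

/// Lowers fragments into one byte blob plus one (offset, len) operand
/// per fragment, following the behavior described for each mode.
pub fn lower_literals(
    fragments: &[&[u8]],
    mode: LiteralDeduplication,
) -> (Vec<u8>, Vec<(u32, u32)>, LiteralStats) {
    let mut blob = Vec::new();
    let mut operands = Vec::with_capacity(fragments.len());
    let mut stats = LiteralStats::default();
    // Maps each fragment already seen to the offset of its first copy.
    let mut seen: HashMap<&[u8], u32> = HashMap::new();

    for &frag in fragments {
        let len = frag.len() as u32;
        // Disabled skips tracking entirely; empty fragments are never pooled.
        let track = mode != LiteralDeduplication::Disabled && !frag.is_empty();
        let first = if track { seen.get(frag).copied() } else { None };
        if first.is_some() {
            stats.duplicate_fragments += 1;
            stats.duplicate_bytes += frag.len();
        }
        let offset = match (mode, first) {
            // Only Enabled reuses the first offset.
            (LiteralDeduplication::Enabled, Some(off)) => off,
            // Every other case keeps the append-only layout.
            _ => {
                let off = blob.len() as u32;
                blob.extend_from_slice(frag);
                if track && first.is_none() {
                    seen.insert(frag, off);
                }
                off
            }
        };
        operands.push((offset, len));
    }
    (blob, operands, stats)
}
```

In this model, `MeasureOnly` produces the same blob and operands as `Disabled` while reporting the same stats as `Enabled`, which is what makes it useful for estimating savings without changing the output.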

Add CompiledCatalog::literal_stats so callers can evaluate the optimization against real catalog data. The CLI now accepts --literal-stats, --no-lits-dedup, and --measure-lits-dedup for resource-backed catalog compiles.

Store literal pool lookup entries as hashes plus offsets into the emitted literal byte blob rather than owning a second copy of every unique literal as a map key.
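A minimal sketch of that storage idea, under stated assumptions: the `LiteralPool` type and its `intern` method are hypothetical names, the use of `DefaultHasher` is a stand-in for whatever hash the compiler actually uses, and storing a length next to each offset and resolving hash collisions by comparing against the emitted bytes are assumptions of this sketch rather than details confirmed by the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical pool: buckets keyed by a 64-bit hash, each holding
/// (offset, len) ranges that point back into `bytes`. No literal is
/// stored a second time as a map key.
#[derive(Default)]
pub struct LiteralPool {
    bytes: Vec<u8>,
    by_hash: HashMap<u64, Vec<(u32, u32)>>,
}

impl LiteralPool {
    /// Returns the (offset, len) operand for `frag`, appending its bytes
    /// only if an identical fragment has not been emitted already.
    pub fn intern(&mut self, frag: &[u8]) -> (u32, u32) {
        let len = frag.len() as u32;
        if frag.is_empty() {
            // Empty fragments are not pooled.
            return (self.bytes.len() as u32, 0);
        }
        let mut hasher = DefaultHasher::new();
        frag.hash(&mut hasher);
        let key = hasher.finish();

        if let Some(bucket) = self.by_hash.get(&key) {
            for &(off, l) in bucket {
                let range = off as usize..(off + l) as usize;
                // Compare against the emitted bytes so that a hash
                // collision can never alias two different literals.
                if &self.bytes[range] == frag {
                    return (off, l);
                }
            }
        }
        let off = self.bytes.len() as u32;
        self.bytes.extend_from_slice(frag);
        self.by_hash.entry(key).or_default().push((off, len));
        (off, len)
    }

    /// The emitted literal byte blob.
    pub fn bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

The trade-off in this sketch is one extra byte comparison on each hash hit in exchange for lookup entries of a fixed small size, regardless of how long the literals are.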

@waywardmonkeys waywardmonkeys force-pushed the dedupe-lits branch 2 times, most recently from 9873a48 to 24597b6 on April 29, 2026 10:37

Cover enabled, disabled, and measure-only lowering behavior with regression tests, including literal-expression slices.
@waywardmonkeys waywardmonkeys merged commit 574a945 into main Apr 29, 2026
15 checks passed