compiler: deduplicate catalog literal slices #8
Merged
waywardmonkeys merged 1 commit into main on Apr 29, 2026
Add a compiler-side literal pool for the catalog `LITS` chunk so repeated non-empty text and literal-expression fragments can share one emitted byte range. The bytecode format stays unchanged: `OP_OUT_SLICE` and `OP_OUT_EXPR` still carry `(offset, len)` operands, but duplicate fragments now reuse the first offset when deduplication is enabled.

Expose the behavior through `CompileOptions::literal_deduplication` with three modes:

- `Enabled`: deduplicate repeated literal fragments and emit smaller `LITS` bytes.
- `Disabled`: preserve the previous append-only layout and skip duplicate tracking work.
- `MeasureOnly`: preserve the append-only layout while collecting duplicate opportunity stats.

Add `CompiledCatalog::literal_stats` so callers can evaluate the optimization against real catalog data. The CLI now accepts `--literal-stats`, `--no-lits-dedup`, and `--measure-lits-dedup` for resource-backed catalog compiles.

Store literal pool lookup entries as hashes plus offsets into the emitted literal byte blob rather than owning a second copy of every unique literal as a map key.

Cover enabled, disabled, and measure-only lowering behavior with regression tests, including literal-expression slices.
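For readers who want a concrete picture of the pool described above, here is a minimal, standalone sketch of the idea. It is not the code from this PR: `LiteralPool`, `LiteralStats` and its fields, `intern`, and `hash_bytes` are hypothetical names chosen for illustration, and the enum only mirrors the three mode names from the description. The sketch assumes that empty fragments bypass the pool and that hash collisions are resolved by comparing against bytes already in the blob.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical mirror of the three modes named in the description.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LiteralDeduplication {
    Enabled,
    Disabled,
    MeasureOnly,
}

/// Hypothetical stats shape; the real `literal_stats` fields may differ.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct LiteralStats {
    pub fragments: usize,
    pub duplicate_fragments: usize,
    pub duplicate_bytes: usize,
}

pub struct LiteralPool {
    mode: LiteralDeduplication,
    /// The emitted literal blob (what would become the LITS chunk).
    bytes: Vec<u8>,
    /// hash -> (offset, len) of first occurrences inside `bytes`.
    /// No second owned copy of the literal is kept as a key.
    index: HashMap<u64, Vec<(u32, u32)>>,
    stats: LiteralStats,
}

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl LiteralPool {
    pub fn new(mode: LiteralDeduplication) -> Self {
        Self { mode, bytes: Vec::new(), index: HashMap::new(), stats: LiteralStats::default() }
    }

    /// Returns the `(offset, len)` operand pair for `fragment`.
    pub fn intern(&mut self, fragment: &[u8]) -> (u32, u32) {
        // Assumption: empty fragments are never pooled or counted.
        if fragment.is_empty() {
            return (0, 0);
        }
        let len = fragment.len() as u32;
        self.stats.fragments += 1;

        // `Disabled` skips all duplicate tracking work.
        if self.mode != LiteralDeduplication::Disabled {
            let hash = hash_bytes(fragment);
            // Confirm a hash hit by comparing against the blob itself.
            let found = self.index.get(&hash).and_then(|candidates| {
                candidates.iter().copied().find(|&(off, l)| {
                    l == len && &self.bytes[off as usize..(off + l) as usize] == fragment
                })
            });
            if let Some(slice) = found {
                self.stats.duplicate_fragments += 1;
                self.stats.duplicate_bytes += fragment.len();
                if self.mode == LiteralDeduplication::Enabled {
                    return slice; // reuse the first offset
                }
                // MeasureOnly: count the opportunity, then append anyway.
            } else {
                // First occurrence: remember where it is about to be written.
                let off = self.bytes.len() as u32;
                self.index.entry(hash).or_default().push((off, len));
            }
        }

        let off = self.bytes.len() as u32;
        self.bytes.extend_from_slice(fragment);
        (off, len)
    }

    pub fn bytes(&self) -> &[u8] {
        &self.bytes
    }

    pub fn stats(&self) -> LiteralStats {
        self.stats
    }
}
```

In this sketch, interning the same fragment twice under `Enabled` returns the same `(offset, len)` pair and leaves the blob unchanged on the second call. Under `MeasureOnly` the blob grows exactly as it would under `Disabled`, but the stats still record the bytes that deduplication would have saved.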