All-LogUp auxiliary trace bus constraints by adr1anh · Pull Request #2962 · 0xMiden/miden-vm

adr1anh · 2026-04-05T10:52:58Z

All-LogUp Auxiliary Trace

What this PR does

Every auxiliary-trace bus in the Miden VM used to be described by a bespoke constraint: some were running-product (multiset) checks written out by hand, others were partial LogUp arguments. Each lived in its own per-component file, and the aux-trace generation code was spread across processor/src/trace/{decoder,range,stack,chiplets}/aux_trace*.rs, duplicating the algebra row-by-row in imperative Rust.

This PR replaces all of that with a single unified LogUp formulation, described declaratively. Every bus — the block hash table, op-group table, stack overflow, range check, chiplet request bus, sibling table, hasher perm-link, ACE wiring, kernel ROM, and the rest — is now a branch of one descriptive AIR (MidenLookupAir) expressed against a small closure-based builder trait. The same description is consumed by the verifier's constraint evaluator, by the prover's aux-trace generator, by the recursive-verifier ACE circuit, and by two debug walkers.

The practical consequences:

The final layout is 4 extension columns on the main trace and 3 on the chiplet trace (seven total). The 4+3 split is deliberate: in a future multi-trace setup, the main-trace column count lands under the next power-of-two boundary without padding.
Only one committed boundary value per trace (two total, with a forced-zero placeholder in the second slot until trace-splitting lands — see NUM_LOGUP_COMMITTED_FINALS). Seven column accumulators collapse into one partial-sum column per trace.
The kernel ROM chiplet is now purely lookup-based. Previously it duplicated a row per callsite; now one row per digest carries a multiplicity. For realistic kernels this deletes thousands of chiplet rows.
The per-component aux-trace Rust code is gone, not dual-maintained. Aux-trace generation now walks the same LookupAir::eval description the constraint side does.

How the code is laid out

There are two halves, and it's worth keeping them separate in your head while reading.

The generic LogUp framework lives in air/src/lookup/. It is deliberately free of any Miden-specific types — the intent is to eventually extract it into its own crate. You'll find the closure-based builder stack (LookupBuilder → LookupColumn → LookupGroup → LookupBatch), a LookupAir trait modelled on p3_air::Air, and three concrete walkers of a LookupAir description:

ConstraintLookupBuilder — symbolic evaluation for the verifier and the prover's quotient pass.
ProverLookupBuilder — concrete-row fraction collection, followed by batch inversion and a fused row-major chunk-parallel accumulator.
Two debug walkers under debug/: one that validates the shape of a LookupAir (column count, declared-vs-observed degrees per interaction, canonical/encoded equivalence for cached-encoding groups), and one that balances a real concrete trace, producing a human-readable report of unmatched adds/removes. The second is not yet wired into the prover/processor paths; that's one of the follow-ups.

The pattern is essentially Plonky3's AirBuilder pattern: a LookupAir describes itself once, against an interface, and each concrete builder interprets that description differently. It's also essentially a serializer pattern — the shape is written once, many things read it.

The Miden-specific wiring lives in air/src/constraints/lookup/. Here you'll find one file per extension column, under buses/. This is the place to read if you want to know "what bus is this row pushing into, and how does the interaction encode?" Start with miden_air.rs (the thin aggregator), then main_air.rs and chiplet_air.rs for the two halves, then messages.rs for the full catalogue of typed message structs (25 BusId variants). Every bus message is a concrete struct with its own encoding; each bus has a dedicated domain (BusId::*) so multiple linearly-independent buses can sit on one column via distinct bus_prefix prefixes.

The key conceptual unit inside a column is a group. Groups within a column are additively independent — their degrees add. Within a group, the branches are mutually exclusive: only one fires per row, so the group's degree is the max across branches, not the sum. A branch may be a single interaction or a batch of simultaneous interactions sharing one flag. Every add/remove/batch/insert call is annotated with its numerator and denominator degree, and those annotations are cross-checked by the validator against the symbolic degree actually accumulated. That annotation is what made the current packing (everything inside degree 9) tractable; the same metadata could drive an automated packer later.

Messages and domain separation

We used to derive bus labels by reverse-bitting operation codes. That's gone. Each distinct bus interaction type is now a named BusId variant, cast directly into the bus_prefix table. We've also split several messages that were previously sharing an id — for example, the hasher's various input/output paths now each carry their own variant rather than sharing a generic "hasher message". messages.rs is the centralised place to review domain separation; some of it was Claude-generated and there are still obvious simplifications we'd like to fold in.

Outer / boundary interactions

Public-input-driven terms (program hash seed into the block hash queue, the log-precompile transcript bookends, kernel ROM initialisation from the variable-length public inputs) are emitted through a separate eval_boundary hook on LookupAir that takes a BoundaryBuilder. The debug trace walker consumes this directly. The verifier's scalar check (ProcessorAir::reduced_aux_values) now also drives through eval_boundary via a BoundaryBuilder reducer — so this code path is unified. The MASM recursive verifier still has its own copy of these boundary formulas; unifying the third source is a follow-up.

ACE and MASM recursive verifier

The ACE circuit's LogUp boundary identity (0 = Σ aux_bound + c_block_hash + c_log_precompile + c_kernel_rom) is added in air/src/ace.rs. On the MASM side, aux_trace.masm now does a single Poseidon2 absorb of [COM, W0] (two boundary felts, second always zero), and public_inputs.masm has been reworked to sum the variable-length public inputs as LogUp fractions rather than running the old multiset reduction. The constraint evaluation MASM (constraints_eval.masm) is regenerated by a new regenerate-constraints binary in crates/lib/core/src/bin/.

Tests

The legacy per-component aux-trace unit tests (processor/src/trace/tests/{decoder,stack,range,hasher,chiplets/*}.rs) were deleted alongside their targets. In their place are two new files: lookup_harness.rs builds a column-blind, subset-based expectation DSL over (multiplicity, encoded_denominator) pairs, and lookup.rs is an end-to-end smoke test that drives a tiny real trace through the prover's collection pipeline. The existing tests were heavily rewritten through this harness, much of it Claude-assisted — that's one of the areas worth the most scrutiny. Restoring granular per-component coverage is a blocker before we flip off WIP.

Reviewing this PR

For an efficient walk, I'd suggest:

air/src/constraints/lookup/miden_air.rs and messages.rs — establish the final shape and the bus catalogue.
air/src/constraints/lookup/buses/*.rs — one file per column. Each is the authoritative description of one bus's semantics; read these instead of the per-component design docs (which haven't caught up).
air/src/lookup/builder.rs — the four builder traits. Everything else in air/src/lookup/ is an implementation of these against a specific target (constraint eval, prover, debug).
air/src/lookup/constraint.rs + prover.rs — the two production walkers.
ACE + MASM: air/src/ace.rs, crates/lib/core/asm/sys/vm/{aux_trace,public_inputs,constraints_eval}.masm.
Processor tests: processor/src/trace/tests/lookup{,_harness}.rs and the rewritten per-component files.

Deeper walkthrough — design rationale and nuances

Why a new description layer at all

The previous structure described each bus independently, and nothing enforced consistency between what the constraint code asserted and what the aux-trace builder computed. Different components used different conventions (multiset vs. LogUp), different column layouts, and different ad-hoc label schemes. After this PR, one description drives four concrete uses: the verifier's constraint evaluator, the prover's quotient pass, the prover's aux-trace generator, and the recursive ACE circuit. This unification is the point — not LogUp itself, which was only a means to get there.

Because the description is canonical, the current docs under docs/src/design/lookups/ and the per-chiplet bus pages no longer match the code. We have an issue open for a docs rewrite. The argument for not going through another round of exhaustive LaTeX is that the new code is much more self-describing: the per-bus file names, BusId enum, LookupMessage structs, and Deg annotations spell out the interaction inventory directly. Docs can lean on that and stop duplicating formulas.

How to read a bus file

Start at the top with MidenLookupAir. It defines the lookup argument both for the main trace and the chiplet trace. At the moment we have two sub-AIRs (MainLookupAir, ChipletLookupAir), each carrying its own declared shape: the number of columns, the max interactions per column (which hints the adapter how many fractions to buffer per row), and the bus-ID count required for deriving challenges and sizing the bus_prefix table.

Each sub-AIR also carries a lookup-specific context — the collection of flags that gate interactions. This is built once per eval call and shared between all the columns in that AIR. The flags are designed to express mutual exclusivity: if two flags in a group cannot be simultaneously active, the group's degree is the max of the branches rather than the sum. The context is very similar to the old OpFlags / ChipletFlags but has been re-implemented as its own type for two reasons. First, the constraint-path construction uses polynomial arithmetic (the default), but the prover-path adapter can override the constructor to use boolean arithmetic on the concrete 0/1 decoder bits, which is meaningfully cheaper. Second, it pins down the smaller surface the LogUp code actually needs, independently of the full OpFlags. The override is the hook for the follow-up where we want to hint to the aux-trace generator that all flags on a row are inactive so the whole row can be skipped — padding rows are free wins there.

Walking through a column: you start with a column, containing groups. Groups in the same column are independent — their degrees add. Within a group the branches are mutually exclusive; only one fires per row, and the group's degree is the max. A branch can be a single interaction or a batch of simultaneous interactions sharing the outer group's flag. Every interaction carries a name and a declared (n, d) degree pair, both of which are cross-checked by the validator. Naming and annotating is the reason we were able to find the current packing at all: we extracted the per-interaction degrees and ran a small optimisation pass to find a tighter column layout. That same metadata can drive a brute-force packer next time we need to add an interaction.

The builder pattern

The entry point for the framework is LookupAir — a structure that describes itself to an interface. That interface is the LookupBuilder stack (column → group → batch). This is the same pattern as Plonky3's AirBuilder, and morally the same as serialisation: a struct describes itself once, and different interpreters do different things with that description.

There are three interpretations:

ConstraintLookupBuilder (air/src/lookup/constraint.rs) wraps a LiftedAirBuilder. As it walks the description, it keeps a single rational-function accumulator per scope (column, group, or batch). When a scope closes, its accumulator folds into its parent — the walk is linear, no intermediate storage. Because the target is a generic AirBuilder, the resulting expression can be fed to the prover's quotient computation, the verifier's OOD evaluation, and the ACE circuit evaluation — three consumers of the same description without code duplication. The formulas for how a group composes into its parent are documented in the file.
ProverLookupBuilder (air/src/lookup/prover.rs) is the aux-trace path. It runs in three passes: first it walks the AIR to collect every (multiplicity, denominator) pair per column into a flat buffer (LookupFractions). Second it batch-inverts the denominators. Third it sums the inverses into per-column partial-sum accumulators. Only one auxiliary value needs to be sent per trace at the end — a trick enabled by the generic description, and the reason we went from 7+ committed finals down to 1 per trace (2 total, with the trailing zero for forward-compatibility). Only active branches (flag ≠ 0) get the closure evaluated, saving arithmetic on rows where the interaction isn't firing. The code hasn't been benchmarked carefully yet but it already parallelises; there's headroom. One follow-up on the to-do list is to change the aux-builder traits in our lifted-stark layer to take the LookupAir directly — a larger refactor that would eventually push these lookups purely onto the processor trait.
Debug walkers (air/src/lookup/debug/). There are two, for two different kinds of check:
- debug/trace/ balances a full concrete trace. It walks the AIR while summing all interactions, and emits a human-readable report of which rows have unmatched adds/removes. This is the replacement for the old bus-debugger infrastructure; the mechanism is in place but we haven't hooked it into the prover/processor paths yet. This also requires eval_boundary to be handled correctly, which it now is.
- debug/validation/ validates the shape of the description: number of columns, declared degrees match observed symbolic degrees, encoding equivalence for cached-encoding groups (more on this below), scope discipline. Single entry point: MidenLookupAir::validate(layout).

The cached-encoding escape hatch

The biggest tension in this design was between two performance axes: aux-trace generation efficiency and constraint-evaluation efficiency. The canonical description is optimised more heavily for the former — it reads cleanly and only fires closures on active flags. Constraint evaluation, however, must evaluate every branch symbolically, and the canonical description has a lot of repetition. The hasher responses are the worst offender: each variant uses a linear combination of (α + β·addr + β²·node_idx)-shaped prefixes that would ideally be memoised across branches.

To address this, a group can optionally carry a second, arithmetic-form description alongside its canonical one. This is the group_with_cached_encoding API. The constraint-path adapter consumes the encoded form where supplied, giving the expression tree a much better chance at common-subexpression elimination (notably in the ACE codegen, which does aggressive CSE). Part of the validation walker's job is to ensure both representations are equivalent — it walks both paths and verifies the resulting expressions match using the Schwarz–Zippel lemma. We've only applied this to the hasher chiplet so far, where the per-variant repetition is worst; it's a follow-up to sweep the rest.

The MASM and ACE side

The ACE circuit was modified to remove support for multiset reductions and to match the new aux-value layout. We build it with only as many boundary values as there are traces — so 1 for the current single-trace setup, but we allocate 2 (with the second forced zero) as a forward-looking hack, because the MASM recursive verifier already needs to absorb 2 boundary felts in its forthcoming trace-split layout. Updating MASM later to drop the slot is cheaper than updating it twice.

On the MASM side, the layout now expects two extension-field elements for the bus boundary. The larger change was to public_inputs.masm: variable-length public inputs now only need to be summed up as LogUp fractions rather than run through a multiset reduction. That collapses a chunk of code. As a follow-up, we want to reduce all the outer-boundary values in MASM, since they all have the same shape now — much more practical than it used to be.

Despite the constraint-evaluation builder introducing a lot of structural duplication (each branch evaluates its own encoding), the number of inputs and eval gates in the ACE circuit is very similar to before. The ACE codegen's CSE pass absorbs the redundancy.

The test rewrite

Almost every test under processor/src/trace/tests/ was rewritten. The reason is blunt: the aux-trace format changed, so the old assertions were meaningless. The rewrite tried to shift tests from asserting on the final trace shape to asserting on the semantic interactions — each test now declares "for this row, I expect these (multiplicity, message) pairs" and the harness checks that as a multiset subset of the actual prover emissions. Tests are more descriptive as a result. The harness is processor/src/trace/tests/lookup_harness.rs.

Most of the rewriting was Claude-assisted under time pressure, and that's the section of the PR most in need of close review. On the positive side, the harness is column-blind and subset-based, so these tests should survive any future column repack.

Prover changes are mostly nil

The prover itself barely changes. Almost all the logic moved into air/: the aux-trace generator is now just a thin wrapper (MidenLookupAuxBuilder) that calls into build_lookup_fractions + accumulate. The reduction comes from deleting the per-component aux-trace code in the processor — it's all delegated to a single point now.

Follow-ups

Landed since the original PR description:

Kernel ROM chiplet reworked onto multiplicity-based LogUp.
Deg annotation (name + degree) on every builder call.
Bus labels replaced by the BusId enum with direct prefix indexing.
Single-column partial-sum accumulator per trace (7 → 2 committed finals).
Prover-side op-flags fast path (MainLookupBuilder::build_op_flags override hook).

Outstanding, in rough priority order:

Revive per-component processor aux-trace tests. Today only the end-to-end oracle survives, which doesn't localise regressions.
Rewrite the bus design docs. docs/src/design/lookups/{multiset,logup}.md and per-chiplet pages still describe the old running-product world and now contradict the code.
Wire the real-trace bus debugger into the prover/processor paths. The infrastructure is there; today a silently-imbalanced bus just fails verification.
Unify boundary emission. Rust, ACE and MASM still carry three copies of the boundary formulas.
Padding-row early-exit hint for trace generation.
Cached-encoding optimisation for constraint eval. The hook is there; the widest use case (hasher responses) would save ~58% per design estimates.
OpFlagsTrace variant to avoid building all 128 flags polynomially on the trace-generation path.

FG's web intro

https://gisthost.github.io/?10dbb2d98054b6b78bc462e39ecd1929

Test plan

make test-fast followed by a targeted make test-air and make test-processor.
cargo test -p miden-air for the lookup validator + bus degree inventory.
End-to-end proof generation — pending the remaining trace-side integration.

Squashed snapshot of 24 commits introducing the new closure-based LogUp lookup-argument API in air/src/constraints/lookup/, plus its supporting infrastructure (logup, logup_msg, the LookupAir / LookupBuilder / LookupColumn / LookupGroup / LookupBatch trait stack, ConstraintLookupBuilder and ProverLookupBuilder adapters, LookupChallenges, MidenLookupAir) and the seven per-bus emitters in lookup/buses/. Squashed for clean rebase onto adr1anh/constraint-simplification — the intermediate commits referenced types (MainTraceRow, LiftedAirBuilder, the old trace::Challenges) that 2856 reshapes, so a per-commit replay is not viable. Granular history preserved on adr1anh/bus.pre-rebase-backup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Post-rebase Phase 3 of the lookup-module adaptation: - Replace LookupChallenges struct with a type alias for trace::Challenges (the unified 2856 challenge type with fixed-size bus_prefix / beta_powers arrays). - Migrate LookupMessage trait to use trace::Challenges in place of the dropped LookupChallenges. ConstraintLookupBuilder / ProverLookupBuilder / DualBuilder drop the max_message_width / num_bus_ids constructor args (sizes are now fixed at compile time via NUM_BUS_TYPES / MAX_MESSAGE_WIDTH). - bus_prefix() trait method takes usize instead of u16; lookup::bus_id is now a thin re-export shim over trace::bus_types. - Rewrite lookup/buses/{block_stack,block_hash_and_op_group,chiplet_requests, range_logcap,wiring}.rs against the typed MainCols struct: replace MainTraceRow indexed access with DecoderCols / StackCols / RangeCols / SystemCols field accessors, switch OpFlags::new from the ExprDecoderAccess wrapper to the (decoder, stack, decoder_next) signature (which folds end_next/repeat_next/halt_next into a single struct), and recover the ACE chiplet flag with the new virtual s0 = 1 - s_ctrl - s_perm formula. - Add bus parameters to every Challenges::encode call in logup_msg.rs (each inherent Msg::encode now passes its bus type from trace::bus_types). - Stub lookup/buses/{chiplet_responses,hash_kernel}.rs with no-op emit_* functions: the original implementations referenced the pre-2856 hasher chiplet (32-row P_CYCLE_ROW_0/31, hasher::flags::f_mv etc.) and need to be rewritten against the new hasher_control + permutation split + 16-row HasherPeriodicCols. Originals are preserved on adr1anh/bus.pre-rebase-backup. - Delete dead constraints/logup.rs (the legacy RationalSet / Batch / Column algebra is no longer referenced; superseded by lookup module). - Disable constraints/degree_audit.rs tests with cfg(any()) until they're ported to MainCols / OpFlags::new(...) / Challenges::encode(bus, [...]). The two MidenLookupAir canary tests pass: the degree budget (with the two stubbed columns counted) and the cached-encoding equivalence check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…56 hasher Replace the two stubbed bus emitters with full implementations against the 2856 controller/permutation hasher split: - hash_kernel.rs: sibling table (BUS_SIBLING_TABLE) now uses the s_ctrl-gated (s0*s1*s2 / s0*s1*!s2) MU/MV pair from 2856's running product instead of the old 32-row cycle-row filter; the four absorb- after MVA/MUA variants are gone. ACE memory reads (BUS_CHIPLETS) keep the same NUM_ACE_SELECTORS-relative offsets. SiblingMsgBitZero/BitOne pick up mrupdate_id at beta^1 for cross-MRUPDATE domain separation, matching the SIBLING_TABLE encoding. - chiplet_responses.rs: the 7 hasher response variants now mirror 2856's compute_hasher_response — sponge_start/sponge_respan replace the old f_bp/f_abp pair and use the new is_boundary trace column instead of cycle_row_0/cycle_row_31; mp/mv/mu/hout/sout keep their previous semantics under the new (s_ctrl, hs0..hs2, is_boundary) gating. The bitwise/memory/ACE/kernel-ROM responses use the runtime-muxed encoding through the existing *ResponseMsg types so the C1 transition stays at degree 8. Selector wiring across both files mirrors lookup/buses/wiring.rs: s_ctrl = chiplets[0], s_perm = perm_seg, virtual s0 = 1 - s_ctrl - s_perm, and s1..s4 = chiplets[1..5]. Bitwise k_transition is read via the typed PeriodicCols::bitwise.k_transition borrow. Tests: - miden_lookup_air_degree_within_budget passes (max deg 9, hash_kernel lands at 7, chiplet_responses at 8). - miden_lookup_air_cached_encoding_equivalence passes (no cached calls on the two ported columns; chiplet_requests still exercises the dual-path machinery). Also picks up an unrelated rustfmt sweep + dead-import cleanup from running `make lint`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Collapse duplicated per-bus precompute into two shared contexts built once per `MidenLookupAir::eval`: - `MainTraceContext` holds a single `OpFlags` instance for the 4 main-trace buses (was built independently in each bus). - `ChipletTraceContext` holds a `ChipletActiveFlags` snapshot mirroring the active-flag block of `build_chiplet_selectors` for the 3 chiplet-trace buses (was recomputed manually in each). Per-bus cleanups: swap raw `chiplets[OFFSET + FIELD_IDX]` indexing for the typed `local.bitwise/memory/ace/kernel_rom/controller` borrow views; drop premature `_e: LB::Expr` named bindings in favor of `Var` captures; use `.not()` (via `BoolNot`) for boolean negation of known-boolean columns; delete the `CreqCtx` struct in `chiplet_requests.rs` and inline everything into a flat `col.group` body (matching `block_hash_and_op_group`'s pattern, dropping the cached-encoding optimization). No behavioral change: the degree-budget and cached-encoding-equivalence tests in `miden_air.rs` both pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…cleanups Split the monolithic MidenLookupAir into MainLookupAir (4 main-trace columns) and ChipletLookupAir (3 chiplet-trace columns) behind per-air extension traits (MainLookupBuilder / ChipletLookupBuilder). The existing MidenLookupAir is now a thin aggregator delegating to both, so the degree-budget and cached-encoding tests keep working unchanged while a future enum-dispatch wrapper can reach for either sub-air independently. Each sub-air owns a concrete context struct (MainBusContext / ChipletBusContext) built via a single builder hook (build_op_flags / build_chiplet_active), so the prover adapter can later override the polynomial construction with a boolean fast path without touching any emitter. ConstraintLookupBuilder, ProverLookupBuilder (gated on F = Felt), and DualBuilder each pick up empty impls of both extension traits. Bus emitter cleanups: - Hoist every message-producing computation out of the struct literal into closure-local let bindings; each emit_* closure now ends with a flat field-punned constructor. - Replace Into::<LB::Expr>::into(x) turbofishes with x.into() (via typed bindings where arithmetic would otherwise be ambiguous). - Replace .map(Into::into) / .map(|v| v.into()) with .map(LB::Expr::from) for concrete From-function-pointer conversion. - Drop _e / _var suffixes via shadowing; drop redundant : LB::Expr annotations wherever the use site already pins the type. - Introduce h_0 / h_1 bindings for the decoder's [Var; 8] rate halves and rate_0 / rate_1 / cap bindings for the chiplet controller's [Var; 12] sponge state ([RATE0, RATE1, CAP] convention). - Replace the 7-bit op_bits fold in block_hash_and_op_group with horner_eval_bits. - Move payload computation *inside* message closures so the prover path skips dead work on flag-zero rows. Drop the scattered #[expect(dead_code)] / #[expect(unused_imports)] attributes across the lookup tree in favor of a single module-level #![allow(dead_code, unused_imports)] in lookup/mod.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Removes the entire Task #9 cleanup block that the file has been flagging for deletion since the LookupMessage-trait rewrite (Task #12 / Amendment B). All consumers in lookup/buses/*.rs reach for the structs directly and encode via the LookupMessage trait from lookup/message.rs; the legacy LogUpMessage trait plus its blanket impls had no live callers. Deleted items (all flagged dead_code by clippy, all verified via grep to have zero external references): - `pub trait LogUpMessage` and all of its inherent `encode` method delegates on every *Msg type (HasherMsg, MemoryMsg, BitwiseMsg, BlockStackMsg, BlockHashMsg, OpGroupMsg, KernelRomMsg, AceInitMsg, RangeMsg, LogCapacityMsg, AceWireMsg, MemoryResponseMsg, KernelRomResponseMsg, BitwiseResponseMsg). - `impl_logup_message!` macro + all 15 invocations. - `struct SiblingMsg` (replaced by `SiblingMsgBitZero` / `SiblingMsgBitOne` in the new bus path). - `struct OverflowMsg` + its LookupMessage impl (stack overflow bus is not emitted by the current 7-column LogUp argument). - `KernelRomMsg::INIT_LABEL` const and `KernelRomMsg::init` constructor (legacy naming from before the header-as-builder pattern). - Four legacy per-label memory response variants (MemoryResponseReadElementMsg, MemoryResponseWriteElementMsg, MemoryResponseReadWordMsg, MemoryResponseWriteWordMsg), replaced by the runtime-muxed MemoryResponseMsg. - Two legacy per-op kernel ROM response variants (KernelRomResponseCallMsg, KernelRomResponseInitMsg), replaced by the muxed KernelRomResponseMsg. - Two legacy per-op bitwise response variants (BitwiseResponseAndMsg, BitwiseResponseXorMsg), replaced by the muxed BitwiseResponseMsg. Drops the now-unused `bus_types` import (all surviving code uses `super::lookup::bus_id::BUS_*` directly). File shrinks 700 lines. `cargo check -p miden-air` finishes with zero warnings (down from 18). `miden_lookup_air_degree_within_budget` and `miden_lookup_air_cached_encoding_equivalence` both pass unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Demote 3 orphan doc comments (above section banners, documenting nothing) to plain comments to silence clippy::empty_line_after_doc_comments, and let rustfmt collapse empty `impl<E> Msg<E> {}` blocks and trim trailing newlines. No behavioural change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wires the prover-side LogUp aux-trace collection phase onto a single flat buffer shared across all rows and columns, ready to feed a fused batch-inversion + partial-sum pass as follow-up. LookupAir trait gains `column_shape() -> &[usize]` — the mutual-exclusion-aware upper bound on fractions any one row can push per column. Each bus emitter declares its own `MAX_INTERACTIONS_PER_ROW` const alongside the `emit_*` fn, with a comment explaining the max over its mutually exclusive branches: M1 (block_stack + range_table) = 3 M_2+5 (block_hash_queue ∪ op_group) = 7 M3 (chiplet_requests) = 4 M4 (range_stack + logpre_capacity) = 4 C1 (chiplet_responses) = 1 C2 (hash_kernel) = 1 C3 (ace_wiring) = 3 Main/Chiplet/Miden LookupAirs expose the per-column shapes via static arrays assembled from the per-emitter consts. New `fractions.rs` introduces `LookupFractions<F, EF>`: a single flat `fractions: Vec<(F, EF)>` and a single flat `counts: Vec<usize>` of length `num_rows * num_cols`, row-major. No padding. Row r's contribution to column c is `fractions[prefix .. prefix + counts[r * num_cols + c]]`, where `prefix` is the running sum of earlier counts. `Vec::with_capacity` sizes everything up front (`num_rows * Σ shape` for fractions, `num_rows * num_cols` for counts) so the hot row loop never re-allocates. `accumulate_slow` is the reference oracle — one `try_inverse()` per fraction, walks the flat buffers with one cursor and `counts.chunks(num_cols)` in lockstep, which matches the memory-access pattern the fused fast path will use. ProverLookupBuilder refactored to own `&mut LookupFractions`. `column(f)` split-borrows `fractions` and `counts` disjointly, snapshots the vec length before/after the closure, debug-asserts `pushed <= shape[col]`, and appends the count. ProverColumn/Group/Batch push logic is unchanged — they still thread a `&mut Vec<(F, EF)>` and call `.push((m, v))`. Tests: two unit tests in `fractions.rs` (handcrafted `accumulate_slow` round-trip + capacity reservation), one end-to-end smoke test in `prover.rs` using a custom `SmokeAir` (2 columns, handcrafted eval body that respects its own shape on every row — random row-data trips the mutex-aware shape check on the real MidenLookupAir, so the smoke test avoids that path). Out of scope: fused batch-inversion + partial-sum fast path, rayon row-loop parallelism. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `build_lookup_fractions`, the top-level prover-side entry point that walks a real `RowMajorMatrix<Felt>` through `ProverLookupBuilder` and returns a populated `LookupFractions` buffer. Per row it builds a zero-copy two-row `RowWindow` over the flat matrix storage with wraparound on the last row, composes the per-row periodic slice by indexing each periodic column at `r mod its period` (reusing a single `Vec<Felt>` across rows — no per-iteration allocation), and runs `air.eval(&mut ProverLookupBuilder)`. The `for<'a> A: LookupAir<...>` HRTB lets one call site work with any `'a` the row loop needs. Public API surface for downstream crates: - `miden_air::lookup` — new re-export module exposing `LookupChallenges`, `LookupFractions`, `MidenLookupAir`, `ProverLookupBuilder`, `accumulate_slow`, and `build_lookup_fractions`. - `MidenLookupAir` relaxed from `pub(crate)` to `pub`. - `constraints` stays private; only the items re-exported through `lookup` leak. Integration test in `processor/src/trace/tests/lookup.rs`: runs a 6-op basic block (`Pad/Add/Pad/Mul/Drop/...`) through `build_trace_from_ops`, pipes the materialized main trace into `build_lookup_fractions` + `accumulate_slow`, and asserts shape bookkeeping, non-degenerate collection, and zero-panic accumulation. Observed on the tiny span: columns M1, M4, C2, C3 close to zero; columns M_2+5, M3, C1 don't close individually because they're paired across different columns in their respective buses. Terminal values are printed to stderr for manual inspection — hardening into `assert_eq!` checks is deferred until the follow-up round-trip oracle lands. Not yet covered by the test: symbolic AIR-constraint validation via `miden_crypto::stark::debug::*`. That needs a full aux-column matrix in the legacy shape, which sits behind the still-pending fused batch-inversion + partial-sum fast path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds two independent computations of the per-row aux-column delta `Σ m_i · d_i^{-1}` and a test that asserts they agree bit-exactly on a real program's trace, closing the loop from `build_lookup_fractions` through to the constraint-path algebra. **Fused accumulator** (`fractions.rs::accumulate`): replaces the per-fraction `accumulate_slow` with a single-inversion Montgomery-trick path via `miden_core::field::batch_multiplicative_inverse` (already chunked internally for ILP). One field inversion amortized over the whole trace instead of N. Memory-access pattern is otherwise identical to `accumulate_slow`, so both paths can coexist as cross-checks. Unit test `accumulate_matches_accumulate_slow_random` drives a deterministic LCG to build a random fixture (32 rows × shape [2,1,3]) and asserts `accumulate` produces output bit-exactly equal to `accumulate_slow` per (col, row). **Constraint-path oracle** (`column_oracle.rs`, new ~340 lines): `ColumnOracleBuilder` implements `LookupBuilder` by running the same `(U_g, V_g)` per-group algebra used by `ConstraintLookupBuilder` / `DualGroup`, and folds groups into per-column `(U_col, V_col)` via the standard cross-multiplication rule from `ConstraintColumn::fold_group`: V_col_new = V_col · U_g + V_g · U_col U_col_new = U_col · U_g starting from `(ONE, ZERO)`. The paired `collect_column_oracle_folds` driver walks a `RowMajorMatrix<Felt>` row by row (mirroring `build_lookup_fractions`) and returns per-row folded pairs for downstream comparison. `OracleColumn` / `OracleGroup` / `OracleBatch` mirror the `DualGroup` / `DualBatch` math verbatim, with `OracleGroup` also implementing `EncodedLookupGroup` so the cached-encoding path type-aligns (only the canonical closure runs — cached-encoding equivalence is already covered by `miden_lookup_air_cached_encoding_equivalence`). **Cross-check integration test** (`processor/src/trace/tests/lookup.rs::build_lookup_fractions_matches_constraint_path_oracle`): runs the same Pad/Add/Pad/Mul/Drop span as the plumbing smoke test through both paths and asserts `aux[col][r+1] - aux[col][r] == V_col · U_col^{-1}` for every (row, col). Catches divergence anywhere in the pipeline — bus emitters, flag handling, message encoding, collection, batched inversion, running-sum accumulation, constraint-path group fold. On the tiny span the prover path and the oracle agree bit-exactly across all 7 columns and every row. **Public API**: `miden_air::lookup` re-exports `accumulate`, `ColumnOracleBuilder`, `collect_column_oracle_folds`, and `LookupAir` alongside the existing collection-phase surface. Out of scope (follow-up milestones): - Wiring `MidenLookupAir::eval` into `ProcessorAir::eval` and flipping `AUX_TRACE_WIDTH` to retire the legacy multiset path. - End-to-end prove/verify test against a real program. Verification: `cargo test -p miden-air --lib constraints::lookup` (6/6), `cargo test -p miden-processor --lib trace::tests::lookup` (2/2), `make test-fast` 2890 passed, `make lint` clean, `cargo clippy -p miden-air -p miden-processor --all-targets` zero warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replaces `fractions.rs::accumulate` with a row-major, chunk-parallel, fused-inversion variant. The old path ran three disjoint full sweeps (extract denominators, global `batch_multiplicative_inverse`, per-column walk) into a fragmented `Vec<Vec<EF>>`, with the accumulation walk fully sequential even under `concurrent`. On realistic traces the walk is the dominant wall-clock cost, so parallelizing it was the real win. **New shape**: one contiguous `RowMajorMatrix<EF>` of `(num_rows + 1) * num_cols` elements. Row 0 is the zero initial condition; row `r + 1` column `c` holds the running sum of `m_i · d_i⁻¹` through main-trace row `r`. Within a row all columns live in adjacent memory, so the per-row inner loop is a tight sequential write. **Algorithm**: 1. *Prepass* — sequential O(num_rows · num_cols) `usize` scan builds `row_frac_offsets` so any row range `[lo, hi)` can locate its flat fraction slice in O(1). 2. *Phase 1 — chunked fused walk* (parallel under `concurrent`, else serial): rows are split into groups of `ACCUMULATE_ROWS_PER_CHUNK = 512` (mirroring `trace::main_trace::ROW_MAJOR_CHUNK_SIZE`). Each chunk runs an inline Montgomery inversion into a chunk-local scratch `Vec` via the new `invert_denoms_in_place` helper — forward prefix-product pass, one real `try_inverse`, backward sweep that overwrites scratch with per-element inverses — then walks its rows forward maintaining a `num_cols`-wide running register, writing local prefix sums (starting from ZERO) directly into its output slice and emitting per-column totals. 3. *Phase 2 — sequential exclusive prefix scan* over `num_chunks · num_cols` chunk totals produces the per-chunk global offset each local prefix needs. Cheap (summary data only, no inversion) and the single serial-dependency point in the whole algorithm. 4. *Phase 3 — parallel offset add* (parallel under `concurrent`): second pass over the same row-chunked output slices in lockstep with `chunk_offsets`, adding the per-column offset into every row. Chunk 0 is skipped (zero offset). Memory-bandwidth limited, trivially parallel, no allocation, no inversion. Because partial sums are additive (not multiplicative), phases 2 and 3 collapse to a prefix scan + vector add instead of the global rescale a product-prefix formulation would need. **Allocation discipline**: - Output: one `Vec<EF>` of length `(num_rows + 1) * num_cols`. - Per-chunk scratch `Vec<EF>` sized to that chunk's fraction count (~1.5 K EFs ≈ 24 KiB at the default tuning, L1-resident). Allocated once per chunk inside the closure; rayon's work stealing amortizes across threads. - `row_frac_offsets` / `chunk_totals` / `chunk_offsets`: small summary buffers, `usize` or `num_chunks · num_cols` EFs. - No more `denoms: Vec<EF>` or `d_inv: Vec<EF>` intermediates, and no call into `batch_multiplicative_inverse` (which allocates a fresh `Vec` and internally parallelizes, which would nest poorly inside our own `par_chunks_mut`). **Feature gating**: follows `air/src/trace/main_trace.rs::to_row_major` exactly — `#[cfg(feature = "concurrent")]` swaps `par_chunks_mut` for `chunks_mut`, both phase 1 and phase 3 bodies are the same. Serial fallback is still chunked + fused + bounded-scratch, so strictly better than the old global-Vec path even without rayon. `accumulate_slow` is unchanged — it stays the `Vec<Vec<EF>>` reference oracle. **Tests** (`air/src/constraints/lookup/fractions.rs`): - `accumulate_matches_accumulate_slow_random` — single-chunk random fixture (32 rows), exercises only phase 1. - `accumulate_multi_chunk_matches_accumulate_slow` *(new)* — random fixture at `ACCUMULATE_ROWS_PER_CHUNK * 3 + 7` rows, deliberately picking a trailing short chunk so phase 2's prefix scan and phase 3's offset add are both exercised and any off-by-one in last-chunk bounds is caught. This is the key regression test — the single-chunk path can't cover phases 2 or 3. - `accumulate_empty_trace` *(new)* — `num_rows = 0` returns a 1-row, `num_cols`-wide zero matrix without touching the inversion path. - Shared `Lcg` / `random_fixture` / `assert_matrix_matches_slow` helpers factored out of the original single-chunk test. `processor/src/trace/tests/lookup.rs::build_lookup_fractions_matches_constraint_path_oracle` is migrated to read deltas from the matrix (`aux.values[(r + 1) * width + col] - aux.values[r * width + col]`) instead of the old `aux[col][r + 1] - aux[col][r]` Vec-of-Vec indexing. **Out of scope** (follow-up milestones from commit 64de7a9): - Wiring the matrix output into `ProcessorAir::eval` and flipping `AUX_TRACE_WIDTH` to retire the legacy multiset path. Verification: `cargo test -p miden-air --lib constraints::lookup` (8/8, includes `accumulate_multi_chunk_matches_accumulate_slow`); `cargo test -p miden-processor --lib trace::tests::lookup` (2/2); `make test-fast` 2892/2892; `cargo clippy -p miden-air --all-targets` zero warnings with and without `--features concurrent`; `make lint` clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wire ProcessorAir through the closure-based MidenLookupAir/LogUp lookup argument by introducing a stateless `MidenLookupAuxBuilder` ZST and swapping the legacy multiset `AuxTraceBuilders` out of `prove_stark`. The new builder runs `build_lookup_fractions` + `accumulate` per call and slices the (num_rows + 1)-row accumulator into the AuxBuilder return shape (aux trace = first num_rows rows starting at ZERO; committed finals = last row holding the full running sum). Periodic columns are recomputed from `PeriodicCols::periodic_columns()` and challenges are constructed via `LookupChallenges::new(α, β)`, mirroring the constraint-path adapter exactly. ProcessorAir::eval now invokes `MidenLookupAir.eval` through a `ConstraintLookupBuilder` instead of the legacy `enforce_bus`, and `aux_width()` / `num_aux_values()` report the new 7-column LogUp width. Milestone B intentionally stubs three things to land the integration plumbing without touching the open-bus algebra: - LogUp boundary checks (when_first_row / when_last_row) are commented out in `ConstraintColumn::column`; the per-row transition constraint Δ·U − V = 0 stays live. - `ProcessorAir::reduced_aux_values` returns the verifier identity (prod = ONE, sum = ZERO) — the only safe stub since the verifier asserts `is_identity()` on the accumulated reduction. - `ace::reduced_aux_batch_config()` returns an empty config, collapsing the batched ACE circuit to `constraint_check + γ·(1−1) + γ²·0 = constraint_check` (verified `batch_reduced_aux_values` handles the empty case cleanly). `ExecutionTrace::check_constraints` is rewired to the new builder so the debug constraint checker stops feeding the AIR an 8-column legacy aux trace, and `test_poseidon2_prove_verify` now skips the recursive verify step (the recursive verifier MASM still depends on the legacy boundary terms; restoration is the follow-up milestone). `RELATION_DIGEST` is regenerated alongside `constraints_eval.masm` so the snapshot test and AIR↔MASM transcript binding stay consistent. Verified with the cross-check oracle test, full miden-air + miden-processor lib suites, and `test_blake3_256_prove_verify` end-to-end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rip out ~3,000 lines of legacy multiset bus / aux trace code now that `MidenLookupAuxBuilder` and `MidenLookupAir` are wired through `prove_stark` and `ProcessorAir::eval`. Air side: - Delete `air/src/constraints/{decoder,stack,range,chiplets/bus*}.rs` — the per-component bus constraint files. Drop `pub mod bus` declarations from each parent module and the orphaned `air/src/constraints/bus.rs` index. - Delete `enforce_bus`, `enforce_bus_first_row`, `enforce_bus_last_row` from `air/src/constraints/mod.rs`. - Delete the orphaned `air/src/trace/aux_trace.rs` (an unused dependency-inversion trait that never had a real implementor). - Collapse the legacy `AUX_TRACE_*_OFFSET` / `AUX_TRACE_WIDTH = 8` layout in `air/src/trace/mod.rs` down to `AUX_TRACE_WIDTH = LOGUP_AUX_TRACE_WIDTH (= 7)`. Strip dead column-index constants (`P1_COL_IDX`, `P2_COL_IDX`, `P3_COL_IDX`, `B_RANGE_COL_IDX`, hasher `P1_COL_IDX`) from the per-component trace modules. - Delete the legacy `AuxCols<T>` struct + `NUM_AUX_COLS` + `AUX_COL_MAP` + `aux_col_map` test in `air/src/constraints/columns.rs`. Remove the unused `F_5` / `F_6` field constants left behind by the bus deletion. Processor side: - Delete the four legacy aux trace builders: `processor/src/trace/{decoder,stack,range,chiplets}/aux_trace*` (decoder block stack/hash/op group tables, stack overflow, range checker, chiplets hash kernel / bitwise / memory / ACE / virtual table / wiring bus) - Delete `processor/src/debug.rs` (`BusMessage` trait + `BusDebugger` struct, only used by the deleted aux builders) and the dead `AuxColumnBuilder` trait in `processor/src/trace/utils.rs`. - Delete `AuxTraceBuilders` struct, its `build_aux_columns` impl, and its `AuxBuilder` trait impl from `processor/src/trace/mod.rs`. Drop the field, accessor, and constructor parameter from `ExecutionTrace`. Delete the orphaned `ExecutionTrace::build_aux_trace` helper. - Strip the dead `AceHints` / `EvaluatedCircuitsMetadata` infrastructure from `processor/src/trace/chiplets/ace/mod.rs` (the wiring-bus metadata + divisor-construction loop only existed for the legacy ACE wiring bus aux builder). The ACE chiplet `fill_trace` is now a pure trace filler with no side-channel return, and runs in the same parallel rayon scope as the other chiplets. - Delete the legacy `processor/src/trace/tests/{decoder,stack,range, hasher,chiplets/}` aux trace builder tests. The cross-check oracle test in `processor/src/trace/tests/lookup.rs` stays — it remains the algebraic safety net for the LogUp collection pipeline. Verified with `cargo check --workspace --all-targets`, the cross-check oracle test, full miden-air + miden-processor lib suites (46 + 1466 passing), and `test_blake3_256_prove_verify` end-to-end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sweep up the leftover dead code from the multiset deletion: a duplicate `build_op_group` helper in the decoder test surface (shadowed by a local copy in `core_trace_fragment/tests.rs`), the unused `build_span_with_respan_ops` and `build_trace_from_ops_with_inputs` test helpers, and a handful of imports left behind in `parallel/tests`, `trace/utils`, and `trace/tests/mod.rs` after the legacy aux trace tests were removed. The aux-trace-builder comparison block in `parallel/tests::test_run_with_fragments_matches_single_fragment` is also dropped since `ExecutionTrace::build_aux_trace` no longer exists and the deterministic-trace snapshot still covers the same equality. Workspace is now warning-clean apart from the pre-existing unused `mstream` / `pipe` op-flag getter warning in `air/src/constraints/op_flags`, which is unrelated to this milestone. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Start re-populating the `processor/src/trace/tests/` tree that commit 974cde5 cleared alongside the legacy multiset bus. Introduces a shared `LookupHarness` that wraps `build_lookup_fractions` + `accumulate` with per-row delta and terminal accessors, keyed by `aux_col::*` column constants. Tests hand-construct expected `LookupMessage` instances via the `logup_msg` structs and assert `aux[r+1][col] - aux[r][col]` matches `-1 / msg.encode(&challenges)` at each request row — a strictly stronger successor to the old `b_chip[r]` step-by-step walk. `chiplets/bitwise.rs` is the representative restoration, covering U32and/U32xor on `CHIPLET_REQUESTS` (M3). Layer 3 (column closure) is intentionally deferred: the `diagnostic_multi_batch_terminals` test in `tests/lookup.rs` documents that columns M_2+5, M3, and C1 carry open boundary contributions until the LogUp boundary wiring lands. Air-side: re-exports `LookupMessage` + the `logup_msg` module from the previously-internal `constraints::lookup` / `constraints::logup_msg` paths so the processor tests can encode messages without leaking other internals. Test helpers `build_span_with_respan_ops` and `build_trace_from_ops_with_inputs` are restored as well since the follow-up decoder / hasher / memory / range / sibling-table restorations depend on them. A `TODO(logup-overflow)` note in `trace/tests/mod.rs` points at `git show 974cde5^:processor/src/trace/tests/stack.rs` for the deferred stack-overflow-table test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds the two missing buses to MidenLookupAir, completing the migration of legacy multiset arguments to LogUp: - M5 / BUS_STACK_OVERFLOW_TABLE — three interactions on (clk, val, prev): right_shift add, left_shift∧overflow remove, dyncall∧overflow remove (DYNCALL uses hasher_state[5] for the new overflow pointer since b1' is reset by the call). Matches the legacy stack overflow bus exactly. - BUS_HASHER_PERM_LINK on the shared C3 v_wiring column — binds hasher controller rows to permutation sub-chiplet rows via four interactions on (label, state[0..12]): controller input/output (+1, label 0/1), perm row 0 / row 15 (-m, label 0/1) where m is read from PermutationCols.multiplicity. Closes the structural gap that let the permutation segment run independently from the controller. Sharing v_wiring with ACE wiring is sound because the two buses use distinct bus_prefix[bus] additive bases — their fractions are linearly independent in the extension field and cannot interfere on the shared running accumulator. Renames emit_ace_wiring → emit_v_wiring and bumps the column's MAX_INTERACTIONS_PER_ROW from 3 to 4. ChipletActiveFlags gains a permutation field (= s_perm = perm_seg) so the perm-link emitter can gate its perm-row interactions without reaching past the chiplet active-flag abstraction. LOGUP_AUX_TRACE_WIDTH grows 7 → 8 (one new M5 column; the perm-link adds no new column thanks to the v_wiring share). MAIN_COLUMN_SHAPE grows to 5 entries; MIDEN_COLUMN_SHAPE to 8. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Restore the LogUp boundary identity check on top of the Milestone B stateless `MidenLookupAuxBuilder` integration, so `prove_verify` round- trips close algebraically rather than by stub. Three threads of work: **LogUp boundary constraints.** Re-enable the `when_first_row` and `when_transition` assertions inside `ConstraintColumn::column` (the transition was already active; the first-row `acc == 0` is new). The symmetric `when_last_row` (binding `aux[N-1]` to the committed final) stays commented out — it exceeds the degree-9 budget on M_2+5 even though it mirrors the transition shape; soundness of the verifier-side boundary check now relies on the committed finals being honestly derived from the aux trace, to be tightened in a follow-up. **`ProcessorAir::reduced_aux_values`.** Restore the real boundary formula: `sum = Σ aux_finals[col] + total_correction` where `total_correction` cancels the three open buses: - `c_block_hash = +1 / encode(BLOCK_HASH_TABLE, [0, ph[0..4], 0, 0])` — the decoder's END on the root block emits one unmatched `remove`. - `c_log_precompile = 1/d_initial − 1/d_final` — the LOGPRECOMPILE transcript chain telescopes to one unmatched remove (initial state) and one unmatched add (final state). - `c_kernel_rom = −Σ 1 / d_kernel_proc_msg_i over VLPI[0]` — the kernel ROM chiplet emits an `add` per kernel proc init, matched by the verifier via variable-length public inputs. The four helper functions (`program_hash_message`, `transcript_messages`, `kernel_proc_message`, `kernel_logup_correction_from_var_len`) come out of their `#[allow(dead_code)]` wrapper and get live callers again. **Bus closure bug fixes** (found via parallel planning sessions that added `BUSDBG` instrumentation to the prover and a Python groupby-d analysis): - Memory chiplet word-index range checks (`w0`, `w1`, `w1<<2` from `memory/mod.rs:291`) now have consumers on M4 via `block_stack_and_range_logcap.rs`, closing the M1↔M4 range_check gap. The file merges the legacy `block_stack.rs` + `range_logcap.rs` emitters into one so the range-check response and request sides live together. - The RESPAN request in `chiplet_requests.rs` used `addr_next - 1` for the hasher parent; fixed to `addr_next` directly (matches the controller/perm split — the next row's decoder `addr` already points at the continuation input row). - MSTREAM and PIPE chiplet bus requests were missing entirely; added two-word memory reads/writes and dropped the `TODO` / `dead_code` markers on `op_flags::{mstream,pipe}`. **Width bump + test updates.** `LOGUP_AUX_TRACE_WIDTH` is 8 (5 main + 3 chiplet); test harnesses and shape assertions in `processor/src/trace/tests/lookup.rs`, `lookup_harness.rs`, and the bitwise chiplet bus test get the new value imported from `miden_air`. The `aux_col` module in `lookup_harness.rs` collapses `HASHER_PERM_LINK` into the shared `V_WIRING` slot (col 7), matching the 2b81a86c1 decision to ride the perm-link bus on the ACE-wiring column via distinct `bus_prefix` bases. **Snapshot regeneration.** `RELATION_DIGEST` in `config.rs` and `mod.masm`, plus the insta snapshot and `constraints_eval.masm`, are regenerated to reflect the new constraint system (`num_eval_gates: 5520`). Verified on `test_blake3_256_prove_verify` end-to-end (debug print shows `residual = [0, 0]`), the full non-recursive `prove_verify` suite (10 / 10 passing across Blake3/Keccak/Rpo/Poseidon2/Rpx + both proptests + fast-parallel variants), the cross-check oracle, and `miden-air --lib` (including the snapshot test and the degree-budget test). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two small CI fixups surfaced by the `check all feature combinations` and `Build Documentation` jobs on `adr1anh/bus`: - `air/src/constraints/lookup/miden_air.rs::SeededRng::next_felt` used `miden_crypto::rand::test_utils::prng_value`, which lives behind a feature flag not enabled under `cargo check --all-targets --no-default-features --manifest-path air/Cargo.toml`. Replaced the call with an inline SplitMix64 mixer on `(seed, counter)` — still reproducible across runs, still sufficient for the cached-encoding equivalence test, no external dep on miden-crypto's test-utils. - `docs/src/design/bus_constraint_inventory.md` lines 993 and 1034 had bare-text formulas `deg(...) <= D_max` and `current_column_degree + cost(...) <= D_max`. Docusaurus' MDX parser sees the `<` and starts looking for a JSX tag, then chokes on the bare `=`. Wrapped both in inline backticks so MDX treats them as code and ignores the `<`. Verified with `cargo check -p miden-air --all-targets --no-default-features` (clean) and `cargo test -p miden-air --lib miden_lookup_air_cached_encoding_equivalence` (passes with the new PRNG).

Move the four internal design docs authored during the all-LogUp bus refactor out of `docs/src/design/` — they're working notes, not user-facing documentation, and Docusaurus auto-indexes the folder so they'd end up shipped on the site if left in place: - `bus_api_design_notes.md` — closure-based `LookupBuilder` API rationale and trait-stack layout sketch. - `bus_constraint_inventory.md` — per-bus constraint audit from the legacy multiset model, plus the column-packing degree-budget worksheet that drove the `MidenLookupAir` shape. - `bus_packing_summary.md` — distilled packing decisions per column. - `lookup_air_plan.md` — task-by-task execution log for the refactor. The files live on locally under `~/.claude/projects/-Users-adrian-Developer-miden-vm/memory/pr_bus_notes/` for future reference; this commit only removes them from the tree. Also drops the two dangling doc-comment references to `docs/src/design/lookup_air_plan.md` in `air/src/constraints/lookup/{mod,builder}.rs`.

Replace the Milestone-B empty stub in `reduced_aux_batch_config` with a real LogUp boundary identity check in the ACE circuit DAG: 0 = Σ aux_bound[0..7] + c_block_hash + c_log_precompile + c_kernel_rom The two fixed-length corrections (block_hash, log_precompile) are rebuilt inside the DAG as rational fractions `(±1, d)` and folded into a running `(N, D)` pair — no in-circuit inversion needed. The kernel-ROM correction is computed in MASM via `ext2inv + ext2add` accumulation and handed in as a scalar via `VlpiReduction(0)`. The final boundary check is the quadratic identity `(Σ aux_bound + c_kr) · D + N = 0`, batched with γ against the constraint evaluation. Key changes: - **air/src/ace.rs**: new `LogUpBoundaryConfig` / `batch_logup_boundary` / `logup_boundary_config` replacing the old `ReducedAuxBatchConfig` stub. Reuses existing `encode_bus_message` for in-DAG message construction. - **crates/ace-codegen/src/circuit.rs**: encoder now appends a trailing `add(root, 0)` when the DAG root isn't the last emitted op, fixing a Rust/MASM evaluator disagreement (the ACE chiplet reads "last wire" as output, not a logical root pointer). - **crates/lib/core/asm/sys/vm/public_inputs.masm**: `reduce_kernel_digests` changed from multiplicative product (`ext2mul`) to sum-of-reciprocals (`ext2inv + ext2add + ext2neg`), matching `kernel_logup_correction_from_var_len` in the Rust verifier. - Regenerated `constraints_eval.masm`, `RELATION_DIGEST`, and insta snapshot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge the three encoding-primitive methods (beta_powers, bus_prefix, insert_encoded) from the separate EncodedLookupGroup trait into LookupGroup with panicking default implementations. This simplifies the trait hierarchy by removing the EncodedGroup GAT from LookupColumn and using a single Group type for both closures of group_with_cached_encoding. The prover path now unconditionally panics on encoding-primitive calls (previously used debug_assert which was silent in release builds). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The block-hash bus boundary correction emits `Child { parent: 0, child_hash: program_hash }` while the in-trace removal at the root END row uses `End { parent, child_hash, is_first_child, is_loop_body }`. Per the PR author's algebra, the two collapse to identical denominators because at the root END row: - the next op is HALT, forcing addr_next = 0 ⇒ parent = addr_next = 0; - halt_next() = 1 ⇒ is_first_child = 1 - end_next - repeat_next - halt_next = 0; - the root block is not a loop body ⇒ is_loop_body = 0; - child_hash = h_0 = program_hash by the decoder's program-hash boundary. With those four substitutions, `End` and `Child` produce the same `(parent, child_hash, 0, 0)` payload tuple in `BlockHashMsg::encode`. Document the algebra inline at the boundary emission site and add a regression test that constructs both messages with the matching values and asserts their encoded denominators are equal. Refs: items.md A3, audit-report.md H-3 (PR #2962 close-out hardening) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The MASM recursive verifier hardcodes `gamma = beta^16` via 4 sequential squarings in `crates/lib/core/asm/sys/vm/public_inputs.masm:239-251`. If the Rust constant `MIDEN_MAX_MESSAGE_WIDTH` ever moves off 16, the MASM needs a matching change (e.g. 5 squarings for beta^32) — but until now nothing forced both to move together. Add a `const _: () = assert!(MIDEN_MAX_MESSAGE_WIDTH == 16)` tripwire that fails the build if the Rust side changes alone, with the MASM file path called out in the message so the next person knows where to look. Refs: items.md A20, audit-report.md L-5 (PR #2962 close-out hardening) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ecks Two single-line debug_assert additions flagged in the PR #2962 audit: - `Challenges::new` already asserted max_message_width > 0 but not num_bus_ids > 0; a zero-length bus_prefix table would panic with an obscure index-out-of-bounds on first encode. Add the matching assert. - `build_logup_aux_trace` reads challenges[0]/challenges[1] by index; a future short slice would panic with the same opaque message. Add a debug_assert with a clear message. Both are cheap defensive checks. No behavior change for correct input. Refs: items.md A6, A7 (PR #2962 close-out hardening) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Today only `BusId::HasherPermLinkOutput` (the high watermark) is pinned. That catches gaps but not reorders: swapping two variants with adjacent discriminants would silently swap which `bus_prefix[i]` each emitter resolves to, breaking domain separation across every bus consumer. Replace the single assertion with one `const _: () = assert!(...)` per BusId variant, pinning the whole layout. Any future reorder now fails at compile time at the line of the moved variant. Refs: items.md A14, audit-report.md L-4 (PR #2962 close-out hardening) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…inants The `bus: BusId` field on `MemoryMsg::Element` and `MemoryMsg::Word` is set by the typed constructors `read_element` / `write_element` / `read_word` / `write_word` to one of MemoryReadElement / MemoryWriteElement / MemoryReadWord / MemoryWriteWord. Until now, external callers could construct these variants directly with an arbitrary BusId — e.g. `MemoryMsg::Word { bus: BusId::Bitwise, ... }` — silently breaking the domain-separation guarantee. Rust enum variants don't allow per-field visibility, so the standard fix is `#[non_exhaustive]` on each variant: same-crate code can still match and construct freely (preserving today's call sites), but external callers are forced through the typed helpers. Refs: items.md A4, audit-report.md M-6 (PR #2962 close-out hardening) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The prover path short-circuits on `flag == F::ZERO`, while the constraint path evaluates the encode unconditionally. They agree only when `flag ∈ {0, 1}`. Every Miden bus emitter today drives `flag` as a product of decoder/op selectors that the AIR pins boolean, but the invariant is implicit — a future emitter that introduces an arithmetic flag (e.g. `s0 - s01` style subtractions producing a non-boolean intermediate) would silently fork prover and constraint paths. Add `debug_assert!(flag == F::ZERO || flag == F::ONE)` on both ProverGroup::insert and ProverGroup::batch so any future regression is caught at test time. Refs: items.md A5, audit-report.md M-8 (PR #2962 close-out hardening) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>