Skip to content

feat: marf squash engine#7060

Open
francesco-stacks wants to merge 39 commits into
stacks-network:developfrom
francesco-stacks:feat/marf-squash-engine
Open

feat: marf squash engine#7060
francesco-stacks wants to merge 39 commits into
stacks-network:developfrom
francesco-stacks:feat/marf-squash-engine

Conversation

@francesco-stacks
Copy link
Copy Markdown
Contributor

Description

Builds on top of #7057. Will be removed from Draft once that one will be merged

  • Add MARF::squash_to_path(): offline squash pipeline that DFS-collects nodes into a disk-backed NodeStore, remaps pointers, recomputes hashes, and streams a single shared trie blob
  • Make get_root_hash_at(), get_block_height_miner_tip(), and get_trie_root_ancestor_hashes_bytes() squash-aware so blocks inside the squashed range return correct per-height values from SQL side-tables

Applicable issues

  • fixes #

Additional info (benefits, drawbacks, caveats)

Checklist

  • Test coverage for new or modified code paths
  • For new Clarity features or consensus changes, add property tests (see docs/property-testing.md)
  • Changelog fragment(s) or "no changelog" label added (see changelog.d/README.md)
  • Required documentation changes (e.g., rpc/openapi.yaml for RPC endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo

@francesco-stacks francesco-stacks self-assigned this Mar 30, 2026
@coveralls
Copy link
Copy Markdown

coveralls commented Mar 31, 2026

Coverage Report for CI Build 25803408516

Coverage increased (+0.08%) to 85.795%

Details

  • Coverage increased (+0.08%) from the base build.
  • Patch coverage: 201 uncovered changes across 9 files (2204 of 2405 lines covered, 91.64%).
  • 5287 coverage regressions across 102 files.

Uncovered Changes

File Changed Covered %
stackslib/src/chainstate/stacks/index/squash.rs 484 423 87.4%
stackslib/src/chainstate/stacks/index/test/squash.rs 1216 1165 95.81%
stackslib/src/chainstate/stacks/index/squash/node_store.rs 251 230 91.63%
stackslib/src/chainstate/stacks/index/squash/stream.rs 154 134 87.01%
stackslib/src/chainstate/stacks/index/trie.rs 41 23 56.1%
stackslib/src/chainstate/stacks/index/marf.rs 53 40 75.47%
stackslib/src/chainstate/stacks/index/mod.rs 8 0 0.0%
stackslib/src/chainstate/stacks/index/trie_sql.rs 25 20 80.0%
stackslib/src/chainstate/stacks/index/storage.rs 52 48 92.31%

Coverage Regressions

5287 previously-covered lines in 102 files lost coverage.

Top 10 Files by Coverage Loss Lines Losing Coverage Coverage
stackslib/src/config/mod.rs 358 69.04%
stackslib/src/chainstate/stacks/miner.rs 233 83.63%
stackslib/src/chainstate/stacks/index/storage.rs 229 82.08%
stackslib/src/net/inv/epoch2x.rs 226 79.21%
stackslib/src/chainstate/stacks/db/transactions.rs 203 97.13%
stackslib/src/net/chat.rs 201 93.01%
stackslib/src/chainstate/stacks/db/mod.rs 196 86.23%
stackslib/src/core/mempool.rs 170 86.87%
stackslib/src/chainstate/stacks/index/node.rs 160 87.35%
stacks-node/src/nakamoto_node/miner.rs 156 86.82%

Coverage Stats

Coverage Status
Relevant Lines: 222005
Covered Lines: 190469
Line Coverage: 85.79%
Coverage Strength: 18786690.79 hits per line

💛 - Coveralls

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the MARF “squash engine” to create disk-backed squashed snapshots and makes key MARF/trie lookup paths squash-aware so callers can get correct per-height root hashes and block heights when reading within the squashed range.

Changes:

  • Add MARF::squash_to_path() offline squashing pipeline (DFS node collection into a disk-backed NodeStore, pointer remap, hash recompute, and streaming a single trie blob).
  • Update trie/MARF lookup logic to consult squash SQL side-tables for correct block heights and per-height root hashes in squashed ranges.
  • Add extensive unit/integration tests covering squashing, extension behavior, pointer encoding, and patch/backpointer edge cases.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
stackslib/src/chainstate/stacks/index/trie.rs Make ancestor-hash collection squash-aware by reading block heights/root hashes from squash SQL side-tables where appropriate.
stackslib/src/chainstate/stacks/index/storage.rs Make get_root_hash_at() consult squash metadata so historical blocks inside the squash range return the correct archival root hash.
stackslib/src/chainstate/stacks/index/squash.rs Add the new squash engine implementation (NodeStore, remap/hash recompute, blob streaming, metadata persistence).
stackslib/src/chainstate/stacks/index/mod.rs Export the new squash module.
stackslib/src/chainstate/stacks/index/marf.rs Re-export squash constants/stats and make get_block_height_miner_tip() squash-aware via side-table lookups.
stackslib/src/chainstate/stacks/index/test/squash.rs Add comprehensive squash/extend regression tests and targeted unit tests for disk-backed mechanisms.
stackslib/src/chainstate/stacks/index/test/node.rs Add tests for inline back-block payload in compressed pointers and back_block preservation behavior.
stackslib/src/chainstate/stacks/index/test/node_patch.rs Add regression test ensuring patch application preserves inline back_block payload identity.
stackslib/src/chainstate/stacks/index/test/mod.rs Register the new squash test module.
stackslib/src/chainstate/stacks/index/test/marf.rs Rename/expose the MARF setup helper for reuse by squash tests; adjust one callback-propagation test to use smaller fixtures.
changelog.d/marf-squash-engine.added Changelog entry for squash engine + squash-aware lookups.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread stackslib/src/chainstate/stacks/index/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/storage.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/marf.rs Outdated
@francesco-stacks francesco-stacks marked this pull request as ready for review May 11, 2026 12:56
Copy link
Copy Markdown
Contributor

@federico-stacks federico-stacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass and left some minors/nits around.

Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment thread stackslib/src/chainstate/stacks/index/test/squash.rs Outdated
Comment on lines +86 to +100
fn remap_child_ptrs(
store: &mut NodeStore,
source_to_idx: &HashMap<(u32, u64), usize>,
block_id_map: &HashMap<u32, u32>,
label: &str,
) -> Result<(), Error> {
let remap_start = Instant::now();
let node_count = store.len();
let mut reader = store.open_reader()?;

let write_file = std::fs::OpenOptions::new()
.write(true)
.open(&store.path)
.map_err(Error::IOError)?;
let mut writer = BufWriter::with_capacity(1 << 20, write_file);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some food for thoughts: at the point we enter remap_child_ptrs there are 3 (if I'm not mistaken 2 writer and 1 reader) OS file handles open against same file simultaneosly. Not a problem today based on how the operations (sequence) are performed.

I'm wondering if could be worth it encapsulate the write inside the NodeStore to just have 1 writer handle in the system. Something like this:

    /// Overwrite the node at `idx` in place. Call only after `finish_writing`.
    pub fn overwrite_node(&mut self, idx: usize, node: &TrieNodeType) -> Result<(), Error> {
        let offset = *self.file_offsets.get(idx).ok_or_else(|| {
            Error::CorruptionError(format!("overwrite_node: index {idx} out of bounds"))
        })?;
        self.writer.seek(SeekFrom::Start(offset)).map_err(Error::IOError)?;
        serialize_node(&mut self.writer, node)
    }

   pub(crate) fn flush(&mut self) ....

and to be used in remap_child_ptrs like this:

if modified {
    store.overwrite_node(idx, &node)?;
}
// ...
store.flush()?;

This way the second writer can be dropped with one owner of the write access.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 6e175b8

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the huddle, possibly we could try to encapsulate also the read logic within the NodeStore if node reads are confirmed to be "stateless"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread stackslib/src/chainstate/stacks/index/squash.rs

/// Rewrite inline child pointers from in-memory node indices to blob-local
/// byte offsets. Backpointers and empty pointers are left untouched.
pub fn update_inline_child_ptrs(ptrs: &mut [TriePtr], file_offsets: &[u64]) -> Result<(), Error> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to start bikeshedding, but ... any good idea for a name for this function that makes it more obvious what's happening? "Update" is pretty generic and meaningless, especially considering how consequential the data modification is that it performs.

(I know you didn't actually add this function here, you just moved it)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in e820a0e

// height H for every block in the squashed range. Use the
// side-table when available.
if storage.is_squashed() {
if let Some(h) = trie_sql::read_squash_block_height(storage.sqlite_conn(), block_hash)?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that in the vast majority of cases, this function is going to becalled with the chain tip or very close to it.

If that's true, wouldn't it be better if we tried the MARF first, and only fall back to SQL later?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a valid point. In normal operations that will be the case. we'll mostly read from the squash during clarity lookups, block replay, and the RPC endpoints in general. Anyway I did the change , it was a bit more involved than I expected, because the sql lookup first removed a couple of edge cases 70a1aa1

Comment on lines +2514 to +2515
if let Some(h) = trie_sql::read_squash_block_height(self.sqlite_conn(), tip)? {
return trie_sql::read_squash_archival_marf_root_hash(self.sqlite_conn(), h)?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes two SQL queries that are fulfilled from the same row -- first SELECT height WHERE hash, then SELECT root_hash WHERE height.

Feels like that should be a single SELECT root_hash WHERE hash?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added read_squashed_block_root_hash_by_hash in 20739d632

// we want to find the block at a given _height_. but how to do so?
// use the data stored already in the MARF.
let cur_block_height =
MARF::get_block_height_miner_tip(storage, &cur_block_header, &cur_block_header)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You updated get_block_height_miner_tip to handle squashed MARFs. Given that, why is this change here necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, yeah redundant db15da7


let root_ptr = storage.root_trieptr();
let ancestor_hash = storage.read_node_hash_bytes(&root_ptr)?;
let ancestor_height = cur_block_height - (1u32 << log_depth);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note to self: have a closer look at this change)

francesco-stacks added a commit to francesco-stacks/stacks-core that referenced this pull request May 13, 2026
Comment thread stackslib/src/chainstate/stacks/index/squash.rs Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants