feat: marf squash engine #7060
Conversation
Pull request overview
Implements the MARF “squash engine” to create disk-backed squashed snapshots and makes key MARF/trie lookup paths squash-aware so callers can get correct per-height root hashes and block heights when reading within the squashed range.
Changes:
- Add `MARF::squash_to_path()` offline squashing pipeline (DFS node collection into a disk-backed `NodeStore`, pointer remap, hash recompute, and streaming a single trie blob).
- Update trie/MARF lookup logic to consult squash SQL side-tables for correct block heights and per-height root hashes in squashed ranges.
- Add extensive unit/integration tests covering squashing, extension behavior, pointer encoding, and patch/backpointer edge cases.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| stackslib/src/chainstate/stacks/index/trie.rs | Make ancestor-hash collection squash-aware by reading block heights/root hashes from squash SQL side-tables where appropriate. |
| stackslib/src/chainstate/stacks/index/storage.rs | Make get_root_hash_at() consult squash metadata so historical blocks inside the squash range return the correct archival root hash. |
| stackslib/src/chainstate/stacks/index/squash.rs | Add the new squash engine implementation (NodeStore, remap/hash recompute, blob streaming, metadata persistence). |
| stackslib/src/chainstate/stacks/index/mod.rs | Export the new squash module. |
| stackslib/src/chainstate/stacks/index/marf.rs | Re-export squash constants/stats and make get_block_height_miner_tip() squash-aware via side-table lookups. |
| stackslib/src/chainstate/stacks/index/test/squash.rs | Add comprehensive squash/extend regression tests and targeted unit tests for disk-backed mechanisms. |
| stackslib/src/chainstate/stacks/index/test/node.rs | Add tests for inline back-block payload in compressed pointers and back_block preservation behavior. |
| stackslib/src/chainstate/stacks/index/test/node_patch.rs | Add regression test ensuring patch application preserves inline back_block payload identity. |
| stackslib/src/chainstate/stacks/index/test/mod.rs | Register the new squash test module. |
| stackslib/src/chainstate/stacks/index/test/marf.rs | Rename/expose the MARF setup helper for reuse by squash tests; adjust one callback-propagation test to use smaller fixtures. |
| changelog.d/marf-squash-engine.added | Changelog entry for squash engine + squash-aware lookups. |
federico-stacks left a comment:
I did a first pass and left some minor comments/nits.
```rust
fn remap_child_ptrs(
    store: &mut NodeStore,
    source_to_idx: &HashMap<(u32, u64), usize>,
    block_id_map: &HashMap<u32, u32>,
    label: &str,
) -> Result<(), Error> {
    let remap_start = Instant::now();
    let node_count = store.len();
    let mut reader = store.open_reader()?;

    let write_file = std::fs::OpenOptions::new()
        .write(true)
        .open(&store.path)
        .map_err(Error::IOError)?;
    let mut writer = BufWriter::with_capacity(1 << 20, write_file);
```
Some food for thought: at the point we enter `remap_child_ptrs`, there are 3 OS file handles (if I'm not mistaken, 2 writers and 1 reader) open against the same file simultaneously. Not a problem today, given the sequence in which the operations are performed.
I'm wondering if it could be worth encapsulating the write inside the `NodeStore`, so there is just one writer handle in the system. Something like this:
```rust
/// Overwrite the node at `idx` in place. Call only after `finish_writing`.
pub fn overwrite_node(&mut self, idx: usize, node: &TrieNodeType) -> Result<(), Error> {
    let offset = *self.file_offsets.get(idx).ok_or_else(|| {
        Error::CorruptionError(format!("overwrite_node: index {idx} out of bounds"))
    })?;
    self.writer.seek(SeekFrom::Start(offset)).map_err(Error::IOError)?;
    serialize_node(&mut self.writer, node)
}

pub(crate) fn flush(&mut self) ...
```
...and to be used in `remap_child_ptrs` like this:
```rust
if modified {
    store.overwrite_node(idx, &node)?;
}
// ...
store.flush()?;
```
This way the second writer can be dropped, with one owner of the write access.
As discussed in the huddle, we could possibly also encapsulate the read logic within the `NodeStore`, if node reads are confirmed to be "stateless".
```rust
/// Rewrite inline child pointers from in-memory node indices to blob-local
/// byte offsets. Backpointers and empty pointers are left untouched.
pub fn update_inline_child_ptrs(ptrs: &mut [TriePtr], file_offsets: &[u64]) -> Result<(), Error> {
```
Not to start bikeshedding, but ... any good idea for a name for this function that makes it more obvious what's happening? "Update" is pretty generic and meaningless, especially considering how consequential the data modification is that it performs.
(I know you didn't actually add this function here, you just moved it)
```rust
// height H for every block in the squashed range. Use the
// side-table when available.
if storage.is_squashed() {
    if let Some(h) = trie_sql::read_squash_block_height(storage.sqlite_conn(), block_hash)?
```
I assume that in the vast majority of cases, this function is going to be called with the chain tip or very close to it.
If that's true, wouldn't it be better if we tried the MARF first, and only fall back to SQL later?
I believe this is a valid point. In normal operation that will be the case: we'll mostly read from the squash during Clarity lookups, block replay, and the RPC endpoints in general. Anyway, I made the change; it was a bit more involved than I expected, because doing the SQL lookup first had removed a couple of edge cases. 70a1aa1
```rust
if let Some(h) = trie_sql::read_squash_block_height(self.sqlite_conn(), tip)? {
    return trie_sql::read_squash_archival_marf_root_hash(self.sqlite_conn(), h)?
```
This makes two SQL queries that are fulfilled from the same row -- first `SELECT height WHERE hash`, then `SELECT root_hash WHERE height`.
Feels like that should be a single SELECT root_hash WHERE hash?
I added `read_squashed_block_root_hash_by_hash` in 20739d632.
```rust
// we want to find the block at a given _height_. but how to do so?
// use the data stored already in the MARF.
let cur_block_height =
    MARF::get_block_height_miner_tip(storage, &cur_block_header, &cur_block_header)
```
You updated `get_block_height_miner_tip` to handle squashed MARFs. Given that, why is this change here necessary?
```rust
let root_ptr = storage.root_trieptr();
let ancestor_hash = storage.read_node_hash_bytes(&root_ptr)?;
let ancestor_height = cur_block_height - (1u32 << log_depth);
```
(note to self: have a closer look at this change)
Description
Builds on top of #7057. Will be removed from Draft once that one is merged.
- `MARF::squash_to_path()`: offline squash pipeline that DFS-collects nodes into a disk-backed `NodeStore`, remaps pointers, recomputes hashes, and streams a single shared trie blob
- Make `get_root_hash_at()`, `get_block_height_miner_tip()`, and `get_trie_root_ancestor_hashes_bytes()` squash-aware, so blocks inside the squashed range return correct per-height values from SQL side-tables

Applicable issues
Additional info (benefits, drawbacks, caveats)
Checklist
- [ ] Property tests added where applicable (`docs/property-testing.md`)
- [ ] Changelog entry added (`changelog.d/README.md`)
- [ ] Documentation updated where required (`rpc/openapi.yaml` for RPC endpoints, `event-dispatcher.md` for new events)
- [ ] Corresponding PR in the `clarity-benchmarking` repo for new Clarity functions