feat: add scorer-kit #20

Open
ovitrif wants to merge 13 commits into synonymdev:main from ovitrif:codex/scorer-kit

ovitrif commented May 16, 2026

Context

This PR moves the offline scorer-file tooling previously introduced via synonymdev/ldk-node#79 into the scorer repo as scorer-kit and extends it for local/operator workflows around serialized LDK ChannelLiquidities files.

The tool can inspect scorer binaries, decode them to JSON, compare two snapshots, validate bytes, derive selected-channel allowlists, and write new scorer binaries. It uses the scorer diagnostics/merge/filter API from the rust-lightning feat/scorer-diagnostics branch, where synonymdev/rust-lightning#3 was merged. That branch is pulled in via an aliased dependency, so the existing scorer node/prober dependency graph remains on its current LDK crate set.

What Changed

  • Adds src/bin/scorer-kit.rs with inspect, decode, compare, merge, node-scids, and validate subcommands.
  • Adds policy-driven duplicate handling for merge: richer-history, prefer-first, prefer-last, combine, and newer.
  • Adds selected-channel overlay support via --overlay-scid / --overlay-scids-file: the first input remains the baseline without decay; later inputs are filtered to the explicitly listed short-channel-ids before the merge policy is applied.
  • Adds node-scids for deriving an overlay SCID allowlist from a serialized LDK NetworkGraph, one or more node pubkeys, and an optional incoming scorer-file intersection.
  • Allows node-scids --invoice <bolt11> to recover the target pubkey from an invoice when a direct node pubkey is not already available.
  • Adds merge audit reports with duplicate decisions, overlay filter counts, and summary counts.
  • Keeps generated scorer artifacts out of version control; local score snapshots, decoded JSON, reports, and merged binaries should be generated under target/scorer-kit/ or another ignored path.
  • Documents common scorer-kit flows in the README.
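The selected-channel overlay described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scorer-kit implementation: `Entry`, `OverlayReport`, and `overlay_merge` are hypothetical names, a plain integer stands in for a real scorer entry, and decay is not modeled at all. It only shows the shape of the flow: keep the baseline as-is, filter the incoming input to the allowlist, then let the later input win on overlap, as a prefer-last style policy would.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for one scorer entry; the real tool works on
/// serialized LDK ChannelLiquidities data, which is not modeled here.
type Entry = u32;

/// Counts a merge audit report might summarize (field names are illustrative).
#[derive(Debug, Default, PartialEq)]
struct OverlayReport {
    filtered_in: usize, // incoming entries that survived the allowlist
    replaced: usize,    // baseline entries overwritten by an incoming entry
    added: usize,       // incoming entries with no baseline counterpart
}

/// Overlay `incoming` onto `baseline`, keeping only allowlisted SCIDs and
/// letting the later input win on overlap.
fn overlay_merge(
    baseline: &BTreeMap<u64, Entry>,
    incoming: &BTreeMap<u64, Entry>,
    allowlist: &BTreeSet<u64>,
) -> (BTreeMap<u64, Entry>, OverlayReport) {
    let mut merged = baseline.clone(); // baseline entries start untouched
    let mut report = OverlayReport::default();
    for (scid, entry) in incoming.iter().filter(|(scid, _)| allowlist.contains(*scid)) {
        report.filtered_in += 1;
        match merged.insert(*scid, *entry) {
            Some(_) => report.replaced += 1, // SCID already in the baseline
            None => report.added += 1,       // SCID unique to the incoming input
        }
    }
    (merged, report)
}

fn main() {
    let baseline = BTreeMap::from([(1, 10), (2, 20), (3, 30)]);
    let incoming = BTreeMap::from([(2, 21), (3, 31), (4, 41)]);
    let allowlist = BTreeSet::from([3, 4, 9]); // 9 is absent from both inputs
    let (merged, report) = overlay_merge(&baseline, &incoming, &allowlist);
    println!("{merged:?} {report:?}");
}
```

In this toy run, SCID 2 exists in both inputs but is not allowlisted, so the baseline value survives; only SCIDs 3 and 4 are taken from the incoming input.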

Current Sample Result

Using the current two source samples locally, the broad richer-history merge produced a 16,857-entry scorer file. The audit report recorded 1,226 duplicate short-channel-id decisions: 399 kept from the existing input, 827 replaced by the incoming input, and 0 combined.

That broad merge remains useful for analysis, but the safer production-style workflow is now the selected overlay path: start with source-z as the baseline, pass source-b as the incoming overlay, use --policy prefer-last, and provide an explicit SCID allowlist with --overlay-scids-file.

A local selected overlay candidate generated with a 20-SCID allowlist reduced the 8,896-entry incoming file to 5 score entries before merge. The merge replaced 3 overlapping baseline entries, preserved the other 9,184 baseline entries without decay, and added 2 unique incoming entries. The resulting target/scorer-kit/merged.bin has 9,189 entries.

Sample Mapping

The local target/scorer-kit/merged.bin candidate uses source-z as the baseline sample and source-b as the incoming overlay sample. source-z is the current production scorer sample from https://api.blocktank.to/scorer-prod and remains intact except for explicitly selected overlay replacements. source-b is the previous production scorer sample from https://api.blocktank.to/scorer-prod-old; the Mullvad-related channels are selected from this sample only.

For the local candidate, the Mullvad overlay is represented by the 20-SCID allowlist passed to --overlay-scids-file. That allowlist reduces source-b to 5 scorer entries before merge; those 5 entries are the only incoming entries used from source-b. The merge then replaces 3 overlapping source-z entries, preserves the other 9,184 source-z entries without decay, and adds 2 unique entries, producing the 9,189-entry merged.bin.

This means the generated merged.bin is not a broad replacement of source-z with source-b; it is source-z plus the selected Mullvad-related overlay from source-b, and the overlay path does not normalize the rest of the baseline scorer.
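The counts quoted under Current Sample Result and Sample Mapping can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from this description; the constant and function names are made up for illustration, and the source-z size of 9,187 is inferred from the other figures rather than stated anywhere above.

```rust
// Entry counts quoted in the PR description (nothing here is measured anew).
const INCOMING_LEN: u32 = 8_896; // source-b entries
const BROAD_MERGED_LEN: u32 = 16_857; // richer-history merge output
const DUPLICATES: u32 = 1_226; // duplicate SCID decisions in the broad merge
const KEPT_EXISTING: u32 = 399;
const REPLACED_BY_INCOMING: u32 = 827;
const COMBINED: u32 = 0;
const OVERLAY_FILTERED: u32 = 5; // source-b entries left after the allowlist
const OVERLAY_REPLACED: u32 = 3;
const OVERLAY_ADDED: u32 = 2;
const OVERLAY_PRESERVED: u32 = 9_184;
const OVERLAY_MERGED_LEN: u32 = 9_189;

/// Baseline (source-z) size implied by the overlay numbers; this value is
/// an inference, the description does not state it directly.
fn implied_baseline_len() -> u32 {
    OVERLAY_PRESERVED + OVERLAY_REPLACED
}

/// True when the broad-merge and overlay figures agree with each other.
fn counts_are_consistent() -> bool {
    let baseline = implied_baseline_len();
    KEPT_EXISTING + REPLACED_BY_INCOMING + COMBINED == DUPLICATES
        && baseline + INCOMING_LEN - DUPLICATES == BROAD_MERGED_LEN
        && OVERLAY_REPLACED + OVERLAY_ADDED == OVERLAY_FILTERED
        && baseline + OVERLAY_ADDED == OVERLAY_MERGED_LEN
}

fn main() {
    println!("implied source-z entries: {}", implied_baseline_len());
    println!("counts consistent: {}", counts_are_consistent());
}
```

Both the broad merge (9,187 + 8,896 - 1,226 = 16,857) and the overlay merge (9,187 + 2 = 9,189) point at the same implied baseline size, which is a useful sanity check when regenerating the samples.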

Notes

Scorer binaries are keyed by short-channel-id and do not contain node pubkeys. Node-level selection therefore needs a serialized LDK network graph, an invoice payee, or another external channel map to produce the SCID allowlist before running scorer-kit merge.
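Because the scorer file only knows short-channel-ids, node-level selection comes down to a lookup in some external channel map. A rough sketch of that step is below, assuming a hypothetical in-memory map from SCID to the two endpoint pubkeys; `ChannelMap` and `scids_for_nodes` are illustrative names, the pubkey strings are placeholders, and the real node-scids subcommand reads a serialized LDK NetworkGraph instead.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical channel map: SCID -> the two endpoint node pubkeys.
/// In scorer-kit this information comes from a serialized LDK NetworkGraph.
type ChannelMap = BTreeMap<u64, (String, String)>;

/// Collect the SCIDs of channels touching any target node, optionally
/// intersected with the SCIDs present in an incoming scorer file.
fn scids_for_nodes(
    channels: &ChannelMap,
    targets: &BTreeSet<String>,
    incoming_scids: Option<&BTreeSet<u64>>,
) -> BTreeSet<u64> {
    channels
        .iter()
        .filter(|(_, (a, b))| targets.contains(a) || targets.contains(b))
        .map(|(scid, _)| *scid)
        // With no incoming set, every matching channel is kept.
        .filter(|scid| incoming_scids.map_or(true, |set| set.contains(scid)))
        .collect()
}

fn main() {
    // Placeholder node names; real entries would be hex-encoded pubkeys.
    let channels: ChannelMap = BTreeMap::from([
        (1, ("node-a".to_string(), "node-b".to_string())),
        (2, ("node-b".to_string(), "node-c".to_string())),
        (3, ("node-c".to_string(), "node-d".to_string())),
    ]);
    let targets = BTreeSet::from(["node-c".to_string()]);
    let incoming = BTreeSet::from([3, 7]);

    println!("{:?}", scids_for_nodes(&channels, &targets, None));
    println!("{:?}", scids_for_nodes(&channels, &targets, Some(&incoming)));
}
```

The optional intersection mirrors the idea of restricting the allowlist to channels the incoming scorer file actually has entries for, so the overlay never lists SCIDs that could not contribute anything.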

Generated scorer binaries and decoded JSON are intentionally not committed. The branch keeps the tooling and regeneration flow, not a snapshot of network-dependent scorer state.

Test Plan

  • cargo check --bin scorer-kit
  • cargo test --bin scorer-kit (8 scorer-kit unit tests)
  • cargo test
  • scorer-kit node-scids --help
  • scorer-kit validate target/scorer-kit/merged.bin
  • scorer-kit compare <source-z.bin> target/scorer-kit/merged.bin --left-label source-z --right-label merged --output json => 9,184 equal overlaps, 3 replaced overlaps, 2 right-only entries
  • scorer-kit validate <source-z.bin> <source-b.bin>
  • scorer-kit inspect <source-z.bin> --label source-z --top 5
  • scorer-kit inspect <source-b.bin> --label source-b --top 5
  • scorer-kit compare <source-z.bin> <source-b.bin> --left-label source-z --right-label source-b
  • scorer-kit merge <source-z.bin> <source-b.bin> --label source-z --label source-b --output target/scorer-kit/merged-richer-history.bin --report target/scorer-kit/merged-richer-history.report.json --policy richer-history
  • scorer-kit merge <source-z.bin> <source-b.bin> --label source-z --label source-b --policy prefer-last --overlay-scid <unique-scid> --output target/scorer-kit/merged-one-overlay.bin --report target/scorer-kit/merged-one-overlay.report.json
  • scorer-kit merge <source-z.bin> <source-b.bin> --label source-z --label source-b --policy prefer-last --overlay-scid <overlapping-scid> --output target/scorer-kit/merged-one-overlap.bin --report target/scorer-kit/merged-one-overlap.report.json
  • scorer-kit merge <source-z.bin> <source-b.bin> --label source-z --label source-b --policy prefer-last --overlay-scids-file <20-scid-allowlist> --output target/scorer-kit/merged.bin --report target/scorer-kit/merged.report.json
  • sensitive term scan over committed files; PR body intentionally documents the sample URL mapping and Mullvad overlay source
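The compare result quoted in the test plan (equal overlaps, replaced overlaps, right-only entries) corresponds to a simple classification over two SCID-keyed maps. The sketch below illustrates that classification only; `CompareSummary` and `compare` are hypothetical names, the field names are not taken from the tool's JSON output, and a generic value type stands in for real scorer entries.

```rust
use std::collections::BTreeMap;

/// Illustrative summary of a two-snapshot comparison.
#[derive(Debug, Default, PartialEq)]
struct CompareSummary {
    equal_overlaps: usize,    // SCID in both snapshots, same value
    replaced_overlaps: usize, // SCID in both snapshots, different value
    left_only: usize,         // SCID only in the left snapshot
    right_only: usize,        // SCID only in the right snapshot
}

/// Classify every SCID in `left` and `right` into one of four buckets.
fn compare<V: PartialEq>(left: &BTreeMap<u64, V>, right: &BTreeMap<u64, V>) -> CompareSummary {
    let mut summary = CompareSummary::default();
    for (scid, left_value) in left {
        match right.get(scid) {
            Some(right_value) if right_value == left_value => summary.equal_overlaps += 1,
            Some(_) => summary.replaced_overlaps += 1,
            None => summary.left_only += 1,
        }
    }
    summary.right_only = right.keys().filter(|scid| !left.contains_key(*scid)).count();
    summary
}

fn main() {
    let left = BTreeMap::from([(1u64, 10u32), (2, 20), (3, 30)]);
    let right = BTreeMap::from([(1u64, 10u32), (2, 21), (3, 30), (4, 40)]);
    println!("{:?}", compare(&left, &right));
}
```

For an overlay merge, the left-only bucket is expected to stay empty, since the baseline is never pruned; a non-zero value there would suggest the merged file lost baseline entries.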

@ovitrif ovitrif marked this pull request as ready for review May 18, 2026 00:40
Author

ovitrif commented May 18, 2026

@dzdidi This is for you to review; I can't request you as a reviewer until you add me as a maintainer.

Please do, so that I don't have to submit changes via forks and so that my reviews of your changes can unblock merging PRs in the future 🙏🏻

@dzdidi dzdidi self-requested a review May 18, 2026 06:26
Collaborator

dzdidi commented May 18, 2026

@ovitrif I added myself as a reviewer. I do not have permission to add you; please check with @BitcoinErrorLog.

Regarding the code itself, what is the value of adding the /data folder to version control?

Author

ovitrif commented May 18, 2026

> @ovitrif added myself as a reviewer, do not have permission to add you, check with @BitcoinErrorLog please
>
> Regarding the code itself, what is the value of adding /data folder into version control?

Thanks, will ask John.
The data folder contains the new scores binary, which ideally would represent the latest known snapshot (by code) that we track, plus the metadata for future auditing, which is why it is also added decoded as JSON.

Collaborator

dzdidi commented May 18, 2026

The content of these files is directly correlated with network conditions; it has everything to do with data and nothing to do with logic. To me it is the same as committing a database to the repo.

Author

ovitrif commented May 18, 2026

@dzdidi Agree on the principle, but I don't fully agree in practice. I see a reason to have an overview of how the scores evolved over time, for example.

So, a DB does hold state, but that state is specific to an app, kinda isolated. A scores db feels more like a snapshot of our service's understanding of an external thing that evolves "publicly", the LN graph.

Still, not gonna defend those necessarily, just arguing for the sake of it. I could happily have git ignore them.

Author

ovitrif commented May 18, 2026

> The content of these files is directly correlated with the network conditions and has everything to do with data and nothing to do with logic. To me it is the same as committing database into repo.

Resolved in 222ae9d: removed the committed data/scores artifacts, ignored local /data/, and updated the PR/README to keep generated scorer outputs under ignored local paths like target/scorer-kit/.
