perf: encode shared constraints with an auxiliary variable per version set#236
Open
baszalmstra wants to merge 1 commit into
Open
Conversation
…n set Previously every (parent, version set) constraint pair emitted one binary clause per candidate that does not match the version set. Since the non-matching candidate list is identical for every parent that shares a constraint, the same clauses were re-emitted over and over, accounting for the vast majority of clauses in large conda-style solves. Constraints whose version set excludes at least 4 candidates now share a single auxiliary variable per version set that means "a candidate excluded by this version set is installed": - (NOT candidate OR aux) is emitted once per version set for every excluded candidate, and - (NOT parent OR NOT aux) is emitted once per (parent, version set) pair. This reduces the number of constrains clauses from O(parents * candidates) to O(parents + candidates). Only one implication polarity is needed (the Plaisted-Greenbaum one-sided form) because the auxiliary variable is never decided directly, it only receives a value through propagation. Conflict reporting reconstructs the original parent -> candidate edges by pairing the two clause kinds through their shared auxiliary variable, so rendered conflict messages are identical to the pairwise encoding. Constraints that exclude fewer than 4 candidates keep the pairwise encoding, which is smaller and propagates in a single hop for such short candidate lists. https://claude.ai/code/session_01YQWW7mgsodYKjadEsWqPij
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This pull request replaces the pairwise
Constrainsclause encoding with a shared auxiliary-variable encoding. Previously every (parent, version set) constraint pair emitted one binary clause per candidate excluded by the version set, so the same excluded-candidate list was re-emitted for every parent sharing a constraint. On conda-forge style workloads these clauses dominate the clause database.Key Changes
Constraints whose version set excludes at least 4 candidates now share one auxiliary variable per version set, meaning "a candidate excluded by this version set is installed":
ConstrainsExcluded:(¬candidate ∨ aux), emitted once per version set;ConstrainsParent:(¬parent ∨ ¬aux), one clause per (parent, version set) pair.This reduces the constrains clause count from O(parents × candidates) to O(parents + candidates). Constraints excluding fewer than 4 candidates keep the pairwise encoding, which is smaller and propagates in a single hop for short lists.
Performance Results
Benchmarked on the conda-forge dataset (linux-64 + noarch snapshot, 1000 randomized problems per seed, two seeds, identical problem sequences on both sides, clean release builds, 60s timeout):
The gains extend the solvable frontier: main hits the 60s timeout on 6 problems, this PR on 3 (a strict subset), so 3 problems become solvable that previously were not. The largest single-problem improvement is 42.8s (a main timeout solved in 17.2s); the worst regression is +10.8s (40.9s vs 30.1s, both completing).
Not a single problem of the 2000 uses more clauses or memory with the new encoding. Search effort drops as well: 1.39× fewer conflicts and 1.27× fewer clause visits in aggregate, and the smaller database also makes encoding 1.32× faster.
Technical Rationale
Propagation strength is preserved, in two hops instead of one:
parent = truepropagatesaux = falsewhich propagates¬candidatefor every excluded candidate, and converselycandidate = truepropagatesaux = truewhich propagates¬parent. Only one implication polarity is needed because the auxiliary variable is never decided directly; it only takes values through propagation. This is the one-sided (polarity-aware) variant of the Tseytin transformation, due to Plaisted and Greenbaum (1986).Conflict reporting reconstructs the original
parent → candidateedges by pairing the two clause kinds through their shared auxiliary variable, so rendered conflict messages are identical to the pairwise encoding on the entire existing test suite.Correctness
New clause-count shape tests assert the M + N encoding (and N × M for the pairwise small case), and new unsat snapshot tests cover the shared encoding in forward, reverse, and multi-parent conflict directions; their snapshots are byte-identical to the pairwise encoding's output.