fix: add iteration cap to label_propagation to prevent infinite loop#1434
fix: add iteration cap to label_propagation to prevent infinite loop#1434octo-patch wants to merge 1 commit intogetzep:mainfrom
Conversation
…ixes getzep#1397) label_propagation used an unbounded while True loop that could oscillate forever on mid-sized graphs (~1000+ nodes) due to a tiebreaker rule that caused pairs of nodes to flip communities on every pass without converging. Changes: - Add MAX_LABEL_PROPAGATION_ITERATIONS = 200 safety cap - Replace while True with for loop using the iteration cap - Fix tiebreaker: prefer curr_community to avoid oscillation - Log debug on convergence, warning when cap is hit
|
+1 — adopting this patch as a monkey-patch in our production graphiti-core 0.28.2 deployment after hitting the bug on a bipartite supply-chain graph (details: #1397 (comment)). The two-line change (iteration cap + |
xkonjin
left a comment
There was a problem hiding this comment.
Review: fix: add iteration cap to label_propagation to prevent infinite loop
Overall: Excellent safety fix. Replaces an unbounded while True with a finite cap and adds tie-breaking logic to prevent oscillation — both are real convergence hazards in label propagation on cyclic graphs.
Positives:
MAX_LABEL_PROPAGATION_ITERATIONS = 200is a reasonable, non-magic default. Good to see it is module-level and easy to tune.- The tie-breaking change (
new_community = curr_communityinstead ofmax(community_candidate, curr_community)) directly addresses oscillation between equal-rank neighbors, which is a classic label propagation failure mode. - Logging is well-placed: debug on convergence, warning on cap-hit. This helps operators distinguish "fast community detection" from "degraded approximate labels."
Potential issues:
- 200 may be too low for large graphs: If a projection has hundreds of nodes with long dependency chains, 200 iterations might not converge. Consider exposing
max_iterationsas a parameter tolabel_propagation()so callers can tune it, or document the chosen value in a docstring. - Convergence semantics change: Previously, the function would loop until absolute stability. Now it returns approximate labels after 200 iterations. Downstream consumers (community summarization, search pruning) should be aware that labels may be unstable under the cap. The warning log helps, but a docstring note on the function itself would be stronger.
- No test for cap-hit path: The test coverage appears to be manual. Consider adding a unit test that constructs a deliberately oscillating graph (e.g., a 2-node cycle with symmetric edge weights) and asserts:
- The function returns within the cap.
- The warning is emitted.
- Labels do not oscillate indefinitely.
Security: No issues. This is a pure algorithmic safety bound.
Verdict: Good fix, ready to merge. Recommend adding a docstring and an oscillation test in a follow-up.
Fixes #1397
Problem
label_propagationincommunity_operations.pyused an unboundedwhile Trueloop that could oscillate forever on mid-sized graphs (~1000+ nodes). The issue reporter observed theirbuild_communities()call hanging for 44 minutes at 100% CPU before they killed it.The root cause is the tiebreaker rule: when a node's neighbor count is tied, the code did
new_community = max(community_candidate, curr_community), which creates a systematic bias towards higher-numbered communities. Combined with deterministic iteration order, this causes pairs of nodes to flip communities on every pass without ever converging.Solution
Two complementary changes:
Add an iteration cap (
MAX_LABEL_PROPAGATION_ITERATIONS = 200): Replaces thewhile Trueloop withfor iteration in range(MAX_LABEL_PROPAGATION_ITERATIONS). Real-world graphs converge in <20 iterations; the 200-iteration cap is a safety ceiling that produces approximate community labels instead of hanging indefinitely.Fix the tiebreaker: Prefer
curr_communitywhen tied instead ofmax(community_candidate, curr_community). This matches the reference label-propagation algorithm specification and short-circuits oscillation. When a node has no clear plurality neighbor community, keeping its current assignment is stable.A
logger.debugmessage is emitted on convergence; alogger.warningis emitted if the cap is hit.Testing
The fix is a logic correction with a clear behavioral improvement: