fix: prevent updates of RestartInProgress condition be covered by updating role status #88
ShirleyDing wants to merge 1 commit into
… covering updates from other reconcile logic (sgl-project#87) Signed-off-by: dingxy <yingd1206@gmail.com> Co-authored-by: dingxy <yingd1206@gmail.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull Request Overview
This PR adds a refresh of the RoleBasedGroup object before updating its status to ensure the latest version is being modified, preventing potential stale data issues in status updates.
- Adds a `Get` call to refresh the `rbg` object immediately before status modification
- Ensures status updates are applied to the most current version of the object
- Improves reliability of status updates in the RoleBasedGroup reconciliation loop
cheyang
left a comment
Review — PR #88
Status: needs-work (rebase required)
Merge Conflict
This PR currently has merge conflicts against main. Given it's been open since November 2025, the base branch has diverged significantly. A rebase is needed before this can proceed.
Code Review
The fix itself is reasonable. The race condition is clear: updateRBGStatus operates on a potentially stale rbg object, so when it patches the status back, it can overwrite the RestartInProgress condition that was updated by the restart-policy reconciler in a concurrent reconcile loop. Re-fetching with r.client.Get(...) right before setCondition ensures the conditions slice reflects the latest server state.
One subtlety worth noting: readyCondition is computed before the Get(), based on the roleStatuses argument. If another reconciler modified the RBG between the initial read and this re-fetch, the ready calculation could be momentarily stale. In practice this is fine — the next reconcile loop will correct it — but it's worth being aware of.
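The race and the effect of the re-fetch can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `Store`, `Object`, `simulate`, and the `Ready`/`AllRolesReady` values are hypothetical stand-ins rather than the controller's real types, and the status write is modeled as replacing the whole conditions slice, which is what makes a stale copy destructive.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Reason string
}

// Object is a hypothetical, pared-down RoleBasedGroup holding only conditions.
type Object struct {
	Conditions []Condition
}

// Store is an in-memory stand-in for the API server (an assumption of this sketch).
type Store struct{ obj Object }

// Get returns a copy of the stored object, like reading the latest server state.
func (s *Store) Get() Object {
	return Object{Conditions: append([]Condition(nil), s.obj.Conditions...)}
}

// UpdateStatus replaces the stored conditions wholesale with the caller's copy.
func (s *Store) UpdateStatus(o Object) {
	s.obj = Object{Conditions: append([]Condition(nil), o.Conditions...)}
}

// setCondition overwrites the condition of the same type, or appends it.
func setCondition(o *Object, c Condition) {
	for i := range o.Conditions {
		if o.Conditions[i].Type == c.Type {
			o.Conditions[i] = c
			return
		}
	}
	o.Conditions = append(o.Conditions, c)
}

// reasonOf returns the reason of the condition with the given type, or "".
func reasonOf(o Object, condType string) string {
	for _, c := range o.Conditions {
		if c.Type == condType {
			return c.Reason
		}
	}
	return ""
}

// simulate replays the interleaving from the review; refetch toggles the fix.
func simulate(refetch bool) string {
	store := &Store{}
	store.UpdateStatus(Object{Conditions: []Condition{
		{Type: "RestartInProgress", Reason: "RBGRestart"},
	}})

	// 1. The status reconciler reads its copy early in the loop.
	stale := store.Get()

	// 2. Meanwhile the restart-policy reconciler marks the restart as completed.
	latest := store.Get()
	setCondition(&latest, Condition{Type: "RestartInProgress", Reason: "RBGRestartCompleted"})
	store.UpdateStatus(latest)

	// 3. The status reconciler writes its Ready condition, optionally re-fetching first.
	target := stale
	if refetch {
		target = store.Get()
	}
	setCondition(&target, Condition{Type: "Ready", Reason: "AllRolesReady"})
	store.UpdateStatus(target)

	return reasonOf(store.Get(), "RestartInProgress")
}

func main() {
	fmt.Println("without re-fetch:", simulate(false))
	fmt.Println("with re-fetch:   ", simulate(true))
}
```

In this model, `simulate(false)` leaves `RestartInProgress` at `RBGRestart` because the stale copy overwrites the completed state, while `simulate(true)` preserves `RBGRestartCompleted`. The re-fetch narrows the window but does not remove it: a write landing between the re-fetch and the update would still be lost.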
Requested Changes
- Rebase onto current `main` and resolve conflicts.
- Consider adding a brief inline comment above the `Get()` call explaining why it's needed (e.g., `// Re-fetch to avoid overwriting conditions set by other reconcilers`). Future readers will thank you.
- The PR description checks the "tests added" box but the diff contains no test changes. If there's a unit test covering this race condition, please include it. If it is not feasible to unit-test (timing-dependent), that's understandable; just clarify in the description.
Once rebased and CI passes green again, this should be good to merge.
Reviewed at sha 702c81896be1269f0ead81237cfd82d2461572a4
cheyang
left a comment
Updated Review — PR #88
Status: obsolete — recommend closing
Superseded by upstream changes
On closer inspection, this PR has been fully superseded by subsequent work on main:
- PR #324 (merged 2026-05-06), `fix(controller): prevent RestartInProgress condition loss due to stale cache race`: provided a more comprehensive fix for the exact same bug, using a non-caching API reader (`apiReader`) in the `PodReconciler` and a defense-in-depth check in `updateRBGStatus`.
- PR #340 (merged 2026-05-19), `chore(rbg): deprecate and remove RecreateRBGOnPodRestart restart policy`: completely removed the `RecreateRBGOnPodRestart` feature, the `PodReconciler`, `pod_controller.go`, and the `RestartInProgress` condition type. There are zero references to `RestartInProgress` on current `main`.
Since the feature this PR fixes no longer exists, rebasing would not be meaningful. This PR should be closed rather than rebased.
Thank you for the original bug report and fix idea — the race condition analysis in #87 was accurate and valuable. It informed the fix that eventually landed in #324.
Reviewed at sha 702c81896be1269f0ead81237cfd82d2461572a4 against main @ fa3804c
Ⅰ. Motivation
Ⅱ. Modifications
Ⅲ. Does this pull request fix one issue?
fixes #87
Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅴ. Describe how to verify it
Ⅵ. Special notes for reviews
Set logLevel=1 and look at "patch content" in pkg/utils/utils.go:30 PatchObjectApplyConfiguration:

After all pods restarted, the RestartInProgress condition was set to 'RBGRestartCompleted'.
Then, in the RoleBasedGroup reconcile cycle, updateRoleStatus used an outdated condition value, 'RBGRestart' instead of 'RBGRestartCompleted', so the RestartInProgress condition stayed at 'RBGRestart' forever. The next time a pod restarts, the restart-policy no longer works.

Checklist
make fmt.