fix(rbgsa-controller): guard empty selector in LWP follower roles#263
fix(rbgsa-controller): guard empty selector in LWP follower roles#263sebest wants to merge 1 commit into
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
5a42c10 to
7f17ae5
Compare
When a RoleInstanceSet's status.labelSelector has not been populated yet (race between the RIS controller and the RBGSA controller), the code unconditionally prepended a comma before the component-index label, producing a malformed selector like ",rbg.x-k8s.io/component-index=0". This broke the kwota materializer's /scale subresource parsing, blocking all scaling for affected LeaderWorkerPattern workloads. Add an empty-selector guard that returns errSelectorNotReady (a sentinel error), causing the reconciler to requeue after 2s with an Info log and Warning event instead of writing a malformed selector to status. Hard errors (client.Get failures, bad apiVersion) still propagate normally. Also clean up extractLabelSelectorDefault: propagate ctx instead of context.TODO(), remove dead gvk variable, use %w error wrapping, and fix a typo in an error message.
7f17ae5 to
f7ed14a
Compare
|
@Syspretor I'm not sure the remaining failing test is caused by my PR and I cannot retry it. |
cheyang
left a comment
There was a problem hiding this comment.
Review Summary
The fix is straightforward and correct: guard the empty status.labelSelector for RIS + LWP roles so we never produce a malformed leading-comma selector. The sentinel error + requeue pattern is idiomatic for transient controller conditions, and the test coverage is solid (7 table-driven cases).
Blockers before merge:
- Merge conflict — this branch conflicts with
main. Please rebase. - E2E failure —
[v1alpha1] deploy with restartPolicytimed out (150s). The failure (workload generation not equal to observed generation) looks unrelated to your change, but CI must be green.
Code observations (non-blocking):
- The
pkgerrorsalias + stdliberrorssplit is clean. - Dead
gvkvariable removal,context.TODO()→ctx,%v→%w, and the typo fix are good housekeeping. - 2s requeue is reasonable for a condition gated on another controller's reconcile.
Once the conflict is resolved and CI passes, this looks good to go.
|
Thanks for the fix -- the empty-selector guard and sentinel error pattern look correct in isolation, and the test coverage is solid (7 table-driven cases covering both transient and hard error paths). However the PR currently has merge conflicts with |
Summary
status.labelSelectorwhen building the RBGSA selector for RoleInstanceSet + LeaderWorkerPattern roles",rbg.x-k8s.io/component-index=0"(leading comma), which breaks our materializer's/scalesubresource parsing and blocks all scalingerrSelectorNotReadysentinel error so the caller can distinguish the transient "selector not yet populated" case (requeue after 2s) from hard errors (propagate normally)SelectorNotReady) for observability of the retry loopextractLabelSelectorDefault: propagatectx, remove dead variable, use%wwrapping, fix typoTest plan
go test ./internal/controller/workloads/ -run TestExtractLabelSelectorDefault -v— 7 table-driven cases covering populated/empty/missing selector, standalone vs LWP, LWS, and workload-not-foundgo test ./internal/controller/workloads/— all existing tests pass (no regressions)go vet ./internal/controller/workloads/— clean