chore: add e2e cases for inplace update#285
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive suite of end-to-end tests for in-place updates, covering standalone and LeaderWorker patterns, various update strategies, grace periods, and role template references. The review feedback identifies several areas for improvement: correcting Gomega assertions to prevent potential runtime panics, enabling or documenting disabled test cases, removing redundant sleep statements to optimize test execution, and implementing proper error handling for JSON operations.
| gomega.Eventually(func() bool { | ||
| err := f.Client.List(f.Ctx, podList, client.InNamespace(f.Namespace), | ||
| client.MatchingLabels{ | ||
| "rbg.workloads.x-k8s.io/group-name": rbg.Name, | ||
| "rbg.workloads.x-k8s.io/role-name": "role-1", | ||
| }) | ||
| if err != nil || len(podList.Items) == 0 { | ||
| return false | ||
| } | ||
| return podList.Items[0].UID == initialPodUID && | ||
| podList.Items[0].Spec.Containers[0].Image == updatedImage | ||
| }, testutils.Timeout, testutils.Interval).Should(gomega.BeTrue(), | ||
| "Pod UID should remain unchanged for inplace update, initial: %s, got: %s, image changed from %s to %s", | ||
| initialPodUID, podList.Items[0].UID, initialImage, updatedImage) |
There was a problem hiding this comment.
This assertion will likely cause a runtime panic. In Go, arguments to a function call are evaluated before the function is executed. Here, podList.Items[0].UID is evaluated when Should() is called, which happens before Eventually starts polling. Since podList.Items is empty at that point, it will result in an index out of range panic.
A more robust and idiomatic way to write this in Gomega is to use the Eventually(func(g Gomega) { ... }) pattern, which allows making multiple assertions within the polling loop and provides better failure messages without manual formatting.
gomega.Eventually(func(g gomega.Gomega) {
err := f.Client.List(f.Ctx, podList, client.InNamespace(f.Namespace),
client.MatchingLabels{
"rbg.workloads.x-k8s.io/group-name": rbg.Name,
"rbg.workloads.x-k8s.io/role-name": "role-1",
})
g.Expect(err).ShouldNot(gomega.HaveOccurred())
g.Expect(podList.Items).Should(gomega.HaveLen(1))
g.Expect(podList.Items[0].UID).Should(gomega.Equal(initialPodUID))
g.Expect(podList.Items[0].Spec.Containers[0].Image).Should(gomega.Equal(updatedImage))
}, testutils.Timeout, testutils.Interval).Should(gomega.Succeed())| f.ExpectRbgV2Deleted(rbg) | ||
| }) | ||
|
|
||
| ginkgo.XIt("standalone pattern with RecreatePod should always recreate pod even for image-only update", func() { |
There was a problem hiding this comment.
This test case is disabled using ginkgo.XIt. If the functionality is implemented and ready for verification, please enable the test by changing it to ginkgo.It. If there are known issues or the feature is incomplete, it's helpful to add a comment or a TODO explaining why it is currently disabled. This applies to other occurrences of XIt in this file as well (lines 209, 262, 475).
| // Wait for some time to allow reconciliation | ||
| time.Sleep(5 * time.Second) |
There was a problem hiding this comment.
The time.Sleep(5 * time.Second) here appears to be redundant. The subsequent gomega.Consistently block (line 303) already polls for 10 seconds to verify that the pod remains unchanged. If the controller were to perform an update, Consistently would detect it regardless of this initial sleep. Removing unnecessary sleeps helps keep the E2E test suite fast and reliable.
| type patch struct { | ||
| Metadata metadata `json:"metadata"` | ||
| } | ||
| b, _ := json.Marshal(patch{Metadata: metadata{Labels: labels}}) |
There was a problem hiding this comment.
The error from json.Marshal is ignored. While it's unlikely to fail with this specific struct, it's better practice to handle the error to avoid silent failures if the struct definition changes in the future.
b, err := json.Marshal(patch{Metadata: metadata{Labels: labels}})
if err != nil {
return &runtime.RawExtension{}
}c280a6d to
d78abee
Compare
Signed-off-by: 玖宇 <guotongyu.gty@alibaba-inc.com>
d78abee to
50b836e
Compare
cheyang
left a comment
There was a problem hiding this comment.
Thanks for adding comprehensive e2e coverage for in-place update. The test structure is solid and follows existing patterns well.
Two things need attention before this can merge:
1. Merge conflict — the PR currently conflicts with the base branch. Please rebase.
2. CI failure (not flaky) — the leaderWorker pattern with multiple replicas should perform inplace update test times out at 150s (line ~567 in inplace_update.go). The in-place update for a 2-replica LeaderWorkerSet with size=2 (4 pods total) does not converge. This looks like a real bug in the test expectation or controller behavior rather than a timing fluke. Worth investigating whether the controller correctly handles in-place updates across multiple replicas of a LeaderWorkerSet.
A few minor observations below via inline comments.
| testutils "sigs.k8s.io/rbgs/test/utils" | ||
| wrappersv2 "sigs.k8s.io/rbgs/test/wrappers/v1alpha2" | ||
| ) | ||
|
|
There was a problem hiding this comment.
Stale comment: this says newTestImage returns a different image but it's a const, not a function. Trivial nit — just remove or fix the comment.
| } | ||
| } | ||
| return true | ||
| }, testutils.Timeout, testutils.Interval).Should(gomega.BeTrue()) |
There was a problem hiding this comment.
This is the test that times out in CI. With 2 replicas x size 2 = 4 pods, the Eventually block waits for all pods to retain UIDs and update images. Please check whether the controller actually completes the in-place update for all pods in this scenario, or if there's a reconciliation issue that leaves some pods stuck.
| b, err := json.Marshal(patch{Metadata: metadata{Labels: labels}}) | ||
| if err != nil { | ||
| return &runtime.RawExtension{} | ||
| } |
There was a problem hiding this comment.
Silently returning an empty RawExtension on marshal failure will produce confusing test results. Consider using gomega.Expect(err).ShouldNot(gomega.HaveOccurred()) or ginkgo.GinkgoT().Fatal(err) to surface the error.
| // Get initial pods (2 groups * 2 pods each = 4 pods) | ||
| podList := &corev1.PodList{} | ||
| gomega.Eventually(func() bool { | ||
| err := f.Client.List(f.Ctx, podList, client.InNamespace(f.Namespace), |
There was a problem hiding this comment.
This test timed out in CI after 150s. The CI logs show that the in-place image update succeeds (SuccessfulUpdatePodInPlace), but then the default RecreateInstanceOnPodRestart restart policy triggers pod deletion and recreation with new UIDs. The assertion that all pod UIDs remain unchanged then never passes.
You likely need to either:
- Set a restart policy on the LeaderWorker role that does not recreate instances on container restart, or
- Adjust the assertion to account for the fact that in-place updates combined with
RecreateInstanceOnPodRestartwill produce new pod UIDs.
The same issue may affect the single-replica leaderWorker inplace tests that currently pass - they might be passing only due to timing differences with fewer pods.
| "rbg.workloads.x-k8s.io/role-name": "role-1", | ||
| }) | ||
| return err == nil && len(podList.Items) == 3 | ||
| }, testutils.Timeout, testutils.Interval).Should(gomega.BeTrue()) |
There was a problem hiding this comment.
The test is titled leaderWorker pattern with InPlaceIfPossible should recreate pods when updating leader/worker patches, but the test body only updates the base template image and then asserts that all pod UIDs remain the same (lines 505-514) - i.e., it verifies in-place update, not recreation.
If the intent is to test recreation when patches change, the test should modify LeaderTemplatePatch or WorkerTemplatePatch with non-image fields and assert UIDs change. If the intent is to verify that in-place update still works when leader/worker patches are present, rename the test to match that behavior.
| rbg.Spec.Roles[0].StandalonePattern.Template.Spec.Containers[0].Image = updatedImage | ||
| rbg.Spec.Roles[0].StandalonePattern.Template.Spec.Containers[0].Env = []corev1.EnvVar{ | ||
| {Name: "TEST_ENV", Value: "test"}, | ||
| } |
There was a problem hiding this comment.
Both grace period tests (standalone at line 289 and leaderWorker at line 359) set GracePeriodSeconds: 5 but only verify the final state - image updated and pod UID unchanged. They never assert that the update was delayed by the grace period.
Consider adding a Consistently check right after the update to confirm the pod image has NOT changed yet during the grace window, before the Eventually check that confirms the update eventually completes. Without this, these tests are functionally identical to the basic inplace update tests.
| gomega.Expect(f.Client.Create(f.Ctx, rbg)).Should(gomega.Succeed()) | ||
| f.ExpectRbgV2Equal(rbg) | ||
|
|
||
| // Get initial pod |
There was a problem hiding this comment.
buildRawExtensionPatch here is essentially a duplicate of buildRawPatch in test/wrappers/v1alpha2/role_wrapper.go. Consider reusing the existing helper or extracting both into a shared test utility package.
| testutils "sigs.k8s.io/rbgs/test/utils" | ||
| wrappersv2 "sigs.k8s.io/rbgs/test/wrappers/v1alpha2" | ||
| ) | ||
|
|
There was a problem hiding this comment.
The comment // newTestImage returns a different image for testing updates reads like a function doc comment, but decorates a const declaration. Something like // updatedImage is a different image tag used for in-place update test scenarios would be clearer.
Ⅰ. Motivation
Ⅱ. Modifications
Ⅲ. Does this pull request fix one issue?
fixes #XXXX
Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅴ. Describe how to verify it
VI. Special notes for reviews
Checklist
make fmt.