FP-1.1: proposal to reduce runtime#5433
Conversation
Pull Request Functional Test Report for #5433 / 22e97edVirtual Devices
Hardware Devices
|
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request optimizes the power admin down/up tests to significantly reduce execution time on fully loaded chassis. By selecting a single eligible component for fabric, linecard, and controller card types, the test suite avoids redundant workflows while maintaining necessary validation coverage. The changes also introduce more robust checks for controller card roles and operational status to ensure reliable test execution. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the component selection logic in the power-admin-down-up test to improve robustness, specifically for fabric, linecard, and controller card components. The changes replace restrictive test-skipping patterns with more resilient discovery loops. The reviewer suggests further robustness improvements, including using gnmi.Lookup instead of gnmi.Get during discovery to handle missing telemetry gracefully, and replacing t.Fatalf with a logging-and-continue pattern when encountering unexpected redundant roles to prevent unnecessary test failures.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the power administration tests for fabric, linecard, and controller card components by moving eligibility checks out of the subtests to pre-select a single valid component for testing. Feedback was provided to improve the robustness of the controller card selection logic by using gnmi.Lookup instead of gnmi.Get for operational status to prevent potential fatal errors and to better handle scenarios with multiple secondary controllers.
| for _, c := range cs { | ||
| t.Run(c, func(t *testing.T) { | ||
| role := gnmi.Get(t, dut, gnmi.OC().Component(c).RedundantRole().State()) | ||
| if got, want := role, oc.Platform_ComponentRedundantRole_PRIMARY; got == want { | ||
| primary = c | ||
| t.Skipf("ControllerCard Component %s is PRIMARY, hence skipping", c) | ||
| } | ||
|
|
||
| oper := gnmi.Get(t, dut, gnmi.OC().Component(c).OperStatus().State()) | ||
| if got, want := oper, oc.PlatformTypes_COMPONENT_OPER_STATUS_ACTIVE; got != want { | ||
| t.Skipf("ControllerCard Component %s is already INACTIVE, hence skipping", c) | ||
| } | ||
|
|
||
| powerDownUp(t, dut, c, oc.PlatformTypes_OPENCONFIG_HARDWARE_COMPONENT_CONTROLLER_CARD, 20*time.Minute) | ||
| }) | ||
| role, ok := gnmi.Lookup(t, dut, gnmi.OC().Component(c).RedundantRole().State()).Val() | ||
| if !ok { | ||
| t.Logf("Controller card %q missing redundant-role telemetry; skipping during selection", c) | ||
| continue | ||
| } | ||
| switch role { | ||
| case oc.Platform_ComponentRedundantRole_PRIMARY: | ||
| primary = c | ||
| case oc.Platform_ComponentRedundantRole_SECONDARY: | ||
| secondary = c | ||
| default: | ||
| t.Logf("Controller card %q has unexpected redundant-role %v; skipping during selection", c, role) | ||
| continue | ||
| } | ||
| } | ||
| if primary != "" { | ||
| gnmi.Await(t, dut, gnmi.OC().Component(primary).SwitchoverReady().State(), 30*time.Minute, true) | ||
|
|
||
| if primary == "" || secondary == "" { | ||
| t.Skipf("Missing required controller roles: primary=%q secondary=%q", primary, secondary) | ||
| } | ||
|
|
||
| oper := gnmi.Get(t, dut, gnmi.OC().Component(secondary).OperStatus().State()) | ||
| if oper != oc.PlatformTypes_COMPONENT_OPER_STATUS_ACTIVE { | ||
| t.Skipf("Secondary controller %q not active: got %v", secondary, oper) | ||
| } |
There was a problem hiding this comment.
The selection logic for the controller card can be improved for robustness and consistency with the Fabric and Linecard tests. Currently, it uses gnmi.Get for OperStatus after the loop (lines 126-129), which will cause a fatal error if the leaf is missing. Additionally, if multiple secondary controllers exist, the current logic might pick an inactive one and skip the test even if an active one is available. It is better to check for an active secondary during the selection loop using gnmi.Lookup.
for _, c := range cs {
role, ok := gnmi.Lookup(t, dut, gnmi.OC().Component(c).RedundantRole().State()).Val()
if !ok {
t.Logf("Controller card %q missing redundant-role telemetry; skipping during selection", c)
continue
}
switch role {
case oc.Platform_ComponentRedundantRole_PRIMARY:
if primary == "" {
primary = c
}
case oc.Platform_ComponentRedundantRole_SECONDARY:
if secondary == "" {
oper, ok := gnmi.Lookup(t, dut, gnmi.OC().Component(c).OperStatus().State()).Val()
if ok && oper == oc.PlatformTypes_COMPONENT_OPER_STATUS_ACTIVE {
secondary = c
} else {
t.Logf("Controller card %q is SECONDARY but not ACTIVE (got %v); skipping during selection", c, oper)
}
}
default:
t.Logf("Controller card %q has unexpected redundant-role %v; skipping during selection", c, role)
}
}
if primary == "" || secondary == "" {
t.Skipf("Missing required active controller roles: primary=%q secondary=%q", primary, secondary)
}There was a problem hiding this comment.
multiple secondary controllers is an infeasible scenario, so it is not a priority here.
There was a problem hiding this comment.
multiple secondary controllers is an infeasible scenario, so it is not a priority here.
Limit power admin tests to one eligible component per type.
Summary
This change updates the power admin down/up tests to select and validate a single eligible component for each hardware type instead of running the workflow against every matching component.
What changed:
not empty
removable
OPER_STATUS_ACTIVE
TestLinecardPowerAdmin
◦ Apply the same selection logic for linecards.
◦ Skip if no eligible linecard is found.
TestControllerCardPowerAdmin
◦ Identify controller cards by redundant-role.
◦ Select the SECONDARY controller for the power cycle workflow.
◦ Skip if primary/secondary roles are not both present.
◦ Skip if the secondary controller is not active.
◦ Continue to await switchover-ready on the primary after the secondary recovers.
Why:
Running the power-admin workflow across every eligible fabric, linecard, or controller in a chassis increases runtime as discovered in a fully loaded chassis like https://partnerissuetracker.corp.google.com/issues/492974186. This change narrows the test to one representative eligible component per type while preserving the existing validation flow.
Behavioral impact
•Test scope is reduced from “all eligible components” to “one eligible component per type”.
•If no suitable component exists, the test now skips with a single top-level reason instead of creating multiple skipped subtests.