diff --git a/keps/sig-network/5709-pod-network-readiness-gates/README.md b/keps/sig-network/5709-pod-network-readiness-gates/README.md new file mode 100644 index 000000000000..a78220c234a2 --- /dev/null +++ b/keps/sig-network/5709-pod-network-readiness-gates/README.md @@ -0,0 +1,1297 @@ + +# KEP-5709: Add a well-known pod network readiness gate + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Condition type](#condition-type) + - [Approach A: Network plugin webhook (no core changes)](#approach-a-network-plugin-webhook-no-core-changes) + - [Approach B: Extend kubelet readiness logic (kubelet change)](#approach-b-extend-kubelet-readiness-logic-kubelet-change) + - [Approach C: API server injects built-in readiness gate (API server change)](#approach-c-api-server-injects-built-in-readiness-gate-api-server-change) + - [Definition of "network ready"](#definition-of-network-ready) + - [User Stories](#user-stories) + - [Story 1: Preventing traffic black-holes during pod startup](#story-1-preventing-traffic-black-holes-during-pod-startup) + - [Story 2: NetworkPolicy enforcement before traffic arrives](#story-2-networkpolicy-enforcement-before-traffic-arrives) + - [Story 3: Pods with multiple network devices via DRA](#story-3-pods-with-multiple-network-devices-via-dra) + - [Notes/Constraints/Caveats](#notesconstraintscaveats) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Webhook behaviour](#webhook-behaviour) + - [Node-agent PATCH flow](#node-agent-patch-flow) + - [RBAC requirements](#rbac-requirements) + - [Interaction with existing conditions](#interaction-with-existing-conditions) + - [Worked example](#worked-example) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration 
tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA +- [ ] (R) Production readiness review completed +- 
[ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +Kubernetes currently has no explicit signal for whether a pod's +network has been fully programmed and is ready to receive traffic. +The closest existing condition, [`PodReadyToStartContainers`][KEP-3085], +indicates that the pod sandbox has been created and CNI `ADD` has +returned — but not that the network datapath is fully programmed. +This KEP introduces a built-in [pod readiness gate][KEP-580] +condition that the network plugin sets to indicate network readiness, +cleanly separating application readiness (answered by readiness probes) from +network readiness (answered by the network plugin). This becomes +especially important as [KEP-4559] moves kubelet probes to run +inside the pod network namespace, removing the implicit network +reachability signal that today's over-the-network probes +accidentally provide. + +[KEP-3085]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3085-pod-conditions-for-starting-completition-of-sandbox-creation/README.md +[KEP-4559]: https://github.com/kubernetes/enhancements/issues/4559 +[KEP-580]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/580-pod-readiness-gates/README.md + +## Motivation + +Today, kubelet readiness probes are executed from the host network +namespace, sending traffic over the pod network to reach the pod. 
+A side effect of this design is that a successful readiness probe +implicitly confirms that the network plugin has assigned an IP and +programmed the necessary routes and rules — i.e., that the pod is +reachable over the network. However, this was never the intended +purpose of readiness probes; they exist to answer "is the +application ready to serve traffic?", not "is the network path +functional?". The two concerns have been conflated by accident of +implementation. + +This implicit signal is also unreliable. Even today, a readiness +probe can succeed before NetworkPolicy rules are fully programmed, +as [documented in the Kubernetes docs][np-pod-lifecycle]. The CNI +plugin's `ADD` call typically returns before the pod is fully +plumbed into the network, because blocking on full network +programming would serialize pod creation and significantly degrade +bulk pod startup performance. The existing +[`PodReadyToStartContainers`][KEP-3085] condition (originally named +`PodHasNetwork`, renamed because the old name was misleading) +captures the moment CNI `ADD` returns and the sandbox is ready — +but this does not mean the network datapath is fully programmed +(e.g., OVS flows installed, nftables rules in place, routes +propagated to remote nodes). The readiness probe happens to paper +over this gap most of the time, but not always. + +[KEP-4559] makes this problem explicit. It proposes moving TCP, +HTTP, and gRPC probes to run inside the pod network namespace using +CRI `PortForward()`, connecting to `localhost` rather than the pod +IP. This solves critical security and architectural problems (the +blind SSRF attack, the NetworkPolicy hole that exempts kubelet +probes, and constraints on network architectures with overlapping +pod IPs), but it means the probe no longer traverses the pod +network at all. Without a replacement signal, the following failure +scenario becomes possible: + +1. A pod starts and its application begins listening on a port. +2. 
The new localhost-based readiness probe succeeds. +3. The pod is marked `Ready` and added to Service endpoints. +4. But the CNI plugin has not yet finished programming the network. +5. Traffic is routed to the pod and fails because it is not yet + reachable. + +Rather than trying to preserve the accidental coupling between +probes and network reachability, this KEP proposes making network +readiness an explicit, first-class signal. Kubernetes already has a +mechanism for external controllers to participate in pod readiness +decisions: [pod readiness gates][KEP-580]. Network plugins can use +this mechanism to indicate when a pod's network is fully +programmed. The readiness probe then only needs to answer "is the +application processing connections?", while the network readiness +gate answers "can traffic reach this pod?" — a clean separation +that is more accurate and reliable than the implicit signal we +depend on today. + +[np-pod-lifecycle]: https://kubernetes.io/docs/concepts/services-networking/network-policies/#pod-lifecycle + +### Goals + +- Define a well-known pod readiness gate condition type that network + plugins set to signal that a pod's network datapath is fully + programmed and the pod is reachable from other pods. +- Scope is CNI-networked pods, where the network plugin must program + the datapath after the sandbox is created. +- Provide a standard convention that all network plugins can adopt, + so that the ecosystem converges on a single condition type rather + than each plugin inventing its own. +- Cleanly separate application readiness (answered by readiness + probes) from network readiness (answered by the network plugin), + making the overall readiness model explicit rather than relying on + the accidental coupling between over-the-network probes and + network programming. + +### Non-Goals + +- Implementing this readiness gate in any specific network plugin + (OVN-Kubernetes, Cilium, Calico, etc.). 
This KEP defines the convention;
+  adoption is up to each network plugin.
+- Modifying kubelet's built-in readiness evaluation logic (Approach
+  B, if chosen, is the one exception and would relax this non-goal).
+  Approaches A and C build on the existing
+  [pod readiness gates][KEP-580] mechanism without changing how
+  kubelet computes the `Ready` condition.
+- Distributed multi-node network probing (e.g., verifying that a pod
+  is reachable from every other node in the cluster). The scope is
+  limited to the network plugin on the local node signaling that it
+  has finished programming the datapath.
+- Host-network pods (`hostNetwork: true`). These pods use the node's
+  existing network namespace; there is no plugin-managed datapath to
+  program.
+- Replacing or changing existing readiness probes. Readiness probes
+  continue to serve their current purpose of indicating application
+  readiness; this KEP adds a complementary signal for network
+  readiness.
+
+## Proposal
+
+This KEP defines a well-known pod condition type that network plugins
+set to indicate that a pod's network datapath is fully programmed and
+the pod is reachable from other pods. All three approaches below
+share the same signaling mechanism:
+
+- **Signaling.** The network plugin's node agent PATCHes
+  `status.conditions` to set the condition to `True` when the
+  datapath is ready.
+
+The approaches differ in how the condition comes to influence the
+pod's `Ready` status: through a readiness gate entry in
+`spec.readinessGates` (Approaches A and C) or through kubelet
+checking the condition natively (Approach B).
+
+<<[UNRESOLVED which-approach ]>>
+
+Reviewers: please comment on which approach you prefer. After
+agreement, the chosen approach becomes the proposal and the other
+two move to Alternatives.
+
+<<[/UNRESOLVED]>>
+
+### Condition type
+
+<<[UNRESOLVED condition-type-naming ]>>
+
+Three naming options are under consideration:
+
+1. **`/network-ready`** (e.g., `cilium.io/network-ready`,
+   `ovn-kubernetes.io/network-ready`) — each network plugin uses its
+   own domain prefix. 
Ownership is unambiguous, and in multi-plugin + clusters (e.g., Multus + DRA) each plugin can independently signal + its own condition. Follows the custom pod condition naming + convention described in [KEP-580]. Trade-off: there is no single + condition name that operators and tooling can rely on across + clusters, and Approach B (kubelet hardcodes the condition) would + not work with per-plugin names. + +2. **`networking.kubernetes.io/pod-network-ready`** — a single + domain-qualified name under the `networking.kubernetes.io` prefix. + Gives operators and tooling a consistent name to query across any + cluster while still following the [KEP-580] naming convention. + Trade-off: in multi-plugin clusters only one plugin can own the + condition, or the plugins must coordinate who sets it. + +3. **`PodNetworkReady`** — a short, unqualified name that mirrors + the style of built-in conditions like `PodReadyToStartContainers` + and `ContainersReady`. Same single-name benefits as option 2, but + the unqualified form signals this is a core ecosystem convention + rather than an extension. Same multi-plugin trade-off applies. + +For the remainder of this KEP the placeholder `` is used +wherever the condition type appears. Reviewers: please comment on which +naming option you prefer. + +<<[/UNRESOLVED]>> + +### Approach A: Network plugin webhook (no core changes) + +The network plugin deploys a mutating admission webhook that +intercepts pod `CREATE` requests and appends `` to +`spec.readinessGates`. Because readiness gates are immutable after +creation ([KEP-580]), the webhook must fire at pod creation time. +If the pod spec already contains the readiness gate (for example, +added by the user or a higher-level controller), the webhook is a +no-op. The plugin's node agent then PATCHes `status.conditions` to +set the condition to `True` when the datapath is ready. + +- **Pro:** Follows [KEP-580]'s design exactly; no core Kubernetes + changes required. 
+- **Con:** Every network plugin must independently implement the
+  webhook.
+- **Con:** The webhook sits in the pod creation path, adding
+  latency to every pod create.
+- **Con:** If the webhook is unavailable and `failurePolicy` is
+  `Ignore`, pods are created without the gate and silently lose
+  protection.
+
+### Approach B: Extend kubelet readiness logic (kubelet change)
+
+Kubelet is modified to natively factor the well-known
+`` condition from `status.conditions` into its `Ready`
+computation — the same way it already factors in `ContainersReady`.
+No readiness gate in `spec.readinessGates` is needed and no webhook
+or spec mutation is involved. The network plugin only needs to PATCH
+the status condition; kubelet does the rest.
+
+- **Pro:** No `spec` mutation involved — no readiness gates, no
+  webhook. The plugin only PATCHes a `status` condition, which is
+  the simplest possible contract for plugin authors. There is
+  precedent for this pattern: `ContainersReady` is already hardcoded
+  into kubelet's `Ready` formula without being a readiness gate, and
+  [KEP-3085] added `PodReadyToStartContainers` as another
+  kubelet-managed well-known condition.
+- **Con:** Requires a kubelet code change, increasing scope.
+- **Con:** Unlike `ContainersReady` (which kubelet itself sets),
+  `` would be set by an external agent. Readiness gates
+  already let external agents influence `Ready`, but only with
+  explicit per-pod opt-in; here kubelet's `Ready` computation would
+  depend on an out-of-tree component unconditionally.
+
+### Approach C: API server injects built-in readiness gate (API server change)
+
+The API server automatically injects `` into
+`spec.readinessGates` for every pod at creation time, making the
+readiness gate built-in. The network plugin only needs to PATCH the
+status condition to `True` when the datapath is ready.
+
+- **Pro:** Every pod gets the readiness gate automatically,
+  with no webhook needed, so no latency during pod create. 
+- **Con:** First-ever built-in readiness gate; steers away from + [KEP-580]'s original design, which explicitly delegated readiness + gate injection to external controllers via webhooks. +- **Con:** Backward-compatibility risk — if a network plugin does + not set the condition, pods are stuck not-Ready forever. Requires + a feature gate and careful rollout. + +### Definition of "network ready" + +What constitutes "network ready" is intentionally left to the network +plugin, because the details vary by implementation. As a general +guideline, the condition should be set to `True` when the pod's +datapath is fully programmed and the pod can receive traffic from +other pods in the cluster. Examples of what a plugin might wait for +include OVS flows or eBPF programs installed, nftables/iptables +rules for NetworkPolicy applied, or routes propagated to remote +nodes. The plugin SHOULD NOT wait for external reachability from +outside the cluster (e.g., Ingress or cloud load balancers), as those +are separate concerns with their own readiness signals. + +### User Stories + +#### Story 1: Preventing traffic black-holes during pod startup + +A platform team runs a large cluster with an overlay network plugin. +They observe occasional HTTP 5xx errors immediately after a +Deployment rolls out new pods, because the pods are marked `Ready` +and added to Service endpoints before the network plugin has finished +programming routes on remote nodes. After the network plugin adopts +this KEP's readiness gate, new pods are held out of endpoints until +the plugin confirms network readiness, eliminating the transient +errors. + +#### Story 2: NetworkPolicy enforcement before traffic arrives + +A compliance team requires that no traffic reach a pod before its +NetworkPolicy rules are fully programmed. Today, there is a +[documented race][np-pod-lifecycle] where a pod can receive traffic +before the network plugin finishes installing policy rules. 
With +this readiness gate, the network plugin defers setting the condition +to `True` until both the datapath and all applicable NetworkPolicy +rules are in place, closing this gap. + +#### Story 3: Pods with multiple network devices via DRA + +An HPC team uses Dynamic Resource Allocation (DRA) to attach +multiple network devices to a single pod — for example, a primary +cluster network interface plus a high-speed RDMA interface. Each +device may be programmed by a different plugin or driver, and each +has its own readiness timeline. The pod should not receive traffic +until all its network devices are fully plumbed. The network plugin +waits for every attached device to be ready before setting the +condition to `True`, giving the team a single, unified signal that +the pod's entire network stack is functional. + +### Notes/Constraints/Caveats + +- **Added latency to `Ready`.** By design, the pod's `Ready` + condition will not become `True` until the network plugin confirms + the datapath is programmed. This adds time to the pod startup + path relative to today's behaviour. The trade-off is intentional — + a pod that is `Ready` before its network is functional causes + worse problems (traffic black-holes, 5xx errors) than a pod that + takes slightly longer to become `Ready`. + +- **Interaction with `PodReadyToStartContainers`.** The + `PodReadyToStartContainers` condition (from [KEP-3085]) indicates + that the sandbox is created and CNI `ADD` has returned. The + `` condition is a strictly later signal — it indicates + that the full datapath is programmed. Both conditions can coexist; + they answer different questions. + +- **Multiple network plugins.** In clusters with more than one + network plugin (e.g., Multus + DRA), the behaviour depends on the + naming option chosen. With per-plugin names (option 1), each + plugin signals its own condition independently. 
With a single + well-known name (options 2 or 3), the plugins must coordinate who + sets the condition, or a meta-controller must aggregate their + signals. + +- **(Approach A only) Webhook availability.** If the mutating + webhook is unavailable and `failurePolicy` is `Ignore`, pods are + created without the readiness gate and silently lose protection. + If `failurePolicy` is `Fail`, pod creation is blocked while the + webhook is down. Plugin authors should choose the failure policy + that matches their users' risk tolerance. + +- **(Approach A only) Existing pods.** Readiness gates are immutable + after pod creation, so pods created before the webhook was deployed + will not have the readiness gate. This is expected and safe — those + pods continue to behave as they always have. + +### Risks and Mitigations + +| Risk | Applies to | Mitigation | +|------|-----------|------------| +| Plugin bug causes the condition to never be set to `True`, leaving pods stuck not-Ready. Especially severe for Approach C where every pod gets the gate automatically. | All | Plugin authors should implement a timeout or fallback. Operators can detect the issue by querying for pods where `ContainersReady` is `True` but `Ready` is `False` for an extended period. Approach C would additionally require a feature gate for safe rollout. | +| Extra API call (PATCH to pod status) per pod increases API server load. | All | The PATCH is a single, small write per pod startup — the same pattern already used by other controllers. Negligible compared to existing pod lifecycle writes. | +| Adoption is fragmented — some plugins adopt the convention, others don't, leading to inconsistent behaviour across clusters. | All | This KEP provides a clear, minimal convention. SIG Network can encourage adoption by listing compliant plugins in the KEP's implementation history, adding conformance tests, and documenting the convention on kubernetes.io. 
| +| RBAC misconfiguration prevents the plugin's node agent from PATCHing pod status. | All | Document the required RBAC rules (see Design Details). Plugin installation manifests should include the necessary ClusterRole / ClusterRoleBinding. | +| Webhook outage causes pods to be created without the readiness gate, silently losing protection. | A | Plugin authors should monitor webhook availability and alert on failures. Clusters that require the guarantee can use `failurePolicy: Fail`. | +| Webhook adds latency to the pod creation path. | A | The webhook performs a small, deterministic mutation (appending one entry to a list). Latency should be comparable to other mutating webhooks in the cluster. | +| Bug in kubelet readiness logic could affect all pods, not just those using network readiness. | B | The change should be small and well-tested. Kubelet already evaluates readiness gates; this adds one more condition to the same code path. | + +## Design Details + +### Webhook behaviour + +The network plugin deploys a `MutatingWebhookConfiguration` that +targets pod `CREATE` operations. The webhook appends the well-known +readiness gate to `spec.readinessGates`: + +```yaml +spec: + readinessGates: + - conditionType: "" +``` + +If the readiness gate is already present (injected by the user, a +Helm chart, or another controller), the webhook leaves the list +unchanged. The webhook should target pods that use CNI networking +(i.e., skip pods with `hostNetwork: true`). + +### Node-agent PATCH flow + +The plugin's node agent watches for pods on its node. 
When the agent
+has finished programming the datapath for a pod, it issues a
+strategic-merge PATCH against the pod's `/status` subresource:
+
+```http
+PATCH /api/v1/namespaces//pods//status
+Content-Type: application/strategic-merge-patch+json
+```
+
+```json
+{
+  "status": {
+    "conditions": [
+      {
+        "type": "",
+        "status": "True",
+        "lastTransitionTime": "2025-07-01T12:00:00Z"
+      }
+    ]
+  }
+}
+```
+
+### RBAC requirements
+
+The node agent needs permission to watch pods scheduled to its node
+and to PATCH the `/status` subresource of pods. A minimal
+ClusterRole looks like:
+
+```yaml
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: network-readiness-agent
+rules:
+- apiGroups: [""]
+  resources: ["pods"]
+  verbs: ["get", "list", "watch"]
+- apiGroups: [""]
+  resources: ["pods/status"]
+  verbs: ["patch"]
+```
+
+### Interaction with existing conditions
+
+The `` condition complements the conditions kubelet
+already manages:
+
+| Condition | Set by | Meaning |
+|-----------|--------|---------|
+| `PodReadyToStartContainers` | kubelet | Sandbox created, CNI `ADD` returned |
+| `ContainersReady` | kubelet | All containers passed their readiness probes (application is ready to serve) |
+| `` | network plugin | Datapath fully programmed, pod is reachable from other pods |
+| `Ready` | kubelet | All of the above are `True` (including all readiness gates) |
+
+For pods that back a Service, the `Ready` condition determines
+whether the pod is added to endpoints. Both the application
+(readiness probes / `ContainersReady`) and the network
+(``) must be ready before traffic is routed to the
+pod.
+
+The timeline during pod startup is:
+
+1. `PodReadyToStartContainers` becomes `True` — containers begin
+   starting.
+2. Readiness probes begin running. When all containers pass,
+   kubelet sets `ContainersReady` to `True` — the application is
+   ready to serve traffic. However, `` does not yet
+   exist in `status.conditions`, so kubelet evaluates it as `False`
+   per [KEP-580] semantics. `Ready` remains `False`.
+3. The network plugin sets `` to `True`. 
Kubelet + re-evaluates and sets `Ready` to `True`. +4. The endpoints controller adds the pod to the Service. Traffic + flows only after both the application and the network are ready. + +### Worked example + +Consider a cluster running a network plugin that adopts this +convention. A user creates the following Deployment: + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: web +spec: + replicas: 2 + selector: + matchLabels: + app: web + template: + metadata: + labels: + app: web + spec: + containers: + - name: web + image: registry.k8s.io/e2e-test-images/agnhost:2.43 + ports: + - containerPort: 80 + readinessProbe: + httpGet: + path: /healthz + port: 80 + periodSeconds: 5 +``` + +The user's manifest contains no mention of readiness gates — that +detail is handled transparently by the network plugin. After the +mutating webhook fires, the pod spec stored in etcd looks like: + +```yaml +spec: + readinessGates: + - conditionType: "" + containers: + - name: web + # ... same as above ... +``` + +Once the pod is fully started and the network plugin has signaled +readiness, the resulting pod status looks like: + +```yaml +status: + conditions: + - type: PodReadyToStartContainers + status: "True" + - type: ContainersReady + status: "True" + - type: "" + status: "True" + - type: Ready + status: "True" +``` + +### Test Plan + + + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +The existing readiness gate mechanism ([KEP-580]) is already well +tested in kubelet. The tests below focus on validating the specific +interaction pattern this KEP defines: a network-readiness condition +set by an external agent gating the pod's overall `Ready` status +and Service endpoint membership. + +##### Unit tests + + + + + +The packages touched depend on the chosen approach. 
Coverage data +will be collected before implementation begins. + +**Approach A (webhook):** No production code changes in +kubernetes/kubernetes. SIG Network will add mock-based unit tests +in k/k that simulate a network plugin injecting the readiness gate +and PATCHing the condition, to validate the convention works +correctly against kubelet's readiness evaluation logic. + +- `pkg/kubelet/status`: `` - `` + +**Approach B (kubelet change):** + +- `pkg/kubelet/status`: `` - `` + - Pod with `` condition absent remains not-Ready + even when `ContainersReady` is `True`. + - Pod with `` set to `True` and `ContainersReady` + `True` becomes `Ready`. + - Pod with `` set to `False` remains not-Ready. + - Feature-gate disabled: kubelet ignores `` and + computes `Ready` as before. + +**Approach C (API server change):** + +- `pkg/registry/core/pod`: `` - `` + - Readiness gate is injected into `spec.readinessGates` for new + non-host-network pods. + - Readiness gate is not injected for host-network pods. + - Existing pods without the gate are not affected on update. + - Feature-gate disabled: no readiness gate is injected. + +##### Integration tests + + + + + +Integration tests will verify the end-to-end readiness gate +lifecycle in a controlled environment: + +- Create a pod with the `` readiness gate in + `spec.readinessGates`. Verify that the pod's `Ready` condition + remains `False` even after all containers are running and passing + their readiness probes. +- PATCH the pod's `/status` subresource to set `` to + `True`. Verify that the pod's `Ready` condition transitions to + `True`. +- Create a Service selecting the pod. Verify that the pod's IP is + NOT present in the EndpointSlice while `` is + absent, and IS present after the condition is set to `True`. +- (Approach B/C) Verify feature-gate enable/disable behaviour: + with the gate disabled, pods should become `Ready` without + waiting for the `` condition. 
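+
+The fixture pod for these tests can declare the readiness gate
+directly in its spec, with no webhook involved. A minimal sketch,
+using the same empty placeholder for the condition type as the rest
+of this KEP (the pod name is illustrative):
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: network-readiness-test
+spec:
+  readinessGates:
+  # Placeholder: substitute the chosen well-known condition type.
+  - conditionType: ""
+  containers:
+  - name: web
+    image: registry.k8s.io/e2e-test-images/agnhost:2.43
+    ports:
+    - containerPort: 80
+    readinessProbe:
+      httpGet:
+        path: /healthz
+        port: 80
+      periodSeconds: 5
+```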
+ +- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) + +##### e2e tests + + + +E2e tests will validate the pattern in a real cluster: + +- **Network readiness blocks endpoint membership:** Deploy a pod + behind a Service with a readiness gate for ``. + Verify the pod is not added to EndpointSlice until an external + agent sets the condition to `True`. Verify traffic reaches the + pod only after the condition is set. +- **Host-network pods are unaffected:** Deploy a host-network pod + and verify it becomes `Ready` without needing a `` + condition (the webhook should skip it, or the plugin should + immediately set the condition to `True`). +- **Rollout behaviour:** Perform a Deployment rolling update where + new pods have the readiness gate. Verify the rollout does not + proceed until each new pod has both `ContainersReady` and + `` set to `True`. + +- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? 
+ - Will enabling / disabling the feature require downtime or reprovisioning + of a node? + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? 
+ + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-network/5709-pod-network-readiness-gates/kep.yaml b/keps/sig-network/5709-pod-network-readiness-gates/kep.yaml new file mode 100644 index 000000000000..f6d448385ab4 --- /dev/null +++ b/keps/sig-network/5709-pod-network-readiness-gates/kep.yaml @@ -0,0 +1,37 @@ +title: Add a well-known pod network readiness gate +kep-number: 5709 +authors: + - "@tssurya" +owning-sig: sig-network +participating-sigs: + - sig-network +status: implementable +creation-date: 2026-04-05 +reviewers: + - "@danwinship" +approvers: + - "@danwinship" + +see-also: + - "/keps/sig-network/580-pod-readiness-gates" + +# The target maturity stage in the current dev cycle for this KEP. 
+stage: alpha + +latest-milestone: "v1.37" + +milestone: + alpha: "v1.37" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: MyFeature + components: + - kube-apiserver + - kube-controller-manager +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - my_feature_metric