
KEP 5554: In place update pod resources alongside static cpu manager policy KEP creation#5555

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from esotsal:ippvs-alognside-static-cpu-policy-KEP
Feb 12, 2026

Conversation

@esotsal
Contributor

@esotsal esotsal commented Sep 21, 2025

  • One-line PR description: Create new KEP 5554: In place update pod resources alongside static cpu manager policy
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 21, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 21, 2025
@esotsal esotsal force-pushed the ippvs-alognside-static-cpu-policy-KEP branch from 4c5c393 to 1240d58 Compare September 21, 2025 18:24
@esotsal esotsal changed the title from "KEP 5554: In place update pod resources alongside static cpu manager policy" to "KEP 5554: In place update pod resources alongside static cpu manager policy KEP creation" Sep 21, 2025
@esotsal esotsal changed the title from "KEP 5554: In place update pod resources alongside static cpu manager policy KEP creation" to "[WIP] KEP 5554: In place update pod resources alongside static cpu manager policy KEP creation" Sep 21, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 21, 2025
@esotsal
Contributor Author

esotsal commented Sep 21, 2025

@k8s-ci-robot
Contributor

@esotsal: GitHub didn't allow me to request PR reviews from the following users: Chunxia202410.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @natasha41575 @tallclair @pravk03 @Chunxia202410 @ffromani

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@esotsal esotsal force-pushed the ippvs-alognside-static-cpu-policy-KEP branch from 1240d58 to 8973b16 Compare September 26, 2025 10:59
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 26, 2025
@esotsal esotsal force-pushed the ippvs-alognside-static-cpu-policy-KEP branch 10 times, most recently from 24bfb5c to a6c437b Compare September 26, 2025 17:19
@esotsal
Contributor Author

esotsal commented Jan 29, 2026

Is there any chance of concurrent allocations which may mutate the node state while a resize is InProgress, therefore mutating the state which made the kubelet think the resize was feasible, in such a way as to make it no longer feasible?
I guess this is the key concern here (and I'm not deep enough into IPPR to have an obvious immediate answer, even though the locking I see in the allocation manager should prevent that. But once in a while an issue like kubernetes/kubernetes#136021 (comment) pops up and makes me pause)

Thanks for sharing this issue @ffromani .

I think the most appropriate answer is: I know that I don't know :-(

@esotsal
Contributor Author

esotsal commented Jan 29, 2026

I ran out of time today; I mostly looked at things at a high level, and the details about the promised CPUs make sense to me overall, but I want to do a deeper review of that part tomorrow

Thanks for your time. Please check the new KEP updates in preparation for v1.36; they include the ContainerCPUs checkpoint suggestion from Francesco.

@esotsal
Contributor Author

esotsal commented Jan 30, 2026

Status update for the PRR reviewer, ahead of the upcoming PRR freeze on Wednesday 4th February 2026 (AoE) / Thursday 5th February 2026, 12:00 UTC.

(last update: 3rd February 2026)

Answered the open comments below.

Remaining open comments I am working on; these are not blocking for alpha and can be fine-tuned in beta:

For the remaining unresolved comments, it is up to the reviewers to decide whether the provided answers are sufficient or blocking for this KEP to go alpha in v1.36.

I think the most important ones are below:

Please let me know if I've missed a comment.

Thanks in advance!

Contributor

@natasha41575 natasha41575 left a comment


I see my previous comments have been addressed - thanks!


When the topology manager cannot generate a topology hint which satisfies the topology manager policy, the pod resize is marked as [Deferred](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources#resize-status). This means that while the node theoretically has enough CPU for the resize, it's not currently possible but can be re-evaluated later, potentially becoming feasible as pod churn frees up capacity.

Reasons for failure
Contributor

@natasha41575 natasha41575 Feb 2, 2026


(replying to @ffromani's comment on this thread: #5555 (comment))

I want us to be very careful about what we set as Deferred, and err on the side of marking things Infeasible. We are planning Scheduler Preemption for IPPR in 1.37.

Scheduler preemption will be triggered on all Deferred resizes, meaning that the scheduler will try to find pods to preempt based on priority class and the size of the pod. Because the scheduler is not NUMA-aware, it is only safe to mark a resize as "Deferred" if the kubelet knows that scheduler-triggered preemption can help the resize succeed. Otherwise the scheduler will preempt pods unnecessarily.

In your example scenario I can see it could be possible to do this, but do you think we can reliably implement this kind of logic? It seems both complex and fragile: it would require the kubelet to make a lot of assumptions about the scheduler's behavior, and I'm not sure that's a direction we want to go.

My opinion is that marking it as "Infeasible" is a safe step forward to unblock us now, while still leaving room to relax it to "Deferred" later if necessary.
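
For illustration, the conservative rule being discussed here boils down to something like the following sketch (function and parameter names are hypothetical, not actual kubelet code):

```go
// Sketch only: how a kubelet-side check might classify a CPU resize,
// erring on the side of Infeasible so that scheduler preemption is not
// triggered for resizes that NUMA constraints would never allow.
func classifyCPUResize(fitsNodeAllocatable, topologyHintAvailable bool) string {
	switch {
	case !fitsNodeAllocatable:
		return "Infeasible" // the node simply lacks the CPU
	case !topologyHintAvailable:
		// Enough CPU in total, but no placement satisfies the topology
		// manager policy; mark Infeasible rather than Deferred.
		return "Infeasible"
	default:
		return "InProgress"
	}
}
```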

Contributor

@ffromani ffromani left a comment


I had a pass and the KEP content LGTM. I have no major comments about the plan outlined in the KEP and, of course, I fully agree with the goal. I see a future extension path to incorporate the memory manager (and maybe the device manager?).
I think there's room to polish the implementation details section a bit, but that's pretty minor.

@KevinTMtz
Member

KevinTMtz commented Feb 4, 2026

AFAIK there's no solution here. The state file changes are sadly incompatible with each other, there's no justification in either KEP to carry the other KEP's state fields, and there's little chance to shoehorn the fields needed by either KEP (if we ever want to do that, and we should not) into the fields added by the other.

The project has a policy/pattern against partial merges; for example, we cannot merge the state file change for either/both KEPs first and then move on. I'm afraid the only real option is to find a way to serialize both KEPs somehow within the cycle.

If we both target the next cycle, the new fields from our features will have to be added to the new CPU manager V3 state in the same release, thus creating a dependency on the serialization changes of whichever feature is merged first.

I wonder what the process would be if the first merged feature is rolled back but the second one is maintained. At least we could keep any non-feature-related changes that belong to the V3 state for the second feature to use. The path may be easier since we are not modifying the state in the same way: PodLevelResourceManagers adds a new property, while InPlacePodVerticalScalingExclusiveCPUs modifies one of the existing properties.

@ffromani @esotsal

@esotsal
Contributor Author

esotsal commented Feb 5, 2026

Thanks for sharing your thoughts @KevinTMtz, I think likewise.

If we both target the next cycle, the new fields from our features will have to be added to the new CPU manager V3 state in the same release, thus creating a dependency on the serialization changes of whichever feature is merged first.

I think releasing in v1.36 is the desired position for both; according to the SIG Node v1.36 KEPs planning, both are considered for release, KEP 5554 with High priority and KEP 5526 with Medium priority.

The path may be easier since we are not modifying the state in the same way: PodLevelResourceManagers adds a new property, while InPlacePodVerticalScalingExclusiveCPUs modifies one of the existing properties.

I agree. Based on the above, I think it is manageable and doable to add both in the v1.36 release. I don't have a preference on the merging order; either works for me, and it is up to the sig-node community, reviewers, and approvers to decide what is most suitable.

Contributor

@natasha41575 natasha41575 left a comment


ippr-specific bits LGTM

the rest of the content also LGTM, but admittedly I am not an expert in topology / cpu manager

/assign @ffromani @tallclair

@ffromani
Contributor

ffromani commented Feb 6, 2026

ippr-specific bits LGTM

the rest of the content also LGTM, but admittedly I am not an expert in topology / cpu manager

This is an interesting part because I kinda feel the same in reverse. The cpu/topology manager bits make sense, but I can't really comment on the IPPR integration.
Let's try to get a holistic vision and connect the dots.
The cpumanager part is basically about providing the minimally different hint which allows the request. On downsize this seems trivial; on upsize it may cause a hint to require more NUMA nodes than the original allocation.
Then we defer to the topology manager to accept or reject the resize considering the policy, much like admission.
This means either:

  1. we re-run an admission-like flow on resize, at least the TM part
  2. we have a new flow similar to admission in the resize path

Is this a correct 10k-foot summary of the flow?

@natasha41575
Contributor

natasha41575 commented Feb 6, 2026

ippr-specific bits LGTM
the rest of the content also LGTM, but admittedly I am not an expert in topology / cpu manager

This is an interesting part because I kinda feel the same in reverse. The cpu/topology manager bits make sense, but I can't really comment on the IPPR integration. Let's try to get a holistic vision and connect the dots. The cpumanager part is basically about providing the minimally different hint which allows the request. On downsize this seems trivial; on upsize it may cause a hint to require more NUMA nodes than the original allocation. Then we defer to the topology manager to accept or reject the resize considering the policy, much like admission. This means either:

  1. we re-run an admission-like flow on resize, at least the TM part
  2. we have a new flow similar to admission in the resize path

Is this a correct 10k-foot summary of the flow?

We actually already run admission checks on resize. So the ideal flow I think would be to integrate TM feasibility checks (i.e. can TM generate a hint?) into the "admission" path, and integrate TM allocation of CPUs into the "resize actuation" path which happens during a pod sync.
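
For concreteness, a rough sketch of the two touch points described above; the names and shapes here are assumptions for this discussion, not the kubelet's actual API:

```go
// Sketch only: podResizeRequest captures just what the two phases need.
type podResizeRequest struct {
	PodUID        string
	ContainerName string
	NewCPURequest int64 // whole exclusive CPUs requested after the resize
}

// Phase 1, run on the "admission" path: feasibility only. Can the
// topology manager generate a hint satisfying its policy for the new
// request? No state is mutated here.
func resizeIsFeasible(req podResizeRequest, canGenerateHint func(podResizeRequest) bool) bool {
	return canGenerateHint(req)
}

// Phase 2, run on the "resize actuation" path during pod sync: the CPU
// manager actually (re)allocates the exclusive CPU set.
func actuateResize(req podResizeRequest, allocate func(podResizeRequest) error) error {
	return allocate(req)
}
```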

As an aside, I actually think kubernetes/kubernetes#133427 could help simplify the implementation of this KEP, because it unifies the existing resize feasibility checks with admission, allowing for different checks depending on whether we are adding or resizing a pod. So I can try to get this one in. IIUC TM already has its own admission handler, so we'd just need to make sure it does the right checks on resize.

Does this make sense?

Let's try to get an holistic vision and connect the dots.

I think @tallclair has a holistic understanding of both sides, so his review will help tremendously.

@esotsal
Contributor Author

esotsal commented Feb 9, 2026

@dchen1107, @tallclair, @natasha41575, @ffromani, @deads2k, @KevinTMtz, @pravk03: are there any blocking items and/or open action points for this KEP which I might have missed for alpha PRR? If so, please let me know; enhancement freeze is tomorrow, so I would appreciate your feedback.

Adding @whtssub (this KEP's wrangler), who has kindly updated the KEP's status: #5554 (comment)

@Chunxia202410

Open question

Are there any objections to the extended section contributed by @Chunxia202410 for kubernetes/kubernetes#131309, and should it be a future extension or a separate KEP to be taken up in beta? The reasoning is that the proposal is a non-blocking extension of KEP 5554, so it should not block this KEP going to alpha.

Hi @esotsal, thank you for any suggestions from you and the community regarding this issue. Since this feature is quite independent, we plan to address this part as a separate KEP. Thank you.

@ffromani
Contributor

ippr-specific bits LGTM
the rest of the content also LGTM, but admittedly I am not an expert in topology / cpu manager

This is an interesting part because I kinda feel the same in reverse. The cpu/topology manager bits make sense, but I can't really comment on the IPPR integration. Let's try to get a holistic vision and connect the dots. The cpumanager part is basically about providing the minimally different hint which allows the request. On downsize this seems trivial; on upsize it may cause a hint to require more NUMA nodes than the original allocation. Then we defer to the topology manager to accept or reject the resize considering the policy, much like admission. This means either:

  1. we re-run an admission-like flow on resize, at least the TM part
  2. we have a new flow similar to admission in the resize path

Is this a correct 10k-foot summary of the flow?

We actually already run admission checks on resize. So the ideal flow I think would be to integrate TM feasibility checks (i.e. can TM generate a hint?) into the "admission" path, and integrate TM allocation of CPUs into the "resize actuation" path which happens during a pod sync.

As an aside, I actually think kubernetes/kubernetes#133427 could help simplify the implementation of this KEP, because it unifies the existing resize feasibility checks with admission, allowing for different checks depending on whether we are adding or resizing a pod. So I can try to get this one in. IIUC TM already has its own admission handler, so we'd just need to make sure it does the right checks on resize.

Does this make sense?

It does, thanks for clarifying!

Let's try to get an holistic vision and connect the dots.

I think @tallclair has a holistic understanding of both sides, so his review will help tremendously.

+1!!

@ffromani
Contributor

@dchen1107, @tallclair, @natasha41575, @ffromani, @deads2k, @KevinTMtz, @pravk03: are there any blocking items and/or open action points for this KEP which I might have missed for alpha PRR? If so, please let me know; enhancement freeze is tomorrow, so I would appreciate your feedback.

Adding @whtssub (this KEP's wrangler), who has kindly updated the KEP's status: #5554 (comment)

LGTM from my side!

@deads2k
Contributor

deads2k commented Feb 10, 2026

PRR looks good for alpha. I made a few comments about things we'll need to be sure we refine in beta.

/approve

@esotsal
Contributor Author

esotsal commented Feb 11, 2026

PRR looks good for alpha. I made a few comments about things we'll need to be sure we refine in beta.

Thanks, updated this PR to resolve those comments.

@natasha41575
Contributor

/lgtm

@esotsal
Contributor Author

esotsal commented Feb 11, 2026

/lgtm

Thanks, updated the KEP to fix the "alongside" typo (diff). Please take another look.

@natasha41575
Contributor

/lgtm

@tallclair
Member

/lgtm
/approve

The decision to defer to the TopologyManager for which CPUs to downsize LGTM. I'm not sure about the decision to forbid resizing below the initial count, but that is something we can easily revisit at a later date if there's a use case for it.
I only gave the implementation plan a superficial review, but we can work out the details in the actual implementation PR.


To effectively address the needs of both users and Kubernetes components for the realization of this KEP, the proposed implementation involves the following changes:

1. Update the `CPUManager` checkpoint file format as stated in the [ContainerCPUs checkpoint](#containercpus-checkpoint) section, which will serve as the single source of truth representing the original and resized exclusive CPUs of an in-place CPU resize of a Guaranteed Pod with the CPU static policy.
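
For illustration only, one possible shape for such a checkpoint entry; the field names below are assumptions made for this discussion, not the format specified in the KEP:

```go
// Illustrative sketch of a per-container checkpoint entry that records
// both the originally admitted exclusive CPUs and the resized set.
type containerCPUsCheckpointEntry struct {
	// Original is the exclusive CPU set assigned at pod admission.
	Original string `json:"original"`
	// Resize is the exclusive CPU set after an in-place resize; empty
	// when no resize has been applied or the feature gate is disabled.
	Resize string `json:"resize,omitempty"`
}
```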
Member


Will the new format be feature gated?

Contributor Author


If the feature gate is not set, "resize" will always be empty and only "original" will be used. I hadn't thought about making the new format feature gated. Do you have a use case in mind? Is it OK to continue the discussion in the implementation PR?
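
Reusing the illustrative entry type sketched earlier, the behavior described above would roughly reduce to the following read path (again a sketch under assumed field names, not the proposed implementation):

```go
// Sketch only: with the feature gate off, "resize" is never populated,
// so this degenerates to today's behavior of using only the original
// exclusive CPU assignment.
func effectiveCPUs(e containerCPUsCheckpointEntry) string {
	if e.Resize != "" {
		return e.Resize
	}
	return e.Original
}
```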

Contributor Author

@esotsal esotsal Apr 6, 2026


Will the new format be feature gated?

Such a simple question; I was not aware of how important it was, or of the complexity involved in solving it. Thanks @tallclair for the question. Considering the v1.36 cycle reviews in this KEP's PR, the short answer is that it was missed and, yes, it must be feature gated, as must ALL code modifications. Why? To ensure Kubernetes operational activities are not impacted (rollback, harmonized co-existence with other features touching the checkpoint, ensuring v1.PodReasonInfeasible is returned when needed to reduce the impact of unnecessary resizes on a node, etc.). #5965 was created to update the KEP with these modifications, hoping we reach a consensus and increase confidence that most if not ALL risks have been considered and the KEP can go alpha in v1.37.
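
For context, the gating pattern described here usually reduces to guarding every new code path behind the gate, roughly like the sketch below (the gate name is shown for illustration and may not match the final one):

```go
import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// Sketch only: all new behavior, including writes of any new checkpoint
// fields, stays behind the feature gate so that disabling the gate or
// rolling back leaves existing behavior and on-disk state untouched.
func exclusiveCPUResizeEnabled() bool {
	return utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScalingExclusiveCPUs)
}
```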

@dchen1107
Member

/lgtm
/approve based on @tallclair and @ffromani's review.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, deads2k, esotsal, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/kep: Categorizes KEP tracking issues and PRs modifying the KEP directory.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

