KEP-6003: Add KEP for configurable HPA sync period #6004

Open
Fedosin wants to merge 1 commit into kubernetes:master from Fedosin:configurable-hpa-sync-period

Fedosin commented Apr 8, 2026

  • One-line PR description: Introduce KEP proposing an optional syncPeriodSeconds field on HorizontalPodAutoscalerBehavior to allow per-HPA override of the global --horizontal-pod-autoscaler-sync-period.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fedosin
Once this PR has been reviewed and has the lgtm label, please assign towca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the kind/kep, sig/autoscaling, size/XL, and cncf-cla: yes labels Apr 8, 2026
per-item interval during reconciliation and cleans it up on HPA deletion.

The informer event handlers are updated so that:
- Newly created HPAs and spec changes (detected via `Generation` comparison)
Member

The HPA's generation field is currently not set.
I have kubernetes/kubernetes#138228 to get this fixed.
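
To make the concern concrete, here is a minimal Go sketch of a change check based only on `Generation`. The names `hpaMeta` and `specChanged` are hypothetical and exist only for illustration; they are not the controller's actual types or handlers. If `Generation` is never updated, both sides of the comparison keep the same value and the check never reports a spec change.

```go
package main

// hpaMeta is a hypothetical stand-in for the metadata an informer update
// handler would see. It is not the real Kubernetes object type.
type hpaMeta struct {
	Name       string
	Generation int64
}

// specChanged reports whether an update would be treated as a spec change
// when the only signal used is a Generation comparison.
func specChanged(oldObj, newObj hpaMeta) bool {
	return oldObj.Generation != newObj.Generation
}
```

With `Generation` left at 0 on both the old and new object, `specChanged` returns false even if the spec was edited, which is why the proposal depends on the generation being populated.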

We propose to add a new field to the existing [`HorizontalPodAutoscalerBehavior`][] object:

- `syncPeriodSeconds`: *(int32)* the period in seconds between each
reconciliation of this HPA. Must be greater than 0 and less than or equal
Contributor

This will allow the HPA to query the metric source more frequently. That source is usually metrics-server, which caches its metrics, so in most situations reducing the sync period won't have much impact. Conceivably you could add a field that targets only custom/external metrics, although I suspect that would result in a non-intuitive API.

periods (e.g. 1s) for many HPAs, increasing the rate of metrics queries
and scale sub-resource calls. This is mitigated by:
- Validation bounds: `syncPeriodSeconds` must be >= 1 and <= 3600.
- Cluster administrators can use admission webhooks or policies to enforce
Contributor

Wouldn't a meaningful webhook need to calculate the aggregate sync frequency across all HPAs in the cluster? For users, this looks difficult to reason about.

Do you see another way such a webhook could be implemented? Perhaps it could be part of this KEP to ensure this feature is safe by default?
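
As a rough illustration of the two checks discussed in this thread, the Go sketch below validates the quoted bounds (>= 1 and <= 3600) and computes an aggregate reconciliation rate across HPAs. It assumes the optional field is represented as a `*int32` where nil means "use the global sync period"; the function names and that representation are hypothetical, not taken from the KEP.

```go
package main

import "fmt"

// Bounds quoted in the KEP excerpt above.
const (
	minSyncPeriodSeconds = 1
	maxSyncPeriodSeconds = 3600
)

// validateSyncPeriodSeconds accepts nil (field unset, assumed to mean the
// global default applies) and rejects values outside [1, 3600].
func validateSyncPeriodSeconds(p *int32) error {
	if p == nil {
		return nil
	}
	if *p < minSyncPeriodSeconds || *p > maxSyncPeriodSeconds {
		return fmt.Errorf("syncPeriodSeconds must be in [%d, %d], got %d",
			minSyncPeriodSeconds, maxSyncPeriodSeconds, *p)
	}
	return nil
}

// aggregateSyncsPerSecond sums 1/period over all HPAs, substituting the
// global period for HPAs that do not set the field. Callers are expected to
// validate each period first so that no period is zero.
func aggregateSyncsPerSecond(periods []*int32, globalPeriodSeconds int32) float64 {
	total := 0.0
	for _, p := range periods {
		period := globalPeriodSeconds
		if p != nil {
			period = *p
		}
		total += 1.0 / float64(period)
	}
	return total
}
```

A per-object bounds check is cheap, but the aggregate rate depends on every HPA in the cluster, which is the part a webhook would find hard to evaluate from a single admission request.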


The per-HPA sync frequency is implemented via a new `PerItemIntervalRateLimiter`
in the HPA controller's workqueue. This rate limiter supports per-key interval
Contributor

Wouldn't metrics-server and the custom/external metrics APIs need to respond in less than 1s to ensure the workqueue doesn't saturate? This looks like a very short timeout. Do you see how to prevent the workqueue from blocking? Perhaps this new field should only be a hint (i.e. a best-effort goal)?
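
For discussion, here is a minimal Go sketch of a limiter with per-key intervals and a best-effort requeue delay. It is not the KEP's `PerItemIntervalRateLimiter`; the type `perItemIntervalLimiter`, its methods, and `nextDelay` are hypothetical and use only the standard library. The assumption in `nextDelay` is that the interval is measured between reconcile starts, so a reconcile that overruns its interval is requeued immediately instead of accumulating backlog.

```go
package main

import (
	"sync"
	"time"
)

// perItemIntervalLimiter returns a requeue interval per key and falls back
// to a default interval for keys without an override.
type perItemIntervalLimiter struct {
	mu              sync.Mutex
	defaultInterval time.Duration
	intervals       map[string]time.Duration
}

func newPerItemIntervalLimiter(defaultInterval time.Duration) *perItemIntervalLimiter {
	return &perItemIntervalLimiter{
		defaultInterval: defaultInterval,
		intervals:       make(map[string]time.Duration),
	}
}

// SetInterval records a per-key override, for example during reconciliation.
func (l *perItemIntervalLimiter) SetInterval(key string, interval time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.intervals[key] = interval
}

// When returns the interval to wait before the key is processed again.
func (l *perItemIntervalLimiter) When(key string) time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	if d, ok := l.intervals[key]; ok {
		return d
	}
	return l.defaultInterval
}

// Remove drops the override, for example when the HPA is deleted.
func (l *perItemIntervalLimiter) Remove(key string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.intervals, key)
}

// nextDelay treats the interval as a best-effort target: the time already
// spent reconciling is subtracted, and an overrun yields a zero delay.
func nextDelay(interval, reconcileDuration time.Duration) time.Duration {
	if reconcileDuration >= interval {
		return 0
	}
	return interval - reconcileDuration
}
```

Under this sketch a 1s interval with a 1.5s metrics round trip degrades to back-to-back reconciles for that HPA; whether that is acceptable, or whether a floor on the delay is needed, is the open question raised above.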

Labels

cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/kep - Categorizes KEP tracking issues and PRs modifying the KEP directory.
sig/autoscaling - Categorizes an issue or PR as relevant to SIG Autoscaling.
size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants