ILB: Optimize shared resource lock with fixed hashed pool and OCC #1022

Closed
08volt wants to merge 159 commits into kubernetes:release-1.35 from 08volt:optimized-lock

@08volt 08volt (Member) commented Mar 27, 2026

Replaced the heavy global mutex sharedResourceLock in the ILB controller with a hybrid locking strategy to eliminate serialization bottlenecks:

  1. Fixed-Size Hashed Lock Pool (4096 buckets): Serializes creation/read-modify-write cycles for shared unmanaged resources (HealthChecks, InstanceGroups) without unbounded memory growth ($O(1)$ space complexity).
  2. Optimistic Concurrency Control (OCC): Removes locks for BackendService updates, relying on native GCE API fingerprints to detect and retry conflicts.
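
A minimal sketch of the fixed-size hashed lock pool from item 1, not the code in this PR. The type name `HashedLockPool`, the FNV-1a hash, the key `k8s-hc-shared`, and the helper `countWithPool` are all assumptions made for illustration; only the bucket count (4096) comes from the description.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// poolSize mirrors the 4096 buckets named in the PR description.
const poolSize = 4096

// HashedLockPool is a hypothetical fixed-size set of mutexes.
// Memory use is constant no matter how many distinct keys are locked.
type HashedLockPool struct {
	buckets [poolSize]sync.Mutex
}

// index maps a resource key to a bucket. FNV-1a is an assumed hash choice.
func (p *HashedLockPool) index(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % poolSize
}

// Lock blocks until the bucket for key is held and returns its unlock function.
func (p *HashedLockPool) Lock(key string) (unlock func()) {
	m := &p.buckets[p.index(key)]
	m.Lock()
	return m.Unlock
}

// countWithPool runs a read-modify-write on one shared counter from many
// goroutines, serialized only by the bucket that key hashes to.
func countWithPool(pool *HashedLockPool, key string, workers int) int {
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := pool.Lock(key)
			defer unlock()
			counter++
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	pool := &HashedLockPool{}
	fmt.Println(countWithPool(pool, "k8s-hc-shared", 100))
}
```

Two different keys can hash to the same bucket, which only costs some extra serialization. The sketch assumes a caller holds one key at a time; holding two keys that collide on one bucket would self-deadlock.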

This unblocks independent service reconciliations and prevents nodesync stalls during service update storms.

jakweg and others added 30 commits January 9, 2026 09:42
…s. Emulating different platforms and compiling code tends to be slow and leads to timeouts
cleanup: Delete unused packages
Signed-off-by: LogicalShark <maralder@google.com>
Use native build platform to speed up build times for multiarch builds.
Add Resource Annotations to L4 LB Service
An adapted copy of similar changes done to ingress-gce:
* two new GCE flags `enable-l4-deny-firewall` and `enable-l4-deny-firewall-rollback-cleanup`,
* adds deny firewall functionality with correct order for provisioning/cleanup, the new firewall is following the previous naming scheme and adds "-deny" suffix at the end,
* exports metric "number_of_l4_netlbs" including firewall deny state and general status,
* vendors in `cmpopts` for easier testing.
Integrated the test/e2e directory into the Go workspace to resolve module resolution and version skew issues.

Previously, running 'go test -c' within 'test/e2e' failed locally because:
1. The root 'go.work' file excluded './test/e2e', causing Go to treat it as a sub-package of the root module, which conflicted with the presence of 'test/e2e/go.mod'.
2. There was a version mismatch in 'test/e2e/go.mod' (Kubernetes v1.31.5 vs v1.34.2 in the root), leading to 'undefined' symbol errors when building without workspace mode.

Changes:
- Updated 'go.work' and 'tools/update_vendor.sh' to include './test/e2e'.
- Updated 'test/e2e/go.mod' to use Kubernetes v1.34.2 and renamed the module to 'k8s.io/cloud-provider-gcp/test/e2e' to match the directory structure.
- Fixed API compatibility issues in 'test/e2e/loadbalancer.go' (Scale API change) and 'test/e2e/network_tiers.go' (Logf format string).
- Updated 'test/e2e/firewall.go' to use the modern 'framework.GetControlPlaneNodes' helper.
Signed-off-by: LogicalShark <maralder@google.com>
Fix local build of e2e test binary
Update .gitignore to exclude local build artifacts
Signed-off-by: LogicalShark <maralder@google.com>
Add `make test` to replace `bazel test`
Migrates the release-tars build process from Bazel to Make
Signed-off-by: LogicalShark <maralder@google.com>
hdp617 and others added 17 commits April 13, 2026 15:26
Because k8s deps are tightly coupled, they should ideally be updated together and isolated from other dependency updates. Also, clean up configs for obsolete directories.
fix: move dependabot-sync.yml under workflows
Implement standard adaptiveipam gRPC server (daemon_server.go) that listens for pod IP allocation requests over a Unix domain socket, and implement the AllocatePodIP and DeallocatePodIP RPCs.
Implemented retries for DB errors within the daemon server.
Implement new methods in the Store to interface with the SQLite DB, all supporting idempotency:
AddCIDR: Adds CIDR blocks and seeds individual IP addresses.
AllocateIPv4: Finds an available IP slot and flips is_allocated to true.
ReleaseIPByOwner: Releases pod IP addresses by owner identifier and sets the cooldown period timestamp.
Threading model:

Each RPC request can call the store concurrently to optimize request latency. DB transactions guarantee thread safety between concurrent requests. The existing WAL mode and busy_timeout support highly concurrent read/write operations without locking.
No implementation for IPv6 yet.
…n-to-emeritus

Move jprzychodzen to emeritus approvers
* fix: use commit SHA instead of tags for actions

I think k8s repos started to enforce pinning actions to a commit SHA (which makes sense from a security perspective), because the workflow is currently failing with `The action actions/checkout@v4 is not allowed in kubernetes/cloud-provider-gcp because all actions must be pinned to a full-length commit SHA.`

Also update dependabot to handle GitHub Actions.

* fix: use commit SHA for actions/setup-go
@08volt 08volt force-pushed the optimized-lock branch 3 times, most recently from f238d30 to 76346f3 April 23, 2026 13:16
ygao-g and others added 2 commits April 23, 2026 23:52
…rce-specific mutex pool for GCE load balancer operations (except external) and protect with feature flag
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 27, 2026
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Apr 27, 2026
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 08volt
Once this PR has been reviewed and has the lgtm label, please assign bowei for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) and removed the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Apr 27, 2026
@08volt 08volt closed this Apr 27, 2026
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/network Categorizes an issue or PR as relevant to SIG Network.
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
