Commit 505a3aa

VPA: Refactor benchmark into packages and add recommender/admission metrics
Split monolithic main.go into focused packages:
- pkg/cluster: Kubernetes resource management, profiles, constants
- pkg/component: VPA component lifecycle
- pkg/results: Output formatting, averaging, CSV export

Key changes:
- Component is now an object-oriented type holding shared kubeClient/restConfig
- Benchmark scrapes all three VPA components latency steps

Implements #9443

Signed-off-by: Max Cao <macao@redhat.com>
1 parent 422b9ec commit 505a3aa

5 files changed: +970 -610 lines changed

vertical-pod-autoscaler/benchmark/README.md

Lines changed: 75 additions & 21 deletions
@@ -2,8 +2,6 @@
22

33
Measures VPA component latencies using KWOK (Kubernetes WithOut Kubelet) to simulate pods without real resource consumption.
44

5-
> **Note:** Currently only updater metrics are collected. Recommender metrics are planned for the future.
6-
75
<!-- toc -->
86
- [Prerequisites](#prerequisites)
97
- [Quick Start (Local)](#quick-start-local)
@@ -12,7 +10,9 @@ Measures VPA component latencies using KWOK (Kubernetes WithOut Kubelet) to simu
1210
- [Profiles](#profiles)
1311
- [Flags](#flags)
1412
- [Metrics Collected](#metrics-collected)
13+
- [Recommender Metrics](#recommender-metrics)
1514
- [Updater Metrics](#updater-metrics)
15+
- [Admission Controller Metrics](#admission-controller-metrics)
1616
- [Scripts](#scripts)
1717
- [Cleanup](#cleanup)
1818
- [Notes](#notes)
@@ -22,7 +22,7 @@ Measures VPA component latencies using KWOK (Kubernetes WithOut Kubelet) to simu
2222

2323
## Prerequisites
2424

25-
- Go 1.21+
25+
- Go 1.25+
2626
- kubectl
2727
- Kind
2828
- Helm
@@ -68,30 +68,58 @@ go build -C benchmark -o ../bin/vpa-benchmark .
6868
The benchmark program (`main.go`) assumes the cluster is already set up with VPA, KWOK, and the fake node. It then:
6969

7070
1. For each profile run:
71-
- Scales down VPA components
72-
- Cleans up previous benchmark resources
71+
- Scales down all VPA components and cleans up previous benchmark resources
7372
- Creates ReplicaSets with fake pods assigned directly to KWOK node (bypasses scheduler)
74-
- Creates noise ReplicaSets (if `--noise-ratio` > 0) — these are not managed by any VPA
73+
- Creates noise ReplicaSets (if `--noise-percentage` > 0) — these are not managed by any VPA
7574
- Creates VPAs targeting managed ReplicaSets only
76-
- Scales up recommender, waits for recommendations
75+
- Scales up recommender and admission controller, waits for recommendations
76+
- Scrapes recommender execution latency metrics
7777
- Scales up updater, waits for its loop to complete
78-
- Scrapes `vpa_updater_execution_latency_seconds_sum` metrics
79-
2. Outputs results to stdout and/or a CSV file if specified
78+
- Scrapes updater and admission controller execution latency metrics
79+
2. Outputs per-run tables (with Avg column when multiple runs) and cross-profile summary tables to stdout and/or a CSV file
80+
81+
> [!NOTE]
82+
> Recommender and updater latencies are cumulative sums from a single loop. Admission controller latencies are per-request averages (sum divided by request count), since it handles many requests per benchmark run.
8083
8184
e.g., of output using this command: `bin/vpa-benchmark --profile=small,large,xxlarge`
8285

83-
```bash
84-
========== Results ==========
86+
```
87+
========== Results [Recommender] ==========
88+
┌─────────────────────┬───────────────┬────────────────┬───────────────────┐
89+
│ STEP │ SMALL ( 25 ) │ LARGE ( 250 ) │ XXLARGE ( 1000 ) │
90+
├─────────────────────┼───────────────┼────────────────┼───────────────────┤
91+
│ LoadVPAs │ 0.0005s │ 0.0022s │ 0.0099s │
92+
│ LoadPods │ 0.0007s │ 0.0138s │ 0.1869s │
93+
│ LoadMetrics │ 0.0031s │ 0.0055s │ 0.0036s │
94+
│ UpdateVPAs │ 0.0142s │ 0.5050s │ 8.0046s │
95+
│ MaintainCheckpoints │ 0.0174s │ 3.0046s │ 18.0054s │
96+
│ GarbageCollect │ 0.0001s │ 0.0055s │ 0.0426s │
97+
│ total │ 0.0361s │ 3.5367s │ 26.2529s │
98+
└─────────────────────┴───────────────┴────────────────┴───────────────────┘
99+
100+
========== Results [Updater] ==========
85101
┌───────────────┬───────────────┬────────────────┬───────────────────┐
86102
│ STEP │ SMALL ( 25 ) │ LARGE ( 250 ) │ XXLARGE ( 1000 ) │
87103
├───────────────┼───────────────┼────────────────┼───────────────────┤
88-
│ AdmissionInit │ 0.0000s │ 0.0001s │ 0.0004s │
89-
│ EvictPods │ 2.4239s │ 24.5535s │ 98.6963s │
90-
│ FilterPods │ 0.0002s │ 0.0020s │ 0.0925s │
91-
│ ListPods │ 0.0001s │ 0.0006s │ 0.0025s │
92-
│ ListVPAs │ 0.0024s │ 0.0030s │ 0.0027s │
93-
│ total │ 2.4267s │ 24.5592s │ 98.7945s │
104+
│ ListVPAs │ 0.0021s │ 0.0020s │ 0.0023s │
105+
│ ListPods │ 0.0001s │ 0.0004s │ 0.0022s │
106+
│ FilterPods │ 0.0001s │ 0.0016s │ 0.0242s │
107+
│ AdmissionInit │ 0.0000s │ 0.0001s │ 0.0003s │
108+
│ EvictPods │ 2.3205s │ 24.5523s │ 98.5502s │
109+
│ total │ 2.3229s │ 24.5565s │ 98.5792s │
94110
└───────────────┴───────────────┴────────────────┴───────────────────┘
111+
112+
========== Results [Admission Controller] ==========
113+
┌────────────────┬───────────────┬────────────────┬───────────────────┐
114+
│ STEP │ SMALL ( 25 ) │ LARGE ( 250 ) │ XXLARGE ( 1000 ) │
115+
├────────────────┼───────────────┼────────────────┼───────────────────┤
116+
│ read_request │ 0.0000s │ 0.0000s │ 0.0000s │
117+
│ admit │ 0.0004s │ 0.0005s │ 0.0007s │
118+
│ build_response │ 0.0000s │ 0.0000s │ 0.0000s │
119+
│ write_response │ 0.0000s │ 0.0000s │ 0.0000s │
120+
│ request_count │ 26 │ 251 │ 1001 │
121+
│ total │ 0.0005s │ 0.0005s │ 0.0007s │
122+
└────────────────┴───────────────┴────────────────┴───────────────────┘
95123
```
96124

97125
We can then compare the results of a code change with the results of the main branch.
@@ -121,17 +149,44 @@ When `--noise-percentage=P` is set, each profile also creates `P%` additional no
121149

122150
## Metrics Collected
123151

152+
All metrics are scraped from each component's `/metrics` endpoint via port-forwarding. Values are parsed from `vpa_<component>_execution_latency_seconds` histograms. Admission controller values are per-request averages.
153+
154+
### Recommender Metrics
155+
156+
Steps are listed in execution order.
157+
158+
| Step | Description |
159+
| ---- | ----------- |
160+
| `LoadVPAs` | Load VPA objects |
161+
| `LoadPods` | Load pods matching VPA targets |
162+
| `LoadMetrics` | Load metrics from metrics-server |
163+
| `UpdateVPAs` | Compute and write recommendations |
164+
| `MaintainCheckpoints` | Create/update VPA checkpoints |
165+
| `GarbageCollect` | Clean up stale data |
166+
| `total` | Total loop time |
167+
124168
### Updater Metrics
125169

126-
| Metric | Description |
127-
| ------ | ----------- |
170+
| Step | Description |
171+
| ---- | ----------- |
128172
| `ListVPAs` | List VPA objects |
129173
| `ListPods` | List pods matching VPA targets |
130174
| `FilterPods` | Filter evictable pods |
131175
| `AdmissionInit` | Verify admission controller status |
132176
| `EvictPods` | Evict pods needing updates |
133177
| `total` | Total loop time |
134178

179+
### Admission Controller Metrics
180+
181+
| Step | Description |
182+
| ---- | ----------- |
183+
| `read_request` | Parse incoming admission request |
184+
| `admit` | Compute resource recommendations for the pod |
185+
| `build_response` | Build admission response |
186+
| `write_response` | Write response back to API server |
187+
| `request_count` | Total number of admission requests handled |
188+
| `total` | Total per-request time |
189+
135190
## Scripts
136191

137192
| Script | Purpose |
@@ -147,7 +202,6 @@ Environment variables accepted by the scripts:
147202
| `KWOK_VERSION` | `v0.7.0` | `install-kwok.sh` |
148203
| `KWOK_NAMESPACE` | `kube-system` | `install-kwok.sh` |
149204
| `KWOK_NODE_NAME` | `kwok-node` | `install-kwok.sh` |
150-
| `VPA_NAMESPACE` | `kube-system` | `configure-vpa.sh` |
151205
| `KIND_CLUSTER_NAME` | `kind` | `full-benchmark.sh` |
152206

153207
## Cleanup
@@ -172,6 +226,6 @@ The benchmark includes several performance optimizations:
172226

173227
### Caveats
174228

175-
- The updater uses `time.Tick` which waits the full interval before the first tick, so the benchmark sleeps 2 minutes before polling for metrics
229+
- The updater uses `time.Tick` which waits the full interval before the first tick, so the benchmark polls for up to 5 minutes waiting for the updater's `total` metric to appear.
176230
- The benchmark uses Recreate update mode. In-place scaling is not supported on KWOK pods.
177231
- The benchmark scales down all VPA components at the start of each run, so that any caching is not a factor.
