feat: e2e test framework improvements for local + CI execution #41
base: cluster-autoscaler-release-1.31.5-aks
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| --- | ||
| name: run-azure-e2e-tests | ||
| description: 'Run Azure CAS end-to-end tests — per-suite execution with focus filtering, background execution, and local/CI workflows. Use when: running e2e tests, debugging test failures, adding new test suites.' | ||
| --- | ||
|
|
||
| # E2E Tests for Azure CAS | ||
|
|
||
| ## Test Structure | ||
|
|
||
| ``` | ||
| cluster-autoscaler/cloudprovider/azure/test/ | ||
| ├── suites/ | ||
| │ └── scaleup/ # Scale-up/down test | ||
| │ └── suite_test.go | ||
| ├── pkg/ | ||
| │ └── environment/ # Shared Environment struct + helpers | ||
| │ └── environment.go | ||
| ├── Makefile # Local + CI targets | ||
| └── go.mod | ||
| ``` | ||
|
|
||
| ## Local Developer Workflow | ||
|
|
||
| From `cluster-autoscaler/cloudprovider/azure/test/`: | ||
|
|
||
| ### First-time setup | ||
|
|
||
| ```bash | ||
| az login | ||
| make setup-cluster # Creates AKS + ACR + workload identity (~5 min) | ||
| make deploy-local # Builds + deploys CAS via skaffold (~1 min) | ||
| ``` | ||
|
|
||
| ### Running tests | ||
|
|
||
| ```bash | ||
| export AZURE_SUBSCRIPTION_ID="$(az account show --query id -o tsv)" | ||
| export AZURE_RESOURCE_GROUP="MC_..." # Node resource group (printed by setup-cluster) | ||
|
|
||
| make e2etests # Run all suites | ||
| make e2etests TEST_SUITE=scaleup # Run single suite | ||
| make e2etests FOCUS="scales up" # Focus filter | ||
| ``` | ||
|
|
||
| ### After code changes | ||
|
|
||
| ```bash | ||
| make deploy-local # Rebuild + redeploy CAS | ||
| make e2etests TEST_SUITE=scaleup | ||
| ``` | ||
|
|
||
| ### Utility commands | ||
|
|
||
| - `make list-suites` — list available test suites | ||
| - `make validate-env` — check required env vars | ||
| - `make deploy-local-dev` — skaffold watch mode (auto-redeploy on changes) | ||
|
|
||
| ### Background execution (survives VPN drops) | ||
|
|
||
| ```bash | ||
| nohup make e2etests TEST_SUITE=scaleup > e2e.log 2>&1 & | ||
| tail -f e2e.log | ||
| ``` | ||
|
|
||
| ## CI (Prow) | ||
|
|
||
| `make test-e2e` builds the CAS image and deploys via Helm (inside BeforeSuite), using cluster info from CAPZ. The Helm deploy is triggered by `-cas-image-repository` and `-cas-image-tag` flags — when absent (local path), Helm is skipped. | ||
|
|
||
| ## Monitoring | ||
|
|
||
| - **Logs**: `tail -f e2e.log` | ||
| - **Cluster**: `kubectl get nodes,pods -w` | ||
| - **Events**: `kubectl get events -A --field-selector source=cluster-autoscaler --watch` | ||
| - **VMSS**: `az vmss list -g $AZURE_RESOURCE_GROUP -o table` | ||
| - **CAS logs**: `kubectl logs -n kube-system deploy/cluster-autoscaler -f` | ||
|
|
||
| ## Adding a New Test Suite | ||
|
|
||
| 1. Create `test/suites/<name>/suite_test.go` | ||
| 2. Import `pkg/environment` for shared helpers | ||
| 3. Register `-resource-group` flag in `init()` | ||
| 4. Create `Environment` in `BeforeSuite`, call `EnsureHelmRelease(...)` for CI compatibility | ||
| 5. Run: `make e2etests TEST_SUITE=<name>` |
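
For step 5, a minimal shell sketch of how a `TEST_SUITE` value appears to map to the package path handed to Ginkgo, mirroring the `tr A-Z a-z` expression in the Makefile recipes. The function name `suite_path` is hypothetical and exists only for illustration; it is not part of the repository.

```bash
#!/bin/sh
# suite_path is a hypothetical helper (an assumption, not repository code).
# It lowercases the suite name and wraps it in ./suites/<name>/...,
# the same shape the Makefile recipes pass to ginkgo.
suite_path() {
    name=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    printf './suites/%s/...\n' "$name"
}

suite_path ScaleUp   # a mixed-case suite name is normalised to lowercase
suite_path ...       # the Makefile default for TEST_SUITE
```

Because the name is lowercased before the path is built, `TEST_SUITE=ScaleUp` and `TEST_SUITE=scaleup` should select the same directory.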
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| { | ||
| "name": "Azure CAS Dev", | ||
| "image": "mcr.microsoft.com/devcontainers/go:1.22", | ||
| "runArgs": ["--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined"], | ||
|
|
||
| "customizations": { | ||
| "vscode": { | ||
| "settings": { | ||
| "go.toolsManagement.checkForUpdates": "local", | ||
| "go.useLanguageServer": true, | ||
| "go.gopath": "/go", | ||
| "chat.useAgentSkills": true | ||
| }, | ||
| "extensions": [ | ||
| "golang.Go", | ||
| "ms-kubernetes-tools.vscode-kubernetes-tools", | ||
| "ms-kubernetes-tools.vscode-aks-tools", | ||
| "ms-azuretools.vscode-bicep", | ||
| "GitHub.vscode-pull-request-github", | ||
| "GitHub.copilot-chat" | ||
| ] | ||
| } | ||
| }, | ||
|
|
||
| "features": { | ||
| "ghcr.io/devcontainers/features/docker-outside-of-docker:1": {}, | ||
| "ghcr.io/devcontainers/features/kubectl-helm-minikube:1": { | ||
| "helm": "latest", | ||
| "minikube": "none" | ||
| }, | ||
| "ghcr.io/devcontainers/features/azure-cli:1": {}, | ||
| "ghcr.io/devcontainers/features/github-cli:1": {}, | ||
| "ghcr.io/rio/features/skaffold:2": {} | ||
| }, | ||
|
|
||
| "postCreateCommand": { | ||
| "install ko": "go install github.com/google/ko@latest", | ||
| "install yq": "go install github.com/mikefarah/yq/v4@latest", | ||
| "disable skaffold metrics": "skaffold config set --global collect-metrics false" | ||
| }, | ||
|
|
||
| "remoteUser": "vscode" | ||
| } |
|
Member
This file will assume a resource group ( I suggest modifying this one to require |
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -1,5 +1,6 @@ | ||||||||
| REPO_ROOT:=$(shell git rev-parse --show-toplevel) | ||||||||
| CAS_ROOT:=$(REPO_ROOT)/cluster-autoscaler | ||||||||
| DEV_DIR:=$(CAS_ROOT)/cloudprovider/azure/examples/dev | ||||||||
|
|
||||||||
| BUILD_TAGS=azure | ||||||||
|
|
||||||||
|
|
@@ -8,20 +9,100 @@ include $(CAS_ROOT)/Makefile | |||||||
| CLUSTER_AUTOSCALER_NAMESPACE?=default | ||||||||
| CLUSTER_AUTOSCALER_SERVICEACCOUNT_NAME?=cluster-autoscaler | ||||||||
|
|
||||||||
| # TEST_SUITE selects a specific test suite directory (e.g., TEST_SUITE=scaleup). | ||||||||
| # Default "..." runs all suites. | ||||||||
| TEST_SUITE?=... | ||||||||
| TEST_TIMEOUT?=3h | ||||||||
| FOCUS?= | ||||||||
| LABEL_FILTER?= | ||||||||
| ARTIFACTS?=_artifacts | ||||||||
|
|
||||||||
| # -------------------------------------------------------------------------- | ||||||||
| # CI target (Prow — builds CAS image, discovers cluster info from CAPZ) | ||||||||
| # -------------------------------------------------------------------------- | ||||||||
|
|
||||||||
| .PHONY: build-e2e | ||||||||
| build-e2e: | ||||||||
| $(MAKE) -C $(CAS_ROOT) build-arch-$(GOARCH) make-image-arch-$(GOARCH) BUILD_TAGS=${BUILD_TAGS} | ||||||||
| docker push $(IMAGE)-$(GOARCH):$(TAG) | ||||||||
|
|
||||||||
| ARTIFACTS?=_artifacts | ||||||||
|
|
||||||||
| .PHONY: test-e2e | ||||||||
| test-e2e: build-e2e | ||||||||
| go run github.com/onsi/ginkgo/v2/ginkgo --tags e2e -v --trace --output-dir "$(ARTIFACTS)" --junit-report="junit.e2e_suite.1.xml" e2e -- \ | ||||||||
| test-e2e: build-e2e ## CI: build image + run tests (Prow/CAPZ) | ||||||||
| go run github.com/onsi/ginkgo/v2/ginkgo --tags e2e -v --trace \ | ||||||||
| --timeout $(TEST_TIMEOUT) \ | ||||||||
| --output-dir "$(ARTIFACTS)" --junit-report="junit.e2e_suite.1.xml" \ | ||||||||
| ./suites/$$(echo $(TEST_SUITE) | tr A-Z a-z)/... -- \ | ||||||||
| -resource-group="$$(KUBECONFIG= kubectl get managedclusters -n default -o jsonpath='{.items[0].status.nodeResourceGroup}')" \ | ||||||||
|
|
||||||||
| -cluster-name="$$(KUBECONFIG= kubectl get cluster -n default -o jsonpath='{.items[0].metadata.name}')" \ | ||||||||
| -client-id="$$(KUBECONFIG= kubectl get userassignedidentities -n default -o jsonpath='{.items[0].status.clientId}')" \ | ||||||||
| -cas-namespace="$(CLUSTER_AUTOSCALER_NAMESPACE)" \ | ||||||||
| -cas-serviceaccount-name="$(CLUSTER_AUTOSCALER_SERVICEACCOUNT_NAME)" \ | ||||||||
| -cas-image-repository="$(IMAGE)-$(GOARCH)" \ | ||||||||
| -cas-image-tag="$(TAG)" | ||||||||
|
|
||||||||
| # -------------------------------------------------------------------------- | ||||||||
| # Local developer targets | ||||||||
| # -------------------------------------------------------------------------- | ||||||||
| # Prerequisites: | ||||||||
| # 1. az login | ||||||||
| # 2. make setup-cluster (creates AKS + ACR + identity, one-time) | ||||||||
| # 3. make deploy-local (builds + deploys CAS via skaffold) | ||||||||
| # 4. make e2etests (runs the tests) | ||||||||
| # | ||||||||
| # Required env var: AZURE_RESOURCE_GROUP (set automatically by setup-cluster) | ||||||||
|
|
||||||||
| .PHONY: help | ||||||||
| help: ## Display help | ||||||||
| @awk 'BEGIN {FS = ":.*##"; printf "Usage:\n make \033[36m<target>\033[0m\n"} \ | ||||||||
| /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-25s\033[0m %s\n", $$1, $$2 } \ | ||||||||
| /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) }' $(MAKEFILE_LIST) | ||||||||
|
|
||||||||
| ##@ Cluster Setup (one-time) | ||||||||
|
|
||||||||
| .PHONY: setup-cluster | ||||||||
| setup-cluster: ## Create AKS cluster + ACR + workload identity for e2e testing | ||||||||
|
Member
I like to add breadcrumbs to Makefiles (and Taskfiles) so that it's easier to see what step is executing at any one time. This helps with troubleshooting issues as the observer (whether human or AI) can go straight to the failing step.
Suggested change
|
||||||||
| cd $(DEV_DIR) && bash ./aks-dev-deploy.sh | ||||||||
|
|
||||||||
| ##@ Build & Deploy | ||||||||
|
|
||||||||
| .PHONY: deploy-local | ||||||||
| deploy-local: ## Build CAS and deploy to cluster via skaffold | ||||||||
|
Member
Suggested change
|
||||||||
| cd $(CAS_ROOT) && skaffold run --filename cloudprovider/azure/examples/dev/skaffold.yaml | ||||||||
|
|
||||||||
| .PHONY: deploy-local-dev | ||||||||
| deploy-local-dev: ## Build + deploy CAS in watch mode (auto-redeploy on changes) | ||||||||
| cd $(CAS_ROOT) && skaffold dev --filename cloudprovider/azure/examples/dev/skaffold.yaml | ||||||||
|
|
||||||||
| ##@ E2E Testing | ||||||||
|
|
||||||||
| .PHONY: e2etests | ||||||||
| e2etests: ## Run e2e tests (CAS must already be deployed) | ||||||||
|
Member
Suggested change
|
||||||||
| go run github.com/onsi/ginkgo/v2/ginkgo \ | ||||||||
| --tags e2e \ | ||||||||
| -v --trace \ | ||||||||
| --timeout $(TEST_TIMEOUT) \ | ||||||||
| --output-dir "$(ARTIFACTS)" \ | ||||||||
| --junit-report="junit.e2e_suite.1.xml" \ | ||||||||
| $(if $(FOCUS),--focus="$(FOCUS)",) \ | ||||||||
| $(if $(LABEL_FILTER),--label-filter="$(LABEL_FILTER)",) \ | ||||||||
| ./suites/$$(echo $(TEST_SUITE) | tr A-Z a-z)/... -- \ | ||||||||
| -resource-group="$(AZURE_RESOURCE_GROUP)" | ||||||||
|
|
||||||||
|
|
||||||||
| ##@ Utilities | ||||||||
|
|
||||||||
| .PHONY: list-suites | ||||||||
| list-suites: ## List available test suites | ||||||||
| @find suites -mindepth 1 -maxdepth 1 -type d -printf '%f\n' 2>/dev/null || echo "No suites found." | ||||||||
|
|
||||||||
| .PHONY: validate-env | ||||||||
| validate-env: ## Check required environment variables | ||||||||
| @missing=""; \ | ||||||||
| for var in AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP; do \ | ||||||||
| eval val=\$$$$var; \ | ||||||||
| if [ -z "$$val" ]; then missing="$$missing $$var"; fi; \ | ||||||||
| done; \ | ||||||||
| if [ -n "$$missing" ]; then \ | ||||||||
| echo "ERROR: Missing required environment variables:$$missing"; \ | ||||||||
| exit 1; \ | ||||||||
| fi; \ | ||||||||
| echo "All required environment variables are set." | ||||||||
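
The `validate-env` recipe above is easier to follow without Make's `$$` escaping. Below is a rough standalone sketch of the same check as a plain shell function. The name `missing_vars` is hypothetical and not part of the Makefile; the messages are copied from the recipe.

```bash
#!/bin/sh
# missing_vars is a hypothetical helper (an assumption, not repository code).
# It reports which of the named variables are unset or empty and returns
# non-zero when at least one is missing.
missing_vars() {
    missing=""
    for var in "$@"; do
        # Indirect lookup: val receives the value of the variable named by $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "ERROR: Missing required environment variables:$missing"
        return 1
    fi
    echo "All required environment variables are set."
}

# Same variable list as the recipe; "|| true" keeps the demo from aborting.
missing_vars AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP || true
```

Inside the Makefile the equivalent line reads `eval val=\$$$$var` because Make collapses each `$$` into a single `$` before the shell sees it.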
I have PR #9410 upstream to add a devcontainer to CAS; we should make sure we're making compatible changes.
Mine omits some Azure-specific pieces to keep it relevant across all the providers; I was planning to add those in our fork.