
feat: add gateway-api support #299

Open
l0wl3vel wants to merge 17 commits into master from feat/gatewayapi

@l0wl3vel l0wl3vel commented May 6, 2026

Description

  • Add kind-cloud-controller-manager to provide Services of type LoadBalancer
  • Introduce envoy-gateway as the Gateway API implementation
  • Move metal-stack control plane kind cluster into the mini_lab_external docker network
    • The container IP cannot be selected on the default docker bridge, but a fixed IP is needed for the pre-defined *.nip.io DNS records
  • Keep ingress-nginx for now; it is still required for Dex, Thanos, Gardener, and PowerDNS
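
To illustrate why the container IP has to be fixed: nip.io resolves a hostname to the IPv4 address embedded in it, so a pre-defined record only works if the kind node keeps that address. The following is a minimal sketch, not code from this PR. The function name `ip_from_nip_io` and the hostname `api.172.42.0.2.nip.io` are hypothetical, and only the dotted notation is handled.

```python
import ipaddress
from typing import Optional


def ip_from_nip_io(hostname: str) -> Optional[str]:
    """Return the IPv4 address embedded in a dotted nip.io hostname, or None."""
    suffix = ".nip.io"
    if not hostname.endswith(suffix):
        return None
    # Everything before ".nip.io"; the last four labels should form the address.
    labels = hostname[: -len(suffix)].split(".")
    if len(labels) < 4:
        return None
    candidate = ".".join(labels[-4:])
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ValueError:
        return None


# Hypothetical hostname; the address is only an example value.
resolved = ip_from_nip_io("api.172.42.0.2.nip.io")
print(resolved)
```

If the container came up with a different address, the hostname would still point at the old one, which is presumably why a network with a selectable IP is needed.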

WIPs

  • Certificates are still a bit messed up (using the default-gateway cert for gRPC termination)
  • Link the metal-roles PR branch so CI runs in this pull request until the metal-roles PR is merged
  • CI is failing - looks like an unrelated timeout

Used AI-Tools ✨

  • none used for generation

Closes: #297

Requires: metal-stack/helm-charts#156 and metal-stack/metal-roles#594

Tested configurations

  • Sonic
  • Dell Sonic
  • Gardener (looks good: deploys correctly with metal-stack on Gateway API and Gardener components still on ingress-nginx; further testing required)
  • Kamaji
    • Non-functional, so likely a wontfix unless it gets integrated into mini-lab. Only usable in capi-lab, which uses an old pinned version of mini-lab.

@metal-robot metal-robot Bot added this to Development May 6, 2026
@l0wl3vel l0wl3vel force-pushed the feat/gatewayapi branch 2 times, most recently from 28079c5 to f84c000 Compare May 8, 2026 14:58
@ma-hartma
Contributor

Dell Sonic does actually work, but you need credentials to pull from r.metal-stack.io.

@l0wl3vel l0wl3vel mentioned this pull request May 26, 2026
9 tasks
@vknabel
Contributor

vknabel commented May 28, 2026

Sadly I got the following error:

deploy-control-plane  | TASK [ansible-common/roles/helm-chart : Copy over custom helm charts] **********
deploy-control-plane  | fatal: [localhost]: FAILED! => 
deploy-control-plane  |     changed: false
deploy-control-plane  |     cmd: /usr/bin/rsync --delay-updates -F --compress --delete-after --archive --out-format='<<CHANGED>>%i
deploy-control-plane  |         %n%L' /helm-charts/charts/metal-control-plane /tmp/helm-chart
deploy-control-plane  |     msg: |-
deploy-control-plane  |         rsync: [sender] change_dir "/helm-charts/charts" failed: No such file or directory (2)
deploy-control-plane  |         rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1338) [sender=3.4.1]
deploy-control-plane  |     rc: 23

I had the following override set: metal_roles_version: gatewayapi

@l0wl3vel
Author

@vknabel fixed in 629cb02

@l0wl3vel
Author

@Sven-Ric Would you mind taking a look at the network changes?

@l0wl3vel l0wl3vel requested review from Sven-Ric and vknabel May 29, 2026 13:10
@l0wl3vel l0wl3vel marked this pull request as ready for review June 1, 2026 06:53
@l0wl3vel l0wl3vel requested review from a team as code owners June 1, 2026 06:53
l0wl3vel added 16 commits June 2, 2026 14:13
Signed-off-by: Benjamin Ritter <benjamin.ritter@x-cellent.com>
@Sven-Ric

Sven-Ric commented Jun 5, 2026

It seems the kind node always ends up in the default kind network on a clean first run. The kind network is read from .env, which is written by env.sh. However, the Makefile reads .env before env.sh is invoked, so the kind node network falls back to the default. Because .env persists between runs, the bug is masked on all subsequent runs.
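
The ordering problem described above can be sketched as a small simulation. This is not the project's Makefile or env.sh. The variable name `KIND_NETWORK` and the helper names are hypothetical; only the read-before-write order and the fallback follow the description.

```python
import os
import tempfile

DEFAULT_NETWORK = "kind"


def read_network(env_path: str) -> str:
    """Mimic the Makefile: read the network from .env, else use the default."""
    if not os.path.exists(env_path):
        return DEFAULT_NETWORK
    with open(env_path) as fh:
        for line in fh:
            key, _, value = line.strip().partition("=")
            if key == "KIND_NETWORK":  # hypothetical variable name
                return value
    return DEFAULT_NETWORK


def write_env(env_path: str) -> None:
    """Mimic env.sh: persist the intended network to .env."""
    with open(env_path, "w") as fh:
        fh.write("KIND_NETWORK=mini_lab_internal\n")


def run_once(env_path: str) -> str:
    """One run in the buggy order: .env is read before env.sh writes it."""
    network = read_network(env_path)
    write_env(env_path)
    return network


with tempfile.TemporaryDirectory() as workdir:
    env_file = os.path.join(workdir, ".env")
    first_run = run_once(env_file)   # clean checkout: no .env yet
    second_run = run_once(env_file)  # .env left over from the first run

print(first_run, second_run)
```

The first run falls back to the default network, while the second run picks up the persisted value. That matches the observation that only clean environments such as CI are affected.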

On initial run:

# docker inspect metal-control-plane-control-plane
[
    {
        <SNIP>
        "NetworkSettings": {
            <SNIP>
            "Networks": {
                "kind": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "DriverOpts": null,
                    "GwPriority": 0,
                    "NetworkID": "6530b19e41b397d41d37f6a38d6b1bbd74c9ba2b7478df95f6a6270cc84c4d0e",
                    "EndpointID": "6d56f5f0fa83330b85e0b0ebbd04175a93d8586c48492c9b545beb7eeecce015",
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.2",
                    "MacAddress": "12:98:42:c8:e4:ec",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "fc00:f853:ccd:e793::1",
                    "GlobalIPv6Address": "fc00:f853:ccd:e793::2",
                    "GlobalIPv6PrefixLen": 64,
                    "DNSNames": [
                        "metal-control-plane-control-plane",
                        "bd976835cec0"
                    ]
                }
            }
        },
        "ImageManifestDescriptor": {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "digest": "sha256:21c46cf61fd45873f89e6a1bfcba4b7904dffa84c2bec88aeeca9a0409af4725",
            "size": 743,
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            }
        }
    }
]

On all subsequent runs:

# docker inspect metal-control-plane-control-plane
[
    {
        <SNIP>
        "NetworkSettings": {
            <SNIP>
            "Networks": {
                "mini_lab_internal": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "DriverOpts": null,
                    "GwPriority": 0,
                    "NetworkID": "2734b8f942cae84d8693ecd43ab3bb9d5cd71905faf992fbfe5c3df17ddc376b",
                    "EndpointID": "62f8b2a6eb379bb65f13f6441a9249417fc9ce754218a29b699cd7511b393d29",
                    "Gateway": "172.42.0.1",
                    "IPAddress": "172.42.0.2",
                    "MacAddress": "66:e7:b9:9c:2e:39",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "DNSNames": [
                        "metal-control-plane-control-plane",
                        "5b12fbbedfdc"
                    ]
                }
            }
        },
        "ImageManifestDescriptor": {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "digest": "sha256:21c46cf61fd45873f89e6a1bfcba4b7904dffa84c2bec88aeeca9a0409af4725",
            "size": 743,
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            }
        }
    }
]
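
To check the attached network without reading the full output, the relevant fields can be extracted from the JSON that docker inspect prints. This is a minimal sketch; the helper name `attached_networks` is hypothetical, and the sample input is reduced to the fields shown in the outputs above.

```python
import json
from typing import Dict


def attached_networks(inspect_json: str) -> Dict[str, str]:
    """Map each attached network name to the container's IPv4 address on it."""
    containers = json.loads(inspect_json)
    # docker inspect prints a JSON array with one object per inspected container.
    networks = containers[0]["NetworkSettings"]["Networks"]
    return {name: cfg.get("IPAddress", "") for name, cfg in networks.items()}


# Reduced sample mirroring the "subsequent runs" output.
sample = json.dumps([
    {
        "NetworkSettings": {
            "Networks": {
                "mini_lab_internal": {"IPAddress": "172.42.0.2"}
            }
        }
    }
])
result = attached_networks(sample)
print(result)
```

In practice the input would be the unabridged output of docker inspect metal-control-plane-control-plane. A result keyed by kind instead of mini_lab_internal would indicate the first-run fallback.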

Signed-off-by: Benjamin Ritter <benjamin.ritter@x-cellent.com>
@l0wl3vel
Author

l0wl3vel commented Jun 5, 2026

Thank you so much for checking it out @Sven-Ric. Fixed in baddf29. A few people checked this PR and it worked fine but CI was failing and I had no clue why. You saved me a lot of time 😅


Development

Successfully merging this pull request may close these issues.

Add GatewayAPI support to mini-lab

4 participants