Abhishek/fix api gw peering#23454
Merged
LordAbhishek merged 3 commits into main on Apr 22, 2026
anandmukul93 (Contributor) approved these changes on Apr 22, 2026 and left a comment:
LGTM. We need to check that all acceptance tests pass, and the multiport tests need to be rerun after this change as well.
* add unit test cases for api gateway supporting consul peering
* add compiled xds config golden files after running api-gw golden testcases
* fix(proxyCfg): propagate meshGatewayConfig to upstreams of API gateway (#23369)
  - Fixed handleRouteConfigUpdate to properly propagate meshGatewayConfig to API gateway upstreams, which is required during XDS endpoint and cluster config generation.
  - Added TestStateChangedAPIGateway test cases in state_test.go to validate API gateway update handling.
  - Added API gateway-specific logging prefix (similar to mesh gateway) to help in debugging.
* fix(xds): correct endpoint config generation for API gateway in peered setups (#23370)
  - Previously, API gateway XDS endpoint generation incorrectly relied on cfgSnap.ConnectProxy config (instead of cfgSnap.APIGateway), which caused wrong/no endpoint configuration for peered environments.
  - Changes made:
    - Updated makeUpstreamLoadAssignmentForPeerService to fetch localGatewayEndpoint based on cfgSnap kind instead of always using cfgSnap.ConnectProxy.
    - Updated endpointsFromDiscoveryChain to derive meshGatewayMode based on cfgSnap kind instead of always using cfgSnap.ConnectProxy.
    - Recompiled golden test file to reflect fix.
  * removed comment
* fix(xds): correct cluster config generation for API gateway in peered setups (#23371)
  - Updated makeUpstreamClustersForDiscoveryChain to generate cluster config based on upstream endpoint type. Before this fix, it always generated cluster configs without endpoints, which is incorrect when the upstream endpoint type is hostname and mesh-gateway mode is remote; in such cases, endpoints must also be included in the cluster config.
  - Added recompiled golden test file to reflect the fix.
* fix lint error
Collaborator comment:
📣 Hi @LordAbhishek! A backport is missing for this PR [23454] for versions [1.18,1.21]. Please perform the backport manually and add the following snippet to your backport PR description:
This PR adds `api gateway` peering unit test cases and fixes a couple of issues, discussed below, with the API gateway when operating in peering.

Please note: This PR merges 3 other small PRs:
- #23369
- #23370
- #23371
Issue details (theoretical): API Gateway issue with peering and fix.docx
Details:

1. Whenever we generate the endpoint config for `api-gateway`, `makeUpstreamLoadAssignmentForPeerService` (xds/endpoints.go) always fetches `local_mesh_gateway_endpoint` from the `connect-proxy` cfgSnap instead of the `api-gateway` cfgSnap:
   `localGw, ok := cfgSnap.ConnectProxy.WatchedLocalGWEndpoints.Get(cfgSnap.Locality.String())`
   If not fixed, the api-gateway's cluster for peering always ends up with no endpoints when `meshGatewayConfig.Mode` is local.
   Fixed it to fetch the endpoints based on the cfgSnap kind. PR
2. Whenever we generate the endpoint config for `api-gateway`, `endpointsFromDiscoveryChain` (xds/endpoints.go) fetches `meshGatewayConfig` to decide whether the endpoints should point to the local mesh gateway or the remote mesh gateway. Here too, the method always fetches `meshGatewayConfig.Mode` from connect-proxy instead of api-gateway, so `mgwMode` is always nil, and nil defaults to the remote mode. As a result, even if we configure the mesh gateway mode as local, it always falls back to remote.
   Fixed it to fetch the `meshGatewayConfig` based on the cfgSnap kind. PR
3. In continuation of issue 2, where we have to fetch the `meshGatewayConfig` for api-gateway: the api-gateway upstreams never carried the `meshGatewayConfig`. `handleRouteConfigUpdate` (proxycfg/api_gateway.go), which updates the `upstreams` for api-gateway, was not propagating `meshGatewayConfig` to `upstreams`.
   Fixed it by adding `meshGatewayConfig` to the upstream object. PR
4. Whenever we generate the cluster config for `api-gateway`, `makeUpstreamClustersForDiscoveryChain` (xds/clusters.go) always generates the cluster without endpoints. When a service's upstream endpoints are hostnames, we have to send them in the cluster config via CDS, because Envoy cannot resolve hostnames delivered through EDS.
   If not fixed, this results in an Envoy config without endpoints for some clusters (those whose upstreams are of hostname type).
   Fixed it to generate the cluster based on the upstream endpoint type, as is done for connect proxy and mesh gateway. PR
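For readers who want a concrete picture of the fixes, here is a minimal sketch of the two ideas involved: selecting per-kind state from the snapshot (as described for `makeUpstreamLoadAssignmentForPeerService` and `endpointsFromDiscoveryChain`), and detecting hostname endpoints that have to travel with the cluster config (as described for `makeUpstreamClustersForDiscoveryChain`). This is not the code from the PR. The types `snapshot` and `upstreamState`, the string-valued modes, and the helper names are all hypothetical stand-ins chosen for illustration, and the hostname check via `net.ParseIP` is an assumption about how such a test could be written.

```go
package main

import (
	"fmt"
	"net"
)

// Kind is a hypothetical stand-in for the proxy kind stored on a config snapshot.
type Kind string

const (
	KindConnectProxy Kind = "connect-proxy"
	KindAPIGateway   Kind = "api-gateway"
)

// upstreamState is a simplified, hypothetical model of the per-kind data.
type upstreamState struct {
	// LocalGWEndpoints maps a locality key to local mesh gateway addresses.
	LocalGWEndpoints map[string][]string
	// MeshGatewayMode maps an upstream ID to "local", "remote", or "" (unset).
	MeshGatewayMode map[string]string
}

// snapshot is a hypothetical, reduced version of a proxy config snapshot.
type snapshot struct {
	Kind         Kind
	Locality     string
	ConnectProxy upstreamState
	APIGateway   upstreamState
}

// upstreams picks the state that matches the snapshot kind instead of
// unconditionally reading the connect-proxy state.
func (s *snapshot) upstreams() *upstreamState {
	if s.Kind == KindAPIGateway {
		return &s.APIGateway
	}
	return &s.ConnectProxy
}

// localGatewayEndpoints looks up local mesh gateway endpoints for the
// snapshot's locality in the kind-specific state.
func (s *snapshot) localGatewayEndpoints() ([]string, bool) {
	eps, ok := s.upstreams().LocalGWEndpoints[s.Locality]
	return eps, ok
}

// meshGatewayMode returns the configured mode for an upstream and falls back
// to "remote" when nothing is set, mirroring the default described above.
func (s *snapshot) meshGatewayMode(upstreamID string) string {
	if mode := s.upstreams().MeshGatewayMode[upstreamID]; mode != "" {
		return mode
	}
	return "remote"
}

// needsInlineEndpoints reports whether any address is a hostname rather than
// an IP literal, in which case the endpoints must be part of the cluster config.
func needsInlineEndpoints(addrs []string) bool {
	for _, a := range addrs {
		if net.ParseIP(a) == nil {
			return true
		}
	}
	return false
}

func main() {
	snap := &snapshot{
		Kind:     KindAPIGateway,
		Locality: "dc1",
		APIGateway: upstreamState{
			LocalGWEndpoints: map[string][]string{"dc1": {"10.0.0.5"}},
			MeshGatewayMode:  map[string]string{"web": "local"},
		},
	}
	eps, ok := snap.localGatewayEndpoints()
	fmt.Println(eps, ok)
	fmt.Println(snap.meshGatewayMode("web"))
	fmt.Println(needsInlineEndpoints([]string{"gw.example.com"}))
}
```

If `upstreams()` always returned `&s.ConnectProxy`, the API gateway snapshot in `main` would report no local gateway endpoints and a `remote` mode, which matches the symptoms listed above.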
How the fix is tested:
Unit test cases:
$ go test ./agent/xds -run 'TestAllResourcesFromSnapshot/.*/api-gateway-with-peers-mesh-mode-(local|remote)-and-upstream-is-(hostname|static)' -count=1 -update
ok github.com/hashicorp/consul/agent/xds 0.491s
$ go test ./agent/xds
ok github.com/hashicorp/consul/agent/xds 1.924s
$ go test ./agent/proxycfg
ok github.com/hashicorp/consul/agent/proxycfg 5.105s
Consul-k8s acceptance tests:
consul-ent PR (same as this one):
Ran the consul-k8s workflow with the patched consul-ent image.
All peering tests ran fine on both EKS and Kind.
Kind test: https://github.com/hashicorp/consul-k8s-workflows/actions/runs/22376072101
EKS test: https://github.com/hashicorp/consul-k8s-workflows/actions/runs/22376066354