feat(shared): unified auth for shared v3 apps (external + in-cluster strong identity) #3293
oscdjj wants to merge 77 commits into
Wire the app-gateway install stack into the CLI cluster phases: the Helm, values, and module helpers plus Linkerd mesh scaffolding, so shared-app routing is set up during install.
Add the app-gateway framework module: embedded default config, vendor chart packaging for Linkerd and Envoy Gateway, and the scaffolding the installer consumes to render the gateway.
Add app-gateway-mesh-np for the linkerd and app-gateway namespaces alongside others-np so data-plane proxies can reach the Linkerd identity, destination, and policy services. Existing same-namespace ingress rules are unchanged.
Apply the vendor-packaged ingress policy only until both the linkerd and app-gateway namespaces have app-gateway-mesh-np from the app-service reconcile, then hand off to the controller-managed policy.
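A minimal sketch of the handoff condition, assuming the installer can list NetworkPolicy names per namespace (modeled here as a plain map instead of a Kubernetes client call). The helper names are hypothetical; only app-gateway-mesh-np and the two namespace names come from the commit message.

```go
package main

// meshNPName is the controller-managed policy named in the commit message.
const meshNPName = "app-gateway-mesh-np"

// shouldApplyVendorPolicy (hypothetical) reports whether the installer should
// keep applying the vendor-packaged ingress policy. policies maps a namespace
// to the NetworkPolicy names found there, standing in for a Kubernetes list
// call. It returns false only once every required namespace has the mesh NP.
func shouldApplyVendorPolicy(policies map[string][]string, required ...string) bool {
	for _, ns := range required {
		if !containsString(policies[ns], meshNPName) {
			return true // this namespace has not been reconciled yet
		}
	}
	return false
}

// containsString reports whether want appears in list.
func containsString(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}
```

Checking both namespaces, rather than either one, avoids dropping the fallback while one side of the mesh still lacks the controller-managed policy.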
Generate and maintain the Linkerd trust anchor and issuer certificates during install so the mesh identity chain exists before meshed workloads start.
Allow installing linkerd-viz without its bundled Prometheus so the platform Prometheus can be reused for scraping instead of running a duplicate stack.
Resolve linkerd-viz and vendor chart assets from installer-only paths so installation does not depend on the source-tree layout.
Add default mesh and EnvoyProxy values to the app-gateway chart so the data plane renders with Linkerd injection enabled by default.
Render the EnvoyProxy resource with Linkerd data-plane annotations so gateway pods join the mesh.
Add helm-wait and mesh helper routines so installation blocks until the app-gateway chart and its meshed data plane are ready.
Expose app-gateway-data (Service port 80 forwarding to target port 10080), selecting Envoy Gateway pods by their labels, as a fixed in-cluster entry point to the data plane. Extend the mesh verify script to check the Service, port mapping, and Endpoints against the EG pods.
Add the gateway.olares.io/v1alpha1 SharedRouteRegistry with validated hostPatterns. Normalize hosts, build the spec from sharedEntrances, and reconcile one SRR per gateway-mode v3 Application with an Application owner reference.
When gateway.olares.io/route-mode=gateway, point shared v3 vhosts at app-gateway-data.app-gateway.svc.cluster.local:80 instead of the per-app Service. Direct mode and non-shared apps are unchanged.
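A minimal sketch of the upstream choice described above. The function name is hypothetical, and it assumes the per-app upstream is addressed as `<svc>.<ns>.svc.cluster.local:<port>`; only the annotation key and the gateway address come from the commit message.

```go
package main

import "fmt"

const (
	routeModeAnnotation = "gateway.olares.io/route-mode"
	gatewayUpstream     = "app-gateway-data.app-gateway.svc.cluster.local:80"
)

// upstreamFor (hypothetical) returns the vhost upstream: shared v3 apps in
// gateway mode go through the unified gateway Service, everything else keeps
// its per-app Service address.
func upstreamFor(annotations map[string]string, shared bool, svc, ns string, port int) string {
	if shared && annotations[routeModeAnnotation] == "gateway" {
		return gatewayUpstream
	}
	return fmt.Sprintf("%s.%s.svc.cluster.local:%d", svc, ns, port)
}
```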
Add a cluster-scoped ClusterConfig for platformDomain, read via a cached dynamic client with env fallback. Add hash8 URL helpers and logical hash8.*.domain pattern support without changing the Phase-A public APIs.
Emit one SRR per sharedEntrance with hash8.*.platformDomain and reject duplicate hash8 cluster-wide. Map logical patterns to *.domain hostnames plus Host regular-expression matches on the HTTPRoute.
…l4 routing. Consolidate SRR host reconciliation and L4 routing into app-service: materialize the HTTPRoute and NetworkPolicy from each SRR and drive l4-bfl-proxy upstream selection for shared gateway-mode apps.
Wire the ext-authz SecurityPolicy and the app-service authz server backend that authorizes shared requests, pointing the gateway ext-authz filter at app-service.
Make the mesh readiness wait more robust and bootstrap the chart-only network policy path for the window before app-service has reconciled the mesh policy.
Add demo upstream, meshed client, and HTTPRoute samples used to validate gateway routing end to end.
Apply a baseline ext-authz allow for non-v2 hostnames so legacy hosts keep working while v2 shared hosts are enforced.
Permit cluster-scoped apps to use the shared gateway path so they participate in shared routing.
Place the ingress NetworkPolicy and ReferenceGrant in the upstream Service namespace when it differs from the SRR namespace, prevent the security controller from deleting routecontrol-managed NPs, and fan out sharedEntrances into the L4 buildAppInfos so v2 gateway vhosts are generated.
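A sketch of the placement rule, assuming the HTTPRoute lives in the SRR namespace and references a backend Service that may live elsewhere. Two Kubernetes facts drive it: a NetworkPolicy only selects pods in its own namespace, and a Gateway API ReferenceGrant must sit in the namespace of the object being referenced. The type and function names are hypothetical.

```go
package main

// placement records where the cross-namespace objects would be created.
type placement struct {
	NetworkPolicyNS  string // must match the upstream pods' namespace
	ReferenceGrantNS string // empty when no cross-namespace grant is needed
}

// placeFor (hypothetical) decides where the ingress NetworkPolicy and the
// ReferenceGrant go for a route in srrNamespace targeting serviceNamespace.
func placeFor(srrNamespace, serviceNamespace string) placement {
	p := placement{NetworkPolicyNS: serviceNamespace}
	if serviceNamespace != srrNamespace {
		p.ReferenceGrantNS = serviceNamespace
	}
	return p
}
```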
}
if allReady {
	return true, "", nil
}
Mesh ready on one pod
High Severity
egDataPlaneMeshReady returns success as soon as any single active data-plane pod has a ready linkerd-proxy and envoy, without requiring every non-terminating pod to pass the same checks. During a rollout, install can finish while older replicas still lack mesh sidecars and remain in the Service endpoints.
Reviewed by Cursor Bugbot for commit e3c3517.
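One way to tighten the check, sketched over a hypothetical trimmed-down pod view rather than the real client-go types: every non-terminating pod must have both linkerd-proxy and envoy ready, and at least one such pod must exist.

```go
package main

// podStatus is a hypothetical, trimmed-down view of a data-plane pod.
type podStatus struct {
	Name        string
	Terminating bool            // deletionTimestamp is set
	Ready       map[string]bool // container name -> ready
}

// allDataPlanePodsMeshReady succeeds only if every active pod passes both
// container checks; the reason names the first pod that fails.
func allDataPlanePodsMeshReady(pods []podStatus) (bool, string) {
	active := 0
	for _, p := range pods {
		if p.Terminating {
			continue // pods on their way out are not required to be meshed
		}
		active++
		if !p.Ready["linkerd-proxy"] || !p.Ready["envoy"] {
			return false, p.Name + ": linkerd-proxy or envoy not ready"
		}
	}
	if active == 0 {
		return false, "no active data-plane pods"
	}
	return true, ""
}
```

During a rollout this keeps install waiting until older replicas without sidecars have been replaced or marked for deletion.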
linkerdVals, err := utils.LoadValuesFile(filepath.Join(vendor, "linkerd-values.yaml"))
if err != nil {
	linkerdVals = map[string]interface{}{}
}
Helm values parse errors ignored
Medium Severity
When loading Linkerd Helm values files fails for any reason other than a missing file, the install path replaces the error with an empty values map and continues. A corrupt or invalid YAML file can silently install Linkerd with defaults instead of failing fast.
Reviewed by Cursor Bugbot for commit e3c3517.
if t.Force && appGatewayStackEnabled() {
	if err := ValidateAppGatewayInstallerArtifacts(runtime.GetInstallerDir()); err != nil {
		return errors.Wrap(err, "Olares installer package incomplete for unified ingress (app-gateway)")
	}
Wrong installer path validated
Medium Severity
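CheckPrepared validates app-gateway artifacts with runtime.GetInstallerDir() only, while install tasks resolve the bundle via resolveInstallerDir, which prefers OLARES_INSTALLER_DIR. When that variable is set, the preflight check can inspect a different bundle than the one the install actually uses.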
Reviewed by Cursor Bugbot for commit 7b31209.
Branch updated from 7b31209 to b111b49.
		"linkerd":     {"linkerd.io/control-plane-component"},
		"app-gateway": {"app.kubernetes.io/name=envoy-gateway"},
	},
},
Control plane check wrong namespace
Medium Severity
CheckAppGatewayControlPlaneReady lists Envoy Gateway pods in the hardcoded namespace app-gateway, but installs honor APP_GATEWAY_NAMESPACE via resolveAppGatewayNamespace. A non-default namespace leaves the readiness task polling an empty namespace while the control plane runs elsewhere.
Reviewed by Cursor Bugbot for commit b111b49.
initConfigForNetworkCRDs = utils.InitConfigForAppGateway
loadValuesForNetworkCRDs = utils.LoadValuesFile
applyCRDChartFunc        = utils.TemplateAndServerSideApply
linkerdCRDsPresentFunc   = linkerdPolicyCRDsPresent
Linkerd CRD skip too narrow
Medium Severity
ApplyNetworkCRDs treats Linkerd CRDs as present when only policy.linkerd.io/v1alpha1 is registered, then skips applying the bundled linkerd-crds chart. Clusters missing other Linkerd API groups can still fail later during the control-plane install.
Reviewed by Cursor Bugbot for commit 6694d63.
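A sketch of a broader presence check: compare the installed CRD names against every CRD the bundled chart provides, and apply the chart unless none are missing. The installed list would come from listing apiextensions CRDs; the required names in the usage below are an illustrative subset of Linkerd's CRDs, not an authoritative list for any particular Linkerd version.

```go
package main

import "sort"

// missingCRDs returns the required CRD names absent from installed, sorted.
func missingCRDs(installed, required []string) []string {
	have := make(map[string]struct{}, len(installed))
	for _, name := range installed {
		have[name] = struct{}{}
	}
	var missing []string
	for _, name := range required {
		if _, ok := have[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

Usage: skip the linkerd-crds apply only when `len(missingCRDs(installed, required)) == 0`; a single registered group such as policy.linkerd.io is then no longer enough to skip it.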
Branch updated from 6694d63 to b891b24.
Branch updated from b891b24 to 48e7d2c.
Adopt main IsShared discriminator and resolve app-service controller conflicts.
Branch updated from 48e7d2c to 1c570c2.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 6 total unresolved issues (including 5 from previous reviews).
Reviewed by Cursor Bugbot for commit 1064c71.
if err != nil || d.Namespace == "" {
	return "app-gateway"
}
return d.Namespace
Namespace helper ignores env override
Medium Severity
Namespace() is documented to honor APP_GATEWAY_NAMESPACE, but it only reads embedded defaults.yaml. The CLI applies the env override via resolveAppGatewayNamespace(), so components using agwconfig.Namespace() can target a different namespace than the installer when the env var is set.
Reviewed by Cursor Bugbot for commit 1064c71.


Summary: unified authentication for shared v3 apps (external + in-cluster strong identity)
Background
This PR introduces a single unified access path and a single authentication policy enforcement point (PEP) for every shared v3 app, covering both north-south traffic (browser / external API) and in-cluster traffic (third-party caller Pods).
Capabilities:
Per-viewer address (https://entrance-id.<user>.<platform-domain>/…); in-cluster traffic is steered to the unified app-gateway with the same Host and path semantics.
Unified access path (d2 TLS offload → Linkerd mTLS → app-gateway-data → Envoy Gateway → ext_authz → shared backend), with caller identity derived from the Linkerd l5d-client-id header, fail-closed ext_authz, audit fields, and Prometheus metrics.
Key building blocks introduced or changed:
regex routes, route-mode=gateway automation (opt-out), caller manifest contract, caller/entrance-TLS reconcilers, shared-hosts ConfigMap reconciler, d2 inject fail-open sentinels, and a TLS-replica mount-guard validating webhook.
framework scaffold, app-service ext_authz fail-closed backend, stable app-gateway-data L4 upstream Service, and EnvoyProxy rendering for the Linkerd data plane.
In app-service/pkg/gateway/authz: the ext_authz SecurityPolicy and authz server backend, in-cluster identity deciders, audit logs and metrics, and hardened host-user header parsing.
In app-service/pkg/gateway/routecontrol: per-viewer TLS secretsync, caller Linkerd injection with safe NetworkPolicy ordering, TLS replica fan-out, and a frozen in-cluster strong-identity service port.
kill-switch.
NetworkPolicy builders, exempt linkerd-proxy uid from olares-envoy iptables capture.
optional linkerd-viz, mesh NetworkPolicy fallback, helm wait/mesh helpers).
Target Version for Merge
1.12.6
Note
High Risk
Changes platform install/upgrade, cluster PKI/TLS for Linkerd identity, and north-south auth (ext_authz fail-closed); misconfiguration can break ingress or mesh for the whole cluster.
Overview
Adds a unified ingress stack (Linkerd + Envoy Gateway + Gateway API) under framework/app-gateway, delivered as the app-gateway-system umbrella Helm chart with a stable app-gateway-data service, fail-closed ext_authz to app-service, and an always-on linkerd-pki-guardian controller (replacing a legacy CronJob).
Olares CLI install and upgrade now bootstrap this stack before os-framework: validate installer artifacts, prepare Linkerd PKI (olares-linkerd-pki), server-side apply large CRD bundles when missing, install app-gateway-system in one release (issuer PEM injected from the secret), optionally roll out and wait for the mesh, and bootstrap mesh NetworkPolicies. Platform upgrade replaces the separate vendor and chart upgrade tasks with PrepareLinkerdPKI → ApplyNetworkCRDs → UpgradeAppGatewaySystem.
Release packaging runs assemble-app-gateway-system.sh to vendor the Linkerd and Envoy charts into the wizard bundle and copies app-gateway-vendor (certs script, values). Helm helpers gain app-gateway-specific init, CRD-skipping installs with --wait, and kubectl server-side apply for CRDs.
Also adds manual Docker publish workflows for linkerd-pki-guardian and the d2 sidecar image, plus extensive unit tests for install ordering and PKI/mesh behavior.
Reviewed by Cursor Bugbot for commit 1064c71.