feat(shared): unified auth for shared v3 apps (external + in-cluster strong identity) #3293

Draft
oscdjj wants to merge 77 commits into main from feat/unified-auth-for-shared-app

Conversation

oscdjj (Member) commented Jun 8, 2026

Summary: unified authentication for shared v3 apps (external + in-cluster strong identity)

Background

This PR introduces a single unified access path and a single authentication policy
enforcement point (PEP) for every shared V3 app, covering both north-south traffic
(browser / external API) and in-cluster traffic (third-party caller Pods).

Capabilities:

  • installable / reachable: after upgrade, all qualifying shared apps expose one
    per-viewer address (https://entrance-id.<user>.<platform-domain>/…); in-cluster
    traffic is steered to the unified app-gateway over the same Host and path semantics.
  • in-cluster strong identity (builds on the above): the full in-cluster chain
    (d2 TLS offload → Linkerd mTLS → app-gateway-data → Envoy Gateway → ext_authz →
    shared backend), with identity derived from Linkerd l5d-client-id, fail-closed ext_authz,
    audit fields and Prometheus metrics.
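
The in-cluster identity step can be illustrated with a minimal sketch. It assumes the l5d-client-id value follows Linkerd's usual identity naming, <service-account>.<namespace>.serviceaccount.identity.<trust-domain>; the CallerIdentity type, the allow-list map, and both function names are hypothetical and are not taken from this PR. The point it shows is the fail-closed behaviour: a missing, malformed, or unknown identity is denied.

```go
package main

import (
	"errors"
	"strings"
)

// CallerIdentity is a hypothetical type for this sketch; the PR's real
// identity model is not shown in the description.
type CallerIdentity struct {
	ServiceAccount string
	Namespace      string
}

var errNoIdentity = errors.New("missing or malformed l5d-client-id")

// parseClientID splits a value such as
// "caller.apps.serviceaccount.identity.linkerd.cluster.local"
// into its service account and namespace (assumed Linkerd naming).
func parseClientID(header string) (CallerIdentity, error) {
	parts := strings.Split(strings.TrimSpace(header), ".")
	if len(parts) < 5 || parts[2] != "serviceaccount" || parts[3] != "identity" {
		return CallerIdentity{}, errNoIdentity
	}
	if parts[0] == "" || parts[1] == "" {
		return CallerIdentity{}, errNoIdentity
	}
	return CallerIdentity{ServiceAccount: parts[0], Namespace: parts[1]}, nil
}

// authorize fails closed: any parse error or caller outside the
// allow-list results in a deny.
func authorize(header string, allowed map[CallerIdentity]bool) bool {
	id, err := parseClientID(header)
	if err != nil {
		return false
	}
	return allowed[id]
}
```

Defaulting to deny on every error path is what makes the check fail-closed; an allow decision requires both a well-formed identity and an explicit allow-list entry.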

Key building blocks introduced/changed

  • app-service: SharedRouteRegistry CRD + reconciler, per-entrance SRR / logical-host
    regex routes, route-mode=gateway automation (opt-out), caller manifest contract,
    caller/entrance-TLS reconcilers, shared-hosts ConfigMap reconciler, d2 inject
    fail-open sentinels, TLS-replica mount-guard validating webhook.
  • app-gateway: vendored Linkerd + Envoy Gateway stack (version-locked), gateway
    framework scaffold, app-service ext_authz fail-closed backend, stable
    app-gateway-data L4 upstream Service, EnvoyProxy rendering for the Linkerd data plane.
  • authz (app-service/pkg/gateway/authz): ext_authz security policy + authz
    server backend, in-cluster identity deciders, audit logs and metrics, hardened
    host-user header parsing.
  • routecontrol (app-service/pkg/gateway/routecontrol): per-viewer TLS secret
    sync, caller Linkerd inject with safe NetworkPolicy ordering, TLS replica fan-out,
    frozen in-cluster strong-id service port.
  • sys-event: SRR-driven in-cluster CoreDNS allow-list, gated by a ClusterConfig
    kill-switch.
  • security / sandbox: NP-minimal for cluster-internal shared access, caller egress
    NetworkPolicy builders, exempt linkerd-proxy uid from olares-envoy iptables capture.
  • cli: bootstrap of the app-gateway install stack and Linkerd mesh (PKI/issuer,
    optional linkerd-viz, mesh NetworkPolicy fallback, helm wait/mesh helpers).

Target Version for Merge

1.12.6


Note

High Risk
Changes platform install/upgrade, cluster PKI/TLS for Linkerd identity, and north-south auth (ext_authz fail-closed); misconfiguration can break ingress or mesh for the whole cluster.

Overview
Adds a unified ingress stack (Linkerd + Envoy Gateway + Gateway API) under framework/app-gateway, delivered as the app-gateway-system umbrella Helm chart with stable app-gateway-data service, fail-closed ext_authz to app-service, and an always-on linkerd-pki-guardian controller (replacing a legacy CronJob).

Olares CLI / install & upgrade now bootstrap this stack before os-framework: validate installer artifacts, prepare Linkerd PKI (olares-linkerd-pki), server-side apply large CRD bundles when missing, install app-gateway-system in one release (issuer PEM injected from the secret), optional mesh rollout/waits, and bootstrap mesh NetworkPolicies. Platform upgrade replaces separate vendor/chart upgrade tasks with PrepareLinkerdPKI → ApplyNetworkCRDs → UpgradeAppGatewaySystem.

Release packaging runs assemble-app-gateway-system.sh to vendor Linkerd/Envoy charts into the wizard bundle and copies app-gateway-vendor (certs script, values). Helm helpers gain app-gateway–specific init, CRD-skipping installs with --wait, and kubectl server-side apply for CRDs.

Also adds manual Docker publish workflows for linkerd-pki-guardian and the d2 sidecar image, plus extensive unit tests for install ordering and PKI/mesh behavior.

Reviewed by Cursor Bugbot for commit 1064c71. Bugbot is set up for automated code reviews on this repo.

oscdjj added 30 commits May 15, 2026 18:34
Wire the app-gateway install stack into the CLI cluster phases: helm, values, and module helpers plus Linkerd mesh scaffolding so shared-app routing is set up during install.
Add the app-gateway framework module: embedded default config, vendor chart packaging for Linkerd and Envoy Gateway, and the scaffolding the installer consumes to render the gateway.
Add app-gateway-mesh-np for the linkerd and app-gateway namespaces alongside others-np so data-plane proxies can reach identity, destination, and policy. Existing same-namespace ingress rules are unchanged.
Apply the vendor-packaged ingress policy only until both the linkerd and app-gateway namespaces have app-gateway-mesh-np from the app-service reconcile, then hand off to the controller-managed policy.
Generate and maintain the Linkerd trust anchor and issuer certificates during install so the mesh identity chain exists before meshed workloads start.
Allow installing linkerd-viz without its bundled Prometheus so the platform Prometheus can be reused for scraping instead of running a duplicate stack.
Resolve linkerd-viz and vendor chart assets from installer-only paths so installation does not depend on the source-tree layout.
Add default mesh and EnvoyProxy values to the app-gateway chart so the data plane renders with Linkerd injection enabled by default.
Render the EnvoyProxy resource with Linkerd data-plane annotations so gateway pods join the mesh.
Add helm-wait and mesh helper routines so installation blocks until the app-gateway chart and its meshed data plane are ready.
Expose app-gateway-data (80 to 10080) with Envoy Gateway selector labels as a fixed in-cluster entry to the data plane. Extend the mesh verify script to check the Service, port mapping, and Endpoints against the EG pods.
Add the gateway.olares.io/v1alpha1 SharedRouteRegistry with validated hostPatterns. Normalize hosts, build the spec from sharedEntrances, and reconcile one SRR per gateway-mode v3 Application with an Application owner reference.
When gateway.olares.io/route-mode=gateway, point shared v3 vhosts at app-gateway-data.app-gateway.svc.cluster.local:80 instead of the per-app Service. Direct mode and non-shared apps are unchanged.
Add a cluster-scoped ClusterConfig for platformDomain, read via a cached dynamic client with env fallback. Add hash8 URL helpers and logical hash8.*.domain pattern support without changing the Phase-A public APIs.
Emit one SRR per sharedEntrance with hash8.*.platformDomain and reject duplicate hash8 cluster-wide. Map logical patterns to *.domain hostnames plus Host regular-expression matches on the HTTPRoute.
…l4 routing

Consolidate SRR host reconciliation and L4 routing into app-service: materialize HTTPRoute and NetworkPolicy from the SRR and drive l4-bfl-proxy upstream selection for shared gateway-mode apps.
Wire the ext-authz SecurityPolicy and the app-service authz server backend that authorizes shared requests, pointing the gateway ext-authz filter at app-service.
Make the mesh readiness wait more robust and bootstrap the chart-only network policy path for the window before app-service has reconciled the mesh policy.
Add demo upstream, meshed client, and HTTPRoute samples used to validate gateway routing end to end.
Apply a baseline ext-authz allow for non-v2 hostnames so legacy hosts keep working while v2 shared hosts are enforced.
Permit cluster-scoped apps to use the shared gateway path so they participate in shared routing.
Place the ingress NetworkPolicy and ReferenceGrant in the upstream Service namespace when it differs from the SRR namespace, prevent the security controller from deleting routecontrol-managed NPs, and fan out sharedEntrances into the L4 buildAppInfos so v2 gateway vhosts are generated.
vercel Bot commented Jun 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
olares | Ready | Preview, Comment | Jun 10, 2026 11:27am

1 Skipped Deployment

Project | Deployment | Actions | Updated (UTC)
olares-docs | Ignored | Preview | Jun 10, 2026 11:27am

Comment thread cli/cmd/ctl/os/maintain_linkerd_pki.go Outdated
}
if allReady {
	return true, "", nil
}

Mesh ready on one pod

High Severity

egDataPlaneMeshReady returns success as soon as any single active data-plane pod has a ready linkerd-proxy and envoy, without requiring every non-terminating pod to pass the same checks. During a rollout, install can finish while older replicas still lack mesh sidecars and remain in the Service endpoints.

Reviewed by Cursor Bugbot for commit e3c3517.
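
One way to address the "Mesh ready on one pod" finding above is to require every non-terminating pod to pass the container checks. The sketch below is an assumption about the shape of such a check, not the PR's code: podStatus is a hypothetical simplified type standing in for the Kubernetes pod object, and dataPlaneMeshReady is an illustrative name rather than the real egDataPlaneMeshReady signature.

```go
package main

// podStatus is a hypothetical, simplified view of a data-plane pod.
type podStatus struct {
	Name            string
	Terminating     bool
	ReadyContainers map[string]bool // container name -> ready
}

// dataPlaneMeshReady reports ready only when every non-terminating pod
// has both the linkerd-proxy and envoy containers ready. The string
// return value explains the first failure found.
func dataPlaneMeshReady(pods []podStatus) (bool, string) {
	active := 0
	for _, p := range pods {
		if p.Terminating {
			continue // terminating pods are leaving the Service endpoints
		}
		active++
		for _, c := range []string{"linkerd-proxy", "envoy"} {
			if !p.ReadyContainers[c] {
				return false, p.Name + ": container " + c + " not ready"
			}
		}
	}
	if active == 0 {
		return false, "no active data-plane pods"
	}
	return true, ""
}
```

With this shape, an older replica that still lacks a ready linkerd-proxy blocks the readiness result until it is either replaced or marked terminating.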

linkerdVals, err := utils.LoadValuesFile(filepath.Join(vendor, "linkerd-values.yaml"))
if err != nil {
	linkerdVals = map[string]interface{}{}
}

Helm values parse errors ignored

Medium Severity

When loading Linkerd Helm values files fails for any reason other than a missing file, the install path replaces the error with an empty values map and continues. A corrupt or invalid YAML file can silently install Linkerd with defaults instead of failing fast.

Reviewed by Cursor Bugbot for commit e3c3517.

Comment thread build/package.sh
Comment thread cli/pkg/terminus/tasks.go
if t.Force && appGatewayStackEnabled() {
	if err := ValidateAppGatewayInstallerArtifacts(runtime.GetInstallerDir()); err != nil {
		return errors.Wrap(err, "Olares installer package incomplete for unified ingress (app-gateway)")
	}

Wrong installer path validated

Medium Severity

CheckPrepared validates app-gateway artifacts with runtime.GetInstallerDir() only, while install tasks resolve the bundle via resolveInstallerDir, which prefers OLARES_INSTALLER_DIR.

Reviewed by Cursor Bugbot for commit 7b31209.
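
For the "Wrong installer path validated" finding above, the usual remedy is to have validation and installation share one resolver. The sketch below is illustrative only: installerDir is a hypothetical helper, and the getenv parameter is injected so the precedence can be tested without touching the process environment. The only detail taken from the review is that OLARES_INSTALLER_DIR is preferred when set.

```go
package main

// installerDir returns the bundle directory that both the pre-check and
// the install tasks should use: the OLARES_INSTALLER_DIR override when
// it is set, otherwise the runtime's installer directory.
func installerDir(getenv func(string) string, runtimeDir string) string {
	if dir := getenv("OLARES_INSTALLER_DIR"); dir != "" {
		return dir
	}
	return runtimeDir
}
```

In real code the first argument would typically be os.Getenv, and both the validation step and the install tasks would call the same helper so they cannot disagree about the path.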

Comment thread cli/pkg/terminus/app_gateway_system.go
Comment thread cli/pkg/terminus/app_gateway_pki_prepare.go Outdated
Comment thread cli/pkg/release/app/app.go
		"linkerd":     {"linkerd.io/control-plane-component"},
		"app-gateway": {"app.kubernetes.io/name=envoy-gateway"},
	},
},

Control plane check wrong namespace

Medium Severity

CheckAppGatewayControlPlaneReady lists Envoy Gateway pods in the hardcoded namespace app-gateway, but installs honor APP_GATEWAY_NAMESPACE via resolveAppGatewayNamespace. A non-default namespace leaves the readiness task polling an empty namespace while the control plane runs elsewhere.

Reviewed by Cursor Bugbot for commit b111b49.
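
The "Control plane check wrong namespace" finding above comes down to a map key that is a string literal. A minimal sketch of the alternative is to build the selector map from the resolved namespace; controlPlaneSelectors is a hypothetical helper, and the label selectors are copied from the hunk shown in the review.

```go
package main

// controlPlaneSelectors builds the namespace -> label-selector map used
// by a readiness check. The Envoy Gateway entry is keyed by the resolved
// app-gateway namespace instead of a hardcoded literal.
func controlPlaneSelectors(appGatewayNS string) map[string][]string {
	if appGatewayNS == "" {
		appGatewayNS = "app-gateway" // default when nothing is resolved
	}
	return map[string][]string{
		"linkerd":    {"linkerd.io/control-plane-component"},
		appGatewayNS: {"app.kubernetes.io/name=envoy-gateway"},
	}
}
```

A caller would pass the result of the same namespace resolver the installer uses, so the readiness task polls the namespace where the control plane was actually installed.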

initConfigForNetworkCRDs = utils.InitConfigForAppGateway
loadValuesForNetworkCRDs = utils.LoadValuesFile
applyCRDChartFunc = utils.TemplateAndServerSideApply
linkerdCRDsPresentFunc = linkerdPolicyCRDsPresent

Linkerd CRD skip too narrow

Medium Severity

ApplyNetworkCRDs treats Linkerd CRDs as present when only policy.linkerd.io/v1alpha1 is registered, then skips applying the bundled linkerd-crds chart. Clusters missing other Linkerd API groups can still fail later during the control-plane install.

Reviewed by Cursor Bugbot for commit 6694d63.
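
For the "Linkerd CRD skip too narrow" finding above, a broader presence check could compare a full list of required API groups against what the cluster has registered. This is a sketch: missingAPIGroups is a hypothetical helper, and the exact set of groups the bundled linkerd-crds chart installs is an assumption the caller must supply; only policy.linkerd.io is named in the review.

```go
package main

// missingAPIGroups returns the required API groups that are not in the
// registered list, preserving the order of required. The CRD chart can
// be skipped only when the result is empty.
func missingAPIGroups(required, registered []string) []string {
	have := make(map[string]bool, len(registered))
	for _, g := range registered {
		have[g] = true
	}
	var missing []string
	for _, g := range required {
		if !have[g] {
			missing = append(missing, g)
		}
	}
	return missing
}
```

Applying the CRD chart whenever any required group is missing keeps the skip as an optimization while avoiding a later control-plane install failure.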

Adopt main IsShared discriminator and resolve app-service controller conflicts.

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

Reviewed by Cursor Bugbot for commit 1064c71.

if err != nil || d.Namespace == "" {
	return "app-gateway"
}
return d.Namespace

Namespace helper ignores env override

Medium Severity

Namespace() is documented to honor APP_GATEWAY_NAMESPACE, but it only reads embedded defaults.yaml. The CLI applies the env override via resolveAppGatewayNamespace(), so components using agwconfig.Namespace() can target a different namespace than the installer when the env var is set.

Reviewed by Cursor Bugbot for commit 1064c71.
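
The "Namespace helper ignores env override" finding above can be pictured as a three-step precedence: environment override, then the embedded default, then a literal fallback. The sketch below assumes that ordering; resolveNamespace is a hypothetical helper, getenv is injected for testability, and an empty embedded value stands in for both a load error and an unset field in defaults.yaml.

```go
package main

import "strings"

const fallbackNamespace = "app-gateway"

// resolveNamespace applies one precedence for every caller:
// APP_GATEWAY_NAMESPACE, then the embedded default, then the fallback.
func resolveNamespace(getenv func(string) string, embedded string) string {
	if ns := strings.TrimSpace(getenv("APP_GATEWAY_NAMESPACE")); ns != "" {
		return ns
	}
	if embedded != "" {
		return embedded
	}
	return fallbackNamespace
}
```

Having the CLI and the shared config helper call one function like this would keep the installer and runtime components pointed at the same namespace when the env var is set.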

eball marked this pull request as draft June 10, 2026 12:20
1 participant