Skip to content

Add monitoring controller for metrics #1959

Open
mayleighnmyers wants to merge 13 commits into
istio-ecosystem:mainfrom
mayleighnmyers:feature/coo-monitoring-controller
Open

Add monitoring controller for metrics #1959
mayleighnmyers wants to merge 13 commits into
istio-ecosystem:mainfrom
mayleighnmyers:feature/coo-monitoring-controller

Conversation

@mayleighnmyers

Copy link
Copy Markdown

Implements a monitoring controller that creates ServiceMonitor and PodMonitor resources for IstioRevision objects. Key features:

  • Uses monitoring.rhobs/v1 API group for COO compatibility
  • Adds 'monitored-by: coo-prometheus' label for COO discovery
  • Creates ServiceMonitor (default-istiod) for istiod metrics
  • Creates PodMonitor (default-proxies) for istio-proxy metrics
  • Sets proper owner references for automatic cleanup

Relates to: OSSM-13585

What type of PR is this?

  • Enhancement / New Feature
  • Bug Fix
  • Refactor
  • Optimization
  • Test
  • Documentation Update

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Related Issue/PR #

Additional information:

@linux-foundation-easycla

linux-foundation-easycla Bot commented May 18, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

@istio-testing

Copy link
Copy Markdown
Collaborator

Hi @mayleighnmyers. Thanks for your PR.

I'm waiting for a istio-ecosystem or istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nrfox

nrfox commented May 28, 2026

Copy link
Copy Markdown
Contributor

Hi @mayleighnmyers,

How would users control this if they wanted to opt in or out? What will the API be for this? You can either open an issue and we can discuss it there or better yet create a SEP for this. We need to discuss how this will integrate with the operator before we can move forward with this.

@mayleighnmyers

Copy link
Copy Markdown
Author

Hi @mayleighnmyers,

How would users control this if they wanted to opt in or out? What will the API be for this? You can either open an issue and we can discuss it there or better yet create a SEP for this. We need to discuss how this will integrate with the operator before we can move forward with this.

So this is still in progress. Good call for the opt in/out. Something we can consider is adding a field either in the Istio CR or the istioRevision CR to allow users to opt in since right now it is currently always enabled. Yuanlin and I will be talking about this more in some future meeting when we have all ducks in a row. :)

@mayleighnmyers mayleighnmyers changed the title Add monitoring controller for COO (Cluster Observability Operator) [WIP] Add monitoring controller for COO (Cluster Observability Operator) May 28, 2026
1. As a user of Istio's metrics monitoring using Prometheus components, I want to automatically configure Prometheus metrics scraping jobs for Istio generated telemetry metrics.

### API Changes
We will add a boolean field `spec.monitoring.enabled` in the `Istio` and `IstioRevision` CRDs. When a user enables it, a monitoring controller will reconcile a `ServiceMonitor` CR for the mesh control plane and `PodMonitor` CRs for each namespace labeled by the Istio sidecar injection label.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is missing some of the relabeling rules that exist in the upstream samples. Do we anticipate users needing to customize those and does that change the API?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you are using COO, is this label different for every prom stack: monitored-by: coo-prometheus? Will the API need to include the ability to customize this label?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we anticipate users needing to customize those and does that change the API?

No, we will add a method about appending those relabeling spec as default value in the following commits.

When you are using COO, is this label different for every prom stack: monitored-by: coo-prometheus?

Yes, that label value depends on the prom stack name created by user.

We will add a boolean field `spec.monitoring.enabled` in the `Istio` and `IstioRevision` CRDs. When a user enables it, a monitoring controller will reconcile a `ServiceMonitor` CR for the mesh control plane and `PodMonitor` CRs for each namespace labeled by the Istio sidecar injection label.

### Architecture
We assume the related `ServiceMonitor` and `PodMonitor` CRDs and the API groups such as "monitoring.coreos.com/v1" , "monitoring.rhobs/v1" are available. We will add required RBAC permissions for creating, deleting and updating those two resources in the Sail Operator ClusterRole.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't assume this since the Sail Operator is supported on non-openshift clusters that may not have these APIs installed. Even on OpenShift clusters, how will the operator know whether to use monitoring.coreos.com/v1 or monitoring.rhobs/v1? Will we need to support both?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add validation methods. It will query API server and validate if those two CRDs , APIs are installed.
I suggest using and supporting the monitoring.coreos.com/v1 group. The rhobs is specific from COO operator side and no difference in the two CRDs we use.

1. As a user of Istio's metrics monitoring using Prometheus components, I want to automatically configure Prometheus metrics scraping jobs for Istio generated telemetry metrics.

### API Changes
We will add a boolean field `spec.monitoring.enabled` in the `Istio` and `IstioRevision` CRDs. When a user enables it, a monitoring controller will reconcile a `ServiceMonitor` CR for the mesh control plane and `PodMonitor` CRs for each namespace labeled by the Istio sidecar injection label.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to mention that it's an openshift specific requirement that we create a PodMonitor in each namespace otherwise non-openshift users would benefit from just using the NamspaceSelector on the PodMonitor. This is how the upstream sample is configured.

@nrfox

nrfox commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

The implementation of this broadly looks fine but I think the API will need to be larger than a boolean and will require the sail operator to have additional permissions which may not be relevant for non-openshift users. With ZTunnel and IstioRevisionTag we have targetRef fields that reference the Istio resource. Maybe we could do something similar with this integration.

There could be a standalone Integration resource that has 1. an Istio ref 2. refs to metrics or tracing resources. Something like:

  apiVersion: integration.ossm/v1alpha1
  kind: Integration
  metadata:
    name: my-integration
    namespace: istio-system
  spec:
    istioRef:
      name: default
    metrics:
      type: ClusterObservability
      clusterObservability:
        monitoringStackRef:
          name: my-stack
          namespace: monitoring
    tracing:
      type: TempoStack
      tempoStackRef:
        name: my-tempo
        namespace: tracing

Here istioRef is a reference to an Istio resource, monitoringStackRef is a reference to a MonitoringStack, and tempoStackRef is a reference to a TempoStack. For each of these references the controller would lookup the relevant fields in the reference and configure the Istio accordingly. For example, the controller would use the monitoringStackRef to find the resource label for the PodMonitor/ServiceMonitor resources.

This could even be its own deployment with its own ServiceAccount so that the Sail Operator wouldn't need to be granted additional permissions. Since a lot of the integrations may be OSSM specific, it may even make more sense to only have this available in ossm/sail-operator. Not sure.

Server side apply can ensure that any updates that the integrations controller makes to the Istio resource do not conflict with user supplied values.

The implementation included here would largely stay the same. The main change would be in the API to manage these integrations.

@mayleighnmyers mayleighnmyers changed the title [WIP] Add monitoring controller for COO (Cluster Observability Operator) Add monitoring controller for metrics Jun 15, 2026
@mayleighnmyers mayleighnmyers marked this pull request as ready for review June 15, 2026 17:38
@mayleighnmyers mayleighnmyers requested a review from a team as a code owner June 15, 2026 17:38
@mayleighnmyers mayleighnmyers force-pushed the feature/coo-monitoring-controller branch from 22566a7 to c43fabb Compare June 16, 2026 16:36
@mayleighnmyers mayleighnmyers force-pushed the feature/coo-monitoring-controller branch from c43fabb to 21661f3 Compare June 16, 2026 16:48
@mayleighnmyers mayleighnmyers force-pushed the feature/coo-monitoring-controller branch from 795caee to 3373e36 Compare June 16, 2026 19:22
mayleighnmyers and others added 5 commits June 24, 2026 10:09
Implements a monitoring controller that creates ServiceMonitor and PodMonitor
resources for IstioRevision objects. Key features:

- Uses monitoring.rhobs/v1 API group for COO compatibility
- Adds 'monitored-by: coo-prometheus' label for COO discovery
- Creates ServiceMonitor (default-istiod) for istiod metrics
- Creates PodMonitor (default-proxies) for istio-proxy metrics
- Sets proper owner references for automatic cleanup

Relates to: OSSM-13585

Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Add ClusterRole permissions for ServiceMonitor and PodMonitor resources
for both monitoring.coreos.com and monitoring.rhobs API groups to allow
the monitoring controller to create and manage these resources.

Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
yxun and others added 7 commits June 24, 2026 10:09
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
The gencheck job runs make gen, which updates bundle manifests from
chart CRDs. The bundle copy of sailoperator.io_istios.yaml was missing
the new spec.monitoring schema added for the monitoring controller.

Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Signed-off-by: Mayleigh Tjapkes <mamyers@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants