6 changes: 6 additions & 0 deletions cmd/main.go
@@ -146,6 +146,8 @@ type options struct {
remoteUpdatesEnabled bool
datadogDashboardEnabled bool
datadogGenericResourceEnabled bool
datadogGenericResourceMaxWorkers int
datadogGenericResourceRequeuePeriod time.Duration
datadogCSIDriverEnabled bool
untaintControllerEnabled bool
untaintControllerWaitForCSIDriver bool
@@ -187,6 +189,8 @@ func (opts *options) Parse() {
flag.BoolVar(&opts.remoteUpdatesEnabled, "remoteUpdatesEnabled", false, "Enable Remote Updates capabilities in the Operator (beta)")
flag.BoolVar(&opts.datadogDashboardEnabled, "datadogDashboardEnabled", false, "Enable the DatadogDashboard controller")
flag.BoolVar(&opts.datadogGenericResourceEnabled, "datadogGenericResourceEnabled", false, "Enable the DatadogGenericResource controller")
flag.IntVar(&opts.datadogGenericResourceMaxWorkers, "datadogGenericResourceMaxConcurrentReconciles", 1, "Maximum number of concurrent DatadogGenericResource reconciles")
flag.DurationVar(&opts.datadogGenericResourceRequeuePeriod, "datadogGenericResourceRequeuePeriod", 0, "DatadogGenericResource status polling requeue period, for example 5m. If unset, DD_GENERIC_RESOURCE_REQUEUE_PERIOD or 60s is used.")
flag.BoolVar(&opts.datadogCSIDriverEnabled, "datadogCSIDriverEnabled", false, "Enable the DatadogCSIDriver controller")
flag.BoolVar(&opts.untaintControllerEnabled, "untaintControllerEnabled", false, "Enable the Untaint controller")
flag.BoolVar(&opts.untaintControllerWaitForCSIDriver, "untaintControllerWaitForCSIDriver", false,
@@ -386,6 +390,8 @@ func run(opts *options) error {
DatadogAgentProfileEnabled: opts.datadogAgentProfileEnabled,
DatadogDashboardEnabled: opts.datadogDashboardEnabled,
DatadogGenericResourceEnabled: opts.datadogGenericResourceEnabled,
DatadogGenericResourceMaxWorkers: opts.datadogGenericResourceMaxWorkers,
DatadogGenericResourceRequeue: opts.datadogGenericResourceRequeuePeriod,
DatadogCSIDriverEnabled: opts.datadogCSIDriverEnabled,
UntaintControllerEnabled: opts.untaintControllerEnabled,
UntaintControllerWaitForCSIDriver: opts.untaintControllerWaitForCSIDriver,
17 changes: 15 additions & 2 deletions docs/datadog_generic_resource.md
@@ -162,8 +162,19 @@ To deploy a `DatadogGenericResource` with the Datadog Operator, follow the steps

Further example manifests are provided [in the supported resources table](#supported-resources).

By default, the Operator ensures that the API resource definition stays in sync with the `DatadogGenericResource` every **60** minutes (per resource). This interval can be adjusted using the environment variable `DD_GENERIC_RESOURCE_FORCE_SYNC_PERIOD`, which specifies the number of minutes. For example, setting this variable to `"30"` changes the interval to 30 minutes.
## Controller tuning

The `DatadogGenericResource` controller exposes several tuning options for large installations or load tests.

| Option | Default | Description |
| --- | --- | --- |
| `DD_GENERIC_RESOURCE_FORCE_SYNC_PERIOD` | `60` minutes | Interval, in minutes, for checking that the Datadog API resource definition still matches the Kubernetes `DatadogGenericResource`. For example, `"30"` changes the interval to 30 minutes. |
| `DD_GENERIC_RESOURCE_REQUEUE_PERIOD` | `60s` | Scheduled requeue interval for each `DatadogGenericResource` after a successful reconcile. On idle requeues, the controller also polls Datadog-side live state for resource types that expose it, currently `monitor` and `slo`. Accepts Go duration strings such as `30s` or `5m`, or a plain integer interpreted as seconds. The minimum value is `1s`. This can also be set with the `--datadogGenericResourceRequeuePeriod` manager flag. |
| `--datadogGenericResourceMaxConcurrentReconciles` | `1` | Maximum number of `DatadogGenericResource` objects that the controller reconciles at the same time. |

Increasing `--datadogGenericResourceMaxConcurrentReconciles` can improve throughput when creating, updating, deleting, or periodically syncing many resources. The tradeoff is higher Operator CPU usage and more concurrent requests to the Datadog API. Setting this value too high makes it more likely that the Operator hits Datadog API rate limits, especially when many resources reconcile at once or when the requeue interval is short.
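
The effect of the concurrency limit can be illustrated with a small standard-library worker pool. This is only a sketch of the general idea, not the Operator's implementation: `reconcileAll` and the resource keys are hypothetical names, and the real controller relies on its controller framework rather than a hand-written pool. The sketch shows that the number of reconciles in flight, and therefore the number of simultaneous Datadog API calls, never exceeds the configured worker count.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// reconcileAll is a hypothetical helper: it runs reconcile once per key with at
// most maxWorkers calls in flight, and returns the peak concurrency it observed.
func reconcileAll(keys []string, maxWorkers int, reconcile func(string)) int64 {
	if maxWorkers < 1 {
		maxWorkers = 1 // mirror the documented default of 1
	}
	queue := make(chan string)
	var inFlight, peak int64
	var wg sync.WaitGroup

	for i := 0; i < maxWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range queue {
				n := atomic.AddInt64(&inFlight, 1)
				// Record the highest number of simultaneous reconciles seen so far.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				reconcile(key)
				atomic.AddInt64(&inFlight, -1)
			}
		}()
	}

	for _, key := range keys {
		queue <- key
	}
	close(queue)
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// Hypothetical resource keys, for illustration only.
	keys := []string{"monitor-a", "monitor-b", "slo-a", "dashboard-a", "notebook-a"}
	peak := reconcileAll(keys, 2, func(key string) {
		// A real reconcile would call the Datadog API here.
	})
	fmt.Println("peak concurrent reconciles:", peak)
}
```

With a worker count of 1, every reconcile is serialized. Raising the count shortens the time needed to drain a large queue, but multiplies the worst-case number of API requests issued at once.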

Lowering `DD_GENERIC_RESOURCE_REQUEUE_PERIOD` makes all `DatadogGenericResource` objects reconcile more often. For `monitor` and `slo` resources, it also keeps `.status.state` fresher. The tradeoff is more Operator work and, for requeues that call the Datadog API, more API traffic. Raising the interval reduces polling overhead at the cost of slower periodic reconciliation and less frequent state updates.
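
The following sketch shows one way the requeue period could be resolved from the inputs described above: the manager flag takes precedence, then the `DD_GENERIC_RESOURCE_REQUEUE_PERIOD` environment variable, then the `60s` default. The function names are hypothetical and the Operator's actual code may differ. In particular, clamping values below `1s` up to the minimum, and falling back to the default on unparsable or non-positive input, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

const (
	defaultRequeuePeriod = 60 * time.Second // documented default
	minRequeuePeriod     = time.Second      // documented minimum
)

// parseRequeuePeriod accepts a plain integer (interpreted as seconds) or a Go
// duration string such as "30s" or "5m".
func parseRequeuePeriod(raw string) (time.Duration, error) {
	if secs, err := strconv.Atoi(raw); err == nil {
		return time.Duration(secs) * time.Second, nil
	}
	return time.ParseDuration(raw)
}

// clampMin raises values below the minimum to the minimum.
// Assumption: the real controller may reject such values instead.
func clampMin(d time.Duration) time.Duration {
	if d < minRequeuePeriod {
		return minRequeuePeriod
	}
	return d
}

// resolveRequeuePeriod applies the precedence flag > environment > default.
// A zero flag value means the flag was not set.
func resolveRequeuePeriod(flagValue time.Duration, envValue string) time.Duration {
	if flagValue > 0 {
		return clampMin(flagValue)
	}
	if envValue != "" {
		// Assumption: invalid or non-positive values fall back to the default.
		if d, err := parseRequeuePeriod(envValue); err == nil && d > 0 {
			return clampMin(d)
		}
	}
	return defaultRequeuePeriod
}

func main() {
	// The flag value is fixed at 0 (unset) here; a real binary would bind it
	// with flag.DurationVar.
	period := resolveRequeuePeriod(0, os.Getenv("DD_GENERIC_RESOURCE_REQUEUE_PERIOD"))
	fmt.Println("requeue period:", period)
}
```

Trying the integer parse first keeps a value such as `30` from being rejected by `time.ParseDuration`, which requires a unit suffix.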

## Datadog-side status

@@ -183,7 +194,9 @@ kubectl get datadoggenericresource # shows state and last state sync columns
kubectl wait --for=condition=StateSynced datadoggenericresource/<name>
```

The controller refreshes `state` roughly every 60 seconds during reconciliation. Failures are visible only via the `StateSynced` condition — they do not break the reconcile loop and the last-known `state` is retained until a subsequent refresh succeeds.
The controller requeues each `DatadogGenericResource` roughly every 60 seconds by default. This interval is controlled by `DD_GENERIC_RESOURCE_REQUEUE_PERIOD` or the `--datadogGenericResourceRequeuePeriod` manager flag. For `monitor` and `slo` resources, these idle requeues refresh `state`; for resource types without live state, the state fields remain empty. Status-polling requeues have lower priority than normal create, update, and delete work, so Datadog-side state updates may be delayed when the controller queue is busy. This keeps management operations ahead of background state polling, but it means `.status.state` is eventually consistent rather than immediate.

Failures are visible only through the `StateSynced` condition. They do not break the reconcile loop, and the last-known `state` is retained until a subsequent refresh succeeds.

This information is currently surfaced for `monitor` and `slo` resources. Resource types that do not expose live Datadog-side state (e.g., `dashboard`, `notebook`) leave these fields empty.
