WIP: feat: proposal for CEL expression placement plugin#737
WIP: feat: proposal for CEL expression placement plugin#737prb112 wants to merge 3 commits intooutrigger-project:mainfrom
Conversation
Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: prb112 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
|
||
| The current Multiarch Tuning Operator automatically determines Pod image architecture compatibility by inspecting container images. While this works well for most scenarios, there are cases where administrators need more control at the namespace level: | ||
|
|
||
| 1. **Workload-specific architecture preferences** Some workloads may benefit from specific architectures based on their component affinnity (e.g., database pods on ppc64le, web servers on amd64) |
There was a problem hiding this comment.
a typo
| 1. **Workload-specific architecture preferences** Some workloads may benefit from specific architectures based on their component affinnity (e.g., database pods on ppc64le, web servers on amd64) | |
| 1. **Workload-specific architecture preferences** Some workloads may benefit from specific architectures based on their component affinity (e.g., database pods on ppc64le, web servers on amd64) |
| - Filters configs whose `labelSelector` matches the pod | ||
| - For matching configs with `celArchitecturePlacement` enabled, evaluates CEL expressions in priority order | ||
| - If a rule matches (CEL expression returns `true`): | ||
| - **Removes any existing architecture constraints** from the pod's `spec.nodeSelector` (removes `kubernetes.io/arch` key if present) |
There was a problem hiding this comment.
If a user creates a namespaced PPC and enables celArchitecturePlacement, it can override all of spec.nodeSelector and spec.affinity.nodeAffinity, including user-defined settings, not just those added by MTO?
There was a problem hiding this comment.
Ah, I see the answer in the following section
|
|
||
| 1. **Limits Rule Explosion and Configuration Burden** Without a default, administrators would need to create rules for every possible pod pattern, leading to complex and hard-to-maintain configurations. The plugin supports _exceptional_ cases in the same namespace. | ||
|
|
||
| 2. **Provides Fallback Behavior** When no rules match, the system needs a sensible default rather than failing or using arbitrary behavior |
There was a problem hiding this comment.
we now have a field named fallbackArchitecture in Cluster scope ppc https://github.com/outrigger-project/multiarch-tuning-operator/blob/main/api/v1beta1/clusterpodplacementconfig_types.go#L58, If both fallbackArchitecture and defaultArchitectures are set, does defaultArchitectures take precedence?
| metadata: | ||
| name: database-rules | ||
| namespace: production | ||
| spec: |
There was a problem hiding this comment.
This is missing the Priority field which will default to 0 if left empty.
|
|
||
| #### Priority and Conflict Resolution | ||
|
|
||
| Only a single `PodPlacementConfig` resource in the same namespace is allowed. |
There was a problem hiding this comment.
I feel like this disregards the purpose of the Priority field. We would want multiple PPC with different priories. If several configs match the pod via labelSelector, is the winner only by priority, or is there a tie-breaker (name, creation time)?
| 1. **CEL Compilation** Expressions are compiled once at configuration time | ||
| 2. **Expression Caching** Compiled expressions are cached to avoid repeated compilation | ||
| 3. **Evaluation Overhead** CEL evaluation is fast (microseconds per expression) | ||
| 4. **Rule Limit** Maximum of 500 rules per configuration to prevent excessive evaluation time. This limit will be reviewed after use. |
There was a problem hiding this comment.
There are conflicting numbers for the rule limit. Is it 50, 500. or 1000+?
| | Misconfigured rules assign pods to incompatible architectures | Pods fail to start with ENOEXEC errors | Document best practices; recommend testing rules in non-production environments; existing ENOEXEC monitoring will detect issues | | ||
| | Too many rules impact performance | Increased pod scheduling latency | Limit maximum rules per configuration (500); compile and cache expressions; provide performance guidelines | | ||
| | Conflicting rules between multiple PodPlacementConfigs | Unpredictable behavior | Clear precedence rules based on priority field; document evaluation order | | ||
| | CEL expressions access sensitive pod data | Potential information disclosure | CEL expressions only have access to pod metadata (labels, annotations, name, namespace); no access to secrets or container specs | |
There was a problem hiding this comment.
The risk table says CEL only sees pod metadata and not container specs. Elsewhere the design is “evaluate against a Pod resource” with self. If the real CEL type is a full Pod, expressions could reference images, env, volumes, etc., unless the implementation uses a restricted type or field mask.
| - key: kubernetes.io/arch | ||
| operator: In | ||
| values: | ||
| - ppc64le | ||
| - amd64 |
There was a problem hiding this comment.
| - key: kubernetes.io/arch | |
| operator: In | |
| values: | |
| - ppc64le | |
| - amd64 | |
| - key: kubernetes.io/arch | |
| operator: In | |
| values: | |
| - ppc64le | |
| - amd64 |
| - If no rules match, uses the default architecture list specified in the plugin configuration (also removing existing constraints) | ||
| - The scheduling gate is removed | ||
|
|
||
| #### Architecture Constraint Removal |
There was a problem hiding this comment.
The proposal states that celArchitecturePlacement removes existing arch constraints and sets required affinity, that image-based detection can still apply elsewhere, and that NodeAffinityScoring can coexist and “prefer among eligible architectures.” Could you document an explicit ordering (and which component runs in which stage) for a single reconcile pass? Without that, it is ambiguous whether image-based logic might run before or after CEL and whether scoring sees the final required arch set or an intermediate state, which makes behavior hard to predict and tests hard to specify.
|
|
||
| | Risk | Impact | Mitigation | | ||
| |------|--------|------------| | ||
| | Complex CEL expressions cause evaluation errors | Pods may not be scheduled correctly | Validate expressions at admission time; provide clear error messages; treat evaluation errors as non-matches | |
There was a problem hiding this comment.
Implementation notes say evaluation errors are logged and treated as false. How do we handle validation on PodPlacementConfig create/update (reject bad CEL), and whether any pod-level admission failure is possible, or failures are always soft (fall through to default / next rule)?
Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
No description provided.