KEP-5981: Provisional KEP for DRA Sharing Affinity for Conditional Fungibility #5987
ashvindeodhar wants to merge 4 commits into kubernetes:master
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ashvindeodhar.
|
@ashvindeodhar I think I discussed a similar issue, from a slightly different angle, when introducing the consumable capacity feature. I believe we talked about a feature where the device is dynamically bound by the first allocation. By extension, the binding is not limited to how the device will be consumed; in this case it seems to bind to a concrete configuration, and only claims with the same configuration are allowed to share (and consume the device's capacity).

How about an alternative approach: instead of relying on opaque config attributes that every resource claim has to set identically, we could add a field that lists runtime configuration object references. The field on the device could be something like commonConfigKind, listing the Kinds of runtime configuration objects that must be specified. For example:

    apiVersion: networking.example.com/v1alpha1
    kind: NetworkConfiguration
    metadata:
      name: subnet-a
    spec:
      subnet: 10.0.1.0/24

    kind: ResourceClaim
    metadata:
      name: claim-net-a
    spec:
      devices:
        config:
          objectRefs:
          - kind: NetworkConfiguration
            name: subnet-a

    kind: ResourceSlice
    metadata:
      name: net-devices
    spec:
      devices:
      - name: eth1
        capacity:
          bandwidth: "10Gb"
        commonConfigKind:
        - NetworkConfiguration
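To make the object-reference idea above concrete, here is a minimal Go sketch of how the matching might work. Everything in it is an assumption for illustration: the `Device`, `ObjectRef`, and `TryShare` names are hypothetical, and the rule that the first successful claim binds the device to its referenced object names is only one reading of the comment, not something the KEP or the DRA API defines.

```go
package main

import "fmt"

// ObjectRef is a hypothetical stand-in for an entry under objectRefs in a claim.
type ObjectRef struct {
	Kind string
	Name string
}

// Device is a hypothetical model of a ResourceSlice device that declares
// commonConfigKind. bound records, per Kind, the object name chosen by the
// first claim that was allowed to share the device.
type Device struct {
	Name              string
	CommonConfigKinds []string
	bound             map[string]string
}

// TryShare reports whether a claim with the given refs may share the device.
// Assumed rule: the claim must reference one object for every required Kind,
// and each reference must match what earlier claims already bound.
func (d *Device) TryShare(refs []ObjectRef) bool {
	byKind := make(map[string]string, len(refs))
	for _, r := range refs {
		byKind[r.Kind] = r.Name
	}
	// Validate every required Kind before mutating any state.
	for _, kind := range d.CommonConfigKinds {
		name, ok := byKind[kind]
		if !ok {
			return false // required Kind not referenced by the claim
		}
		if prev, isBound := d.bound[kind]; isBound && prev != name {
			return false // device already bound to a different object
		}
	}
	if d.bound == nil {
		d.bound = make(map[string]string)
	}
	for _, kind := range d.CommonConfigKinds {
		d.bound[kind] = byKind[kind]
	}
	return true
}

func main() {
	eth1 := &Device{Name: "eth1", CommonConfigKinds: []string{"NetworkConfiguration"}}
	a := []ObjectRef{{Kind: "NetworkConfiguration", Name: "subnet-a"}}
	b := []ObjectRef{{Kind: "NetworkConfiguration", Name: "subnet-b"}}
	fmt.Println(eth1.TryShare(a)) // first claim binds the device to subnet-a
	fmt.Println(eth1.TryShare(b)) // different object, so sharing is refused
	fmt.Println(eth1.TryShare(a)) // same object, so sharing is allowed
}
```

In this sketch the device only needs to compare object names, which is the appeal of the approach; the trade-off is that it introduces a new reference field on both the claim and the device.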
|
Thanks @sunya-ch! You're right that the "attributeKeys" naming is confusing. I'll look at renaming it to something that makes the distinction from device attributes clearer (e.g., configKeys or parameterKeys). I considered the object-reference approach, but there are a few reasons the well-known schema inside opaque config may work better for this KEP:
Let me know what you think. That said, the naming concern is valid and I'll address it. |
- Resolve placement decision: SharingAffinity stays on ResourceSlice
(driver-side) with rationale for why hardware modal constraints
belong on the device, not the workload
- Resolve claim-value delivery: adopt well-known JSON schema inside
OpaqueDeviceConfiguration per @pohly's guidance; define normative
StructuredParameters contract (recognition, uniqueness, coexistence,
conflict handling, string-only alpha, malformed payloads, missing
entries, validation intent)
- Defer CanSetLock/NeverSetLock to Future Enhancements; alpha allows
any compatible claim to establish the initial lock
- Replace grandfathered-claim model with conservative unknown-affinity
handling: devices with non-reconstructable active claims are filtered
out until fully clean (no optimistic lock-setting over legacy claims)
- Add Safety Model and Responsibility Split section clarifying
scheduler guarantee vs driver guarantee vs conservative fallback
- Introduce AffinityState struct with Unknown flag; replace flat
AffinityValues map with AffinityStates map[DeviceID]AffinityState
- Expand Filter phase to 7-step evaluation including UnknownAffinity
check, exactly-one StructuredParameters entry, schema decode,
string-type enforcement
- Add normative Score ordering (locked-compatible > clean > filtered)
- Add explicit alpha limitations for lock-aware fairness and
preemption blindness throughout Summary, Non-Goals, Proposal,
and Risks sections
- Add string-only matching constraint with rationale to Notes,
Filter, StructuredParameters contract, and new Future Enhancement
(Typed Affinity Values Beyond Strings)
- Add Multi-key SharingAffinity example with subnet+pkey walkthrough
- Expand reconstruction algorithm to handle malformed, non-string,
and duplicate structured-parameters entries
- Harden Risks section: rename Stale Affinity View to Cache Staleness,
add alpha limitation callout to Priority Inversion
- Remove stale SharingAffinityMapping reference
- Add Priority-based Lock Preemption and SharingStrategy to Future
Enhancements with detailed rationale
- Update Graduation Criteria, Upgrade/Downgrade, Version Skew,
PRR sections for conservative unknown-affinity handling
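The Filter and Score changes listed above can be pictured with a small Go sketch. It is illustrative only: `AffinityState`, its `Unknown` flag, and `map[DeviceID]AffinityState` are named in the commit message, but the `Values` field, the `Verdict` type, and the `evaluate` function are hypothetical. It also assumes that a claim missing a required key is filtered out, and it collapses the 7-step Filter evaluation into the few checks that can be inferred from the list.

```go
package main

import "fmt"

// DeviceID and AffinityState are named in the commit message; the fields
// shown here are assumptions made for this sketch.
type DeviceID string

type AffinityState struct {
	Unknown bool              // active claims could not be reconstructed
	Values  map[string]string // locked key -> string value; empty means clean
}

// Verdict is hypothetical. Its numeric order encodes the Score ordering
// from the commit message: locked-compatible > clean > filtered.
type Verdict int

const (
	Filtered Verdict = iota
	Clean
	LockedCompatible
)

func (v Verdict) String() string {
	return [...]string{"filtered", "clean", "locked-compatible"}[v]
}

// evaluate decides how a device looks to a claim that requests the string
// values in want for the device's sharing-affinity keys.
func evaluate(state AffinityState, keys []string, want map[string]string) Verdict {
	if state.Unknown {
		return Filtered // conservative: no lock-setting over legacy claims
	}
	for _, k := range keys {
		if _, ok := want[k]; !ok {
			return Filtered // assumed handling of a missing entry
		}
	}
	if len(state.Values) == 0 {
		return Clean // no lock yet; this claim could establish it
	}
	for _, k := range keys {
		if state.Values[k] != want[k] {
			return Filtered // exact string match on every key
		}
	}
	return LockedCompatible
}

func main() {
	keys := []string{"subnet", "pkey"} // multi-key example: subnet + pkey
	states := map[DeviceID]AffinityState{
		"nic-0": {Values: map[string]string{"subnet": "10.0.1.0/24", "pkey": "0x8001"}},
		"nic-1": {},
		"nic-2": {Unknown: true},
	}
	claim := map[string]string{"subnet": "10.0.1.0/24", "pkey": "0x8001"}
	for _, id := range []DeviceID{"nic-0", "nic-1", "nic-2"} {
		fmt.Printf("%s: %s\n", id, evaluate(states[id], keys, claim))
	}
}
```

Preferring locked-compatible devices over clean ones packs compatible claims together and leaves clean devices available for claims that need a different lock.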
|
@ashvindeodhar Thank you for the explanation. I understand. Still, could you please add the above option to the Alternatives section for documentation purposes? In terms of naming, I plus one for
- Add compatibility matrix (5 scenarios × Scheduler/Driver Outcome columns)
showing dual-enforcement model for SA+SP combinations
- Add Enablement and Rollout Dynamics section with unknown-affinity safety
valve, missing/malformed parameter handling, version skew/rollback, and
recommended rollout sequence for drivers
- Add Object Reference-based Affinity Matching as a rejected alternative
with rationale (API surface, multi-dimensional affinity, @pohly direction)
- Add drawback: devices with sharingAffinity but no SP claims become
unschedulable under Strict Gating
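The unknown-affinity safety valve and the malformed-parameter handling mentioned above could look roughly like the following Go sketch. It rests on several assumptions that the thread does not confirm: the payload is modeled as a flat JSON object whose values must all be strings, each active claim is represented by the list of structured-parameters entries found in its opaque config, and any claim that is malformed, non-string, missing a key, has other than exactly one entry, or disagrees with another claim marks the whole device as unknown. `decodeEntry` and `reconstruct` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AffinityState is named in the KEP commits; the fields are assumptions.
type AffinityState struct {
	Unknown bool
	Values  map[string]string
}

// decodeEntry parses one payload under the assumed schema: a flat JSON
// object with string-only values (the alpha string-only constraint).
func decodeEntry(raw []byte) (map[string]string, error) {
	var generic map[string]interface{}
	if err := json.Unmarshal(raw, &generic); err != nil {
		return nil, fmt.Errorf("malformed payload: %w", err)
	}
	out := make(map[string]string, len(generic))
	for k, v := range generic {
		s, ok := v.(string)
		if !ok {
			return nil, fmt.Errorf("key %q: value is not a string", k)
		}
		out[k] = s
	}
	return out, nil
}

// reconstruct rebuilds a device's lock from its active claims. Each claim is
// the list of structured-parameters entries it carries. Anything that cannot
// be reconstructed with certainty yields Unknown, so the device is filtered
// out until it is fully clean.
func reconstruct(keys []string, activeClaims [][][]byte) AffinityState {
	var lock map[string]string
	for _, entries := range activeClaims {
		if len(entries) != 1 { // legacy claim (none) or duplicate entries
			return AffinityState{Unknown: true}
		}
		vals, err := decodeEntry(entries[0])
		if err != nil {
			return AffinityState{Unknown: true}
		}
		claimLock := make(map[string]string, len(keys))
		for _, k := range keys {
			v, ok := vals[k]
			if !ok {
				return AffinityState{Unknown: true}
			}
			claimLock[k] = v
		}
		if lock == nil {
			lock = claimLock
			continue
		}
		for _, k := range keys {
			if lock[k] != claimLock[k] {
				return AffinityState{Unknown: true}
			}
		}
	}
	return AffinityState{Values: lock}
}

func main() {
	keys := []string{"subnet"}
	good := [][]byte{[]byte(`{"subnet":"10.0.1.0/24"}`)}
	legacy := [][]byte{} // claim created before the feature was enabled
	fmt.Printf("%+v\n", reconstruct(keys, [][][]byte{good, good}))
	fmt.Printf("%+v\n", reconstruct(keys, [][][]byte{good, legacy}))
}
```

Treating every doubtful case as unknown trades some schedulable capacity for safety, which matches the conservative fallback described in the commits.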
Key technical contributions in this KEP include:
This design transitions DRA from Quantitative Sharing (how many slots?) to Qualitative Gating (what mode are those slots in?), which is essential for multi-tenant AI and HPC workloads.