diff --git a/keps/sig-scheduling/5981-dra-sharing-affinity/README.md b/keps/sig-scheduling/5981-dra-sharing-affinity/README.md
new file mode 100644
index 000000000000..485f42fa386e
--- /dev/null
+++ b/keps/sig-scheduling/5981-dra-sharing-affinity/README.md
@@ -0,0 +1,1675 @@
+# KEP-5981: DRA Sharing Affinity for Conditional Fungibility
+
+<!-- toc -->
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [User Stories](#user-stories)
+    - [Story 1: RDMA Partition Key Alignment](#story-1-rdma-partition-key-alignment)
+    - [Story 2: FPGA Bitstream Sharing](#story-2-fpga-bitstream-sharing)
+    - [Story 3: Single-subnet NIC Sharing](#story-3-single-subnet-nic-sharing)
+  - [Notes/Constraints/Caveats](#notesconstraintscaveats)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [API Enhancement](#api-enhancement)
+    - [ResourceSlice Device Spec](#resourceslice-device-spec)
+    - [Scheduler Enhancement](#scheduler-enhancement)
+  - [Examples](#examples)
+    - [ResourceSlice with Sharing Affinity](#resourceslice-with-sharing-affinity)
+    - [ResourceClaim with Affinity Value](#resourceclaim-with-affinity-value)
+    - [Multi-key SharingAffinity Example](#multi-key-sharingaffinity-example)
+  - [Test Plan](#test-plan)
+      - [Prerequisite testing updates](#prerequisite-testing-updates)
+      - [Unit tests](#unit-tests)
+      - [Integration tests](#integration-tests)
+      - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Beta](#beta)
+    - [GA](#ga)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+  - [Claim-side SharingAffinity (on DeviceRequest)](#claim-side-sharingaffinity-on-devicerequest)
+  - [Object Reference-based Affinity Matching](#object-reference-based-affinity-matching)
+  - [Placeholder Pattern Workaround](#placeholder-pattern-workaround)
+  - [CEL-based Affinity Matching](#cel-based-affinity-matching)
+- [Future Enhancements](#future-enhancements)
+- [Infrastructure Needed](#infrastructure-needed)
+<!-- /toc -->
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+This KEP proposes an extension to Dynamic Resource Allocation (DRA) that allows
+the `kube-scheduler` to handle resources that are **conditionally fungible**.
+
+[KEP-5075 (Consumable Capacity)](https://github.com/kubernetes/enhancements/issues/5075)
+introduced the ability to track numerical capacity (e.g., 16 slots of a NIC)
+and share devices across multiple claims via `allowMultipleAllocations`.
+However, it assumes all claims are fungible—any claim can share the device with
+any other claim.
+
+Real-world hardware is often **modal**: once partially allocated, the device
+requires all subsequent consumers to share a specific configuration. For
+example:
+
+- **Multi-pod NIC sharing**: A network DRA driver shares a NIC across 16 pods,
+  but all pods must belong to the same subnet. Once the first pod configures the
+  NIC for Subnet A, the remaining 15 slots are restricted to Subnet A.
+- **FPGA bitstream sharing**: An FPGA can serve multiple inference pods, but all
+  must use the same bitstream. Once bitstream-ml-v2 is loaded, other pods
+  needing bitstream-crypto-v1 must use a different FPGA.
+
+This KEP introduces a `SharingAffinity` field in the ResourceSlice `Device`
+spec that allows drivers to declare which parameter keys constrain
+sharing compatibility. The scheduler's `AllocatedState` is enhanced to track
+both consumed capacity and the affinity values that lock a device to a
+particular sharing group, enabling it to gate remaining capacity and
+pack compatible workloads onto already-locked devices.
+
+Alpha intentionally does not provide lock-breaking
+preemption. In addition, if a device already has active allocations whose
+affinity cannot be reconstructed (for example, legacy claims created before the
+feature was enabled), the scheduler treats that device conservatively and does
+not place new `sharingAffinity` allocations on it until the device becomes
+clean.
+
+`sharingAffinity` in this KEP refers specifically to compatibility for
+co-allocation on a shared device; it is distinct from pod affinity,
+anti-affinity, or topology-aware placement.
+
+## Motivation
+
+As AI and HPC workloads move toward higher density, hardware partitioning
+(SR-IOV, GPU slicing, FPGA multi-tenancy) is becoming standard. These
+physical devices often have a "modal" constraint: once partially allocated,
+the device requires all subsequent consumers to share a specific configuration
+(see [Summary](#summary) for concrete examples).
+
+Currently, the scheduler is unaware of this "lock." It may schedule a Pod
+requiring a different configuration to the same device because it sees
+"available capacity."
+In short: **In these scenarios, Quantitative Sharing (how many slots?) fails
+without Qualitative Gating (what mode are those slots in?).** This leads to:
+
+1. **Allocation failures at the node level**: The driver rejects incompatible
+   binds at prepare time, after the scheduler has already committed
+2. **High scheduling latency**: The scheduler retries the same failing
+   combination, thrashing between candidates
+3. **Resource starvation**: Without affinity awareness, same-subnet pods
+   spread across multiple devices instead of consolidating—wasting capacity
+4. **Complex driver workarounds**: Drivers resort to placeholder patterns
+   with race conditions and ResourceSlice churn
+
+The scheduler's `AllocatedState` currently tracks consumed capacity but not the
+affinity values that determine sharing compatibility. This KEP closes that gap.
+
+### Goals
+
+- Enable the scheduler to gate remaining capacity on a device based on a
+  required sharing attribute
+- Provide a mechanism for drivers to signal compatibility requirements for
+  shared hardware via `SharingAffinity` in ResourceSlice
+- Minimize fragmentation of cluster resources by enabling the scheduler to
+  pack workloads with identical sharing requirements onto already-locked devices
+- Track affinity values in `AllocatedState` so subsequent scheduling decisions
+  respect the first claim's lock-in
+- Maintain backward compatibility with devices that have no sharing affinity
+  constraints
+
+### Non-Goals
+
+- Defining hardware-specific attribute names (these remain driver-defined)
+- Managing the physical lifecycle of the device configuration (this remains
+  the driver's responsibility)
+- Changing how capacity is tracked (that's KEP-5075)
+- Supporting affinity across different device types or pools
+- Retrofitting affinity-aware sharing onto already-in-use devices when active
+  claims do not expose reconstructable affinity values. In alpha, such devices
+  are treated conservatively until they drain clean.
+- Guaranteeing **lock-breaking preemption** in alpha.
+  Alpha enforces compatibility and improves packing, but does not yet guarantee
+  that a higher-priority Pod can displace an incompatible lock-holder.
+
+## Proposal
+
+Add a `sharingAffinity` field to `Device` in ResourceSlice that specifies which device attribute keys constrain sharing:
+
+```yaml
+apiVersion: resource.k8s.io/v1
+kind: ResourceSlice
+spec:
+  devices:
+    - name: eth1
+      allowMultipleAllocations: true
+      sharingAffinity:
+        parameterKeys: ["networking.example.com/subnet"]
+      capacity:
+        networking.example.com/slots:
+          value: "16"
+```
+
+When the scheduler allocates a multi-allocatable device with `sharingAffinity`:
+
+1. **First claim**: The scheduler decodes the claim's well-known structured
+   parameters from opaque config, reads the affinity values for the specified
+   attribute key(s), and records them in `AllocatedState` alongside consumed
+   capacity
+2. **Subsequent claims**: The scheduler checks if the new claim's affinity values match those recorded in `AllocatedState`
+3. **Mismatch**: If values don't match, the device is skipped (try another device)
+4. **Match**: If values match and capacity is available, allocation proceeds
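+
+The four steps above can be sketched as follows. This is a minimal
+illustration under stated assumptions, not the scheduler implementation:
+`deviceState` and `tryAllocate` are hypothetical names, capacity is simplified
+to an integer slot count, and the claim's affinity values are assumed to be
+already decoded into a map.
+
+```go
+package main
+
+import "fmt"
+
+// deviceState is a hypothetical, simplified stand-in for the per-device part
+// of AllocatedState: consumed slots plus the affinity lock.
+type deviceState struct {
+	keys     []string          // sharingAffinity.parameterKeys declared by the driver
+	capacity int               // total slots (simplified capacity model)
+	used     int               // consumed slots
+	lock     map[string]string // nil while the device is clean (unlocked)
+}
+
+// tryAllocate applies the four steps: the first claim records the lock, later
+// claims must match it, and capacity must still be available.
+func (d *deviceState) tryAllocate(claim map[string]string) bool {
+	if d.used >= d.capacity {
+		return false // no capacity left
+	}
+	values := make(map[string]string, len(d.keys))
+	for _, k := range d.keys {
+		v, ok := claim[k]
+		if !ok {
+			return false // claim lacks a required key: skip this device
+		}
+		values[k] = v
+	}
+	if d.lock == nil {
+		d.lock = values // first claim: record the affinity values
+	} else {
+		for k, v := range values {
+			if d.lock[k] != v {
+				return false // mismatch: skip this device
+			}
+		}
+	}
+	d.used++
+	return true
+}
+
+func main() {
+	const subnet = "networking.example.com/subnet"
+	eth1 := &deviceState{keys: []string{subnet}, capacity: 16}
+	fmt.Println(eth1.tryAllocate(map[string]string{subnet: "subnet-X"})) // true: sets the lock
+	fmt.Println(eth1.tryAllocate(map[string]string{subnet: "subnet-X"})) // true: matches the lock
+	fmt.Println(eth1.tryAllocate(map[string]string{subnet: "subnet-Y"})) // false: mismatch
+}
+```
+
+A `false` result only means this device is skipped for the claim; the
+scheduler would continue with other candidate devices.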
+
+**Alpha Design Decisions**
+
+**1. Placement of SharingAffinity: ResourceSlice (driver-side)**
+
+This KEP places `SharingAffinity` on the ResourceSlice `Device`
+(driver-defined). We chose driver-side placement because the hardware modal
+constraint is a property of the device, not the workload. The driver knows
+that "once a NIC is configured for subnet A, it can only serve subnet A."
+This is a device-level constraint that should be declared once on the device.
+
+An alternative design places `SharingAffinity` on the `DeviceRequest` in the
+`ResourceClaim` (user-defined). See [Alternatives: Claim-side
+SharingAffinity](#claim-side-sharingaffinity-on-devicerequest) for the
+trade-off analysis.
+
+**2. How claims communicate affinity values to the scheduler**
+
+The driver declares `sharingAffinity.parameterKeys` on the device, telling the
+scheduler which attribute keys constrain sharing. The scheduler learns the
+requested values for those keys by decoding a well-known JSON schema stored
+inside `OpaqueDeviceConfiguration`.
+
+The claim's opaque config (`DeviceConfiguration.Opaque.Parameters`) is a
+`runtime.RawExtension`—raw bytes the scheduler cannot parse generically. For
+this feature, drivers that want sharing affinity encode scheduler-readable
+parameters using a community-governed JSON schema, similar to the pattern in
+[k8s.io/dynamic-resource-allocation/api/metadata](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/dynamic-resource-allocation/api/metadata).
+This avoids any API changes to `DeviceConfiguration` while giving the scheduler
+a decodable format for affinity-relevant parameters.
+
+**Approach: Well-known JSON schema inside OpaqueDeviceConfiguration**
+
+The schema reuses the same qualified key naming convention as `ResourceSlice`
+attributes and follows a `DeviceAttribute`-like envelope. In alpha,
+`sharingAffinity` matching is limited to string-valued attributes for the
+keys referenced by `sharingAffinity.parameterKeys`, which keeps equality
+semantics simple and aligns with the scheduler's in-memory lock representation.
+
+```json
+{
+  "apiVersion": "resource.k8s.io/v1alpha1",
+  "kind": "StructuredParameters",
+  "attributes": {
+    "networking.example.com/subnet": {"string": "subnet-X"},
+    "networking.example.com/pkey": {"string": "0x8001"}
+  }
+}
+```
+
+The scheduler decodes this JSON from the opaque blob and extracts string
+values for the keys listed in the device's `sharingAffinity.parameterKeys`. The
+decoding overhead is expected to be small relative to the overall scheduling
+effort.
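+
+As a rough sketch of that decoding step, the following Go program parses the
+envelope shown above with `encoding/json` and extracts the string values for
+a device's `parameterKeys`. The Go type names and the `affinityValues` helper
+are assumptions made for illustration; this KEP section does not define them.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// structuredParameters mirrors the JSON envelope shown above. The Go names
+// are illustrative and not part of any published API.
+type structuredParameters struct {
+	APIVersion string                    `json:"apiVersion"`
+	Kind       string                    `json:"kind"`
+	Attributes map[string]attributeValue `json:"attributes"`
+}
+
+// attributeValue models only the string member, since alpha matches strings.
+type attributeValue struct {
+	String *string `json:"string,omitempty"`
+}
+
+// affinityValues decodes raw and returns the string value for every key in
+// parameterKeys. It fails if the payload is malformed, has an unexpected
+// apiVersion or kind, or lacks a string value for a required key.
+func affinityValues(raw []byte, parameterKeys []string) (map[string]string, error) {
+	var sp structuredParameters
+	if err := json.Unmarshal(raw, &sp); err != nil {
+		return nil, fmt.Errorf("decode structured parameters: %w", err)
+	}
+	if sp.APIVersion != "resource.k8s.io/v1alpha1" || sp.Kind != "StructuredParameters" {
+		return nil, fmt.Errorf("unrecognized payload %q %q", sp.APIVersion, sp.Kind)
+	}
+	out := make(map[string]string, len(parameterKeys))
+	for _, k := range parameterKeys {
+		attr, ok := sp.Attributes[k]
+		if !ok || attr.String == nil {
+			return nil, fmt.Errorf("key %q is missing or not a string", k)
+		}
+		out[k] = *attr.String
+	}
+	return out, nil
+}
+
+func main() {
+	raw := []byte(`{
+	  "apiVersion": "resource.k8s.io/v1alpha1",
+	  "kind": "StructuredParameters",
+	  "attributes": {
+	    "networking.example.com/subnet": {"string": "subnet-X"},
+	    "networking.example.com/pkey": {"string": "0x8001"}
+	  }
+	}`)
+	values, err := affinityValues(raw, []string{"networking.example.com/subnet"})
+	fmt.Println(values["networking.example.com/subnet"], err)
+}
+```
+
+Keys present in the payload but absent from `parameterKeys` (here, `pkey`)
+are simply not read for this device.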
+
+If drivers also need additional, differently structured configuration
+parameters (e.g., MTU, QoS settings), users provide two config entries in
+the claim: one using the standard schema (scheduler reads) and one using the
+vendor format (driver reads). The scheduler only considers configurations
+matching the well-known schema.
+
+**Key advantages:**
+- **No API changes** to `DeviceConfiguration` — the feature uses existing
+  opaque config with a well-known schema
+- **No duplication** for simple cases — the driver can read the same structured
+  parameters it programs (e.g., subnet, PKey)
+- **Extensible** — the well-known schema can support future scheduler-readable
+  hints beyond sharing affinity
+
+**Alpha StructuredParameters Contract**
+
+For alpha, the scheduler-readable structured-parameters format is a
+scheduler-recognized contract with the following rules:
+
+1. **Recognition**: The scheduler recognizes a config entry as structured
+   parameters only when the `opaque.driver` is `resource.k8s.io` and the
+   embedded payload has `apiVersion: resource.k8s.io/v1alpha1` and
+   `kind: StructuredParameters`.
+2. **Per-request uniqueness**: For a given request, there must be at most
+   one structured-parameters config entry targeted at that request. Multiple
+   matching entries for the same request are invalid.
+3. **Coexistence with driver config**: The structured-parameters entry may
+   coexist with one or more driver-specific opaque config entries for the same
+   request. The scheduler reads only the recognized structured-parameters
+   entry. The driver-specific entries are ignored by the scheduler.
+4. **Conflict handling**: If the same logical setting is encoded both in
+   `StructuredParameters` and in driver-specific config, the scheduler uses only
+   the `StructuredParameters` value for placement decisions and does not attempt to
+   compare or reconcile the driver-specific opaque payload. If a conflict exists,
+   the driver should reject the request during `NodePrepareResources` with a clear error,
+   rather than silently accepting divergent values.
+5. **String-only affinity values in alpha**: For any key referenced by
+   `sharingAffinity.parameterKeys`, the recognized structured-parameters entry
+   must provide a `string` value in alpha. Other value types are not matched in
+   alpha and are treated as invalid for `sharingAffinity` scheduling.
+6. **Malformed payloads**: If a recognized structured-parameters entry is
+   malformed, has the wrong schema, or cannot be decoded, it is treated as
+   invalid for scheduling purposes.
+7. **Missing recognized entry**: If a claim targets a device with
+   `sharingAffinity` but does not provide a recognized structured-parameters
+   entry for that request, the device is filtered out. This does not make the
+   claim universally unschedulable. It only makes the claim ineligible for devices
+   that declare `sharingAffinity`. If all feasible devices for the request declare
+   `sharingAffinity`, then the request may remain unschedulable until a recognized
+   `StructuredParameters` entry is provided or non-sharing-affinity capacity is available.
+8. **Validation intent**: API validation should reject malformed or duplicate
+   structured-parameters entries when feasible. The scheduler must still
+   handle invalid persisted objects defensively and deterministically.
+
+In alpha, this keeps the contract explicit without introducing a new API field:
+the scheduler depends only on a single, community-governed, recognized payload
+shape and ignores all other opaque config.
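+
+Rules 1, 2, 3, and 7 of the contract can be illustrated with a short sketch.
+`configEntry` is a hypothetical flattened view of one opaque config entry, and
+`findForRequest` is an illustrative helper, not scheduler code. The sketch
+treats an undecodable payload as unrecognized, which leads to the same
+"device filtered out" outcome as rule 6; it also assumes the existing
+`DeviceClaimConfiguration` behavior that an empty `requests` list applies to
+all requests.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+)
+
+// configEntry is a hypothetical flattened view of one opaque config entry:
+// the requests it targets, its driver name, and its raw payload.
+type configEntry struct {
+	Requests   []string
+	Driver     string
+	Parameters []byte
+}
+
+var errDuplicate = errors.New("multiple structured-parameters entries for request")
+
+// isStructuredParameters implements rule 1: the driver name and the embedded
+// apiVersion and kind must all match.
+func isStructuredParameters(e configEntry) bool {
+	if e.Driver != "resource.k8s.io" {
+		return false
+	}
+	var meta struct {
+		APIVersion string `json:"apiVersion"`
+		Kind       string `json:"kind"`
+	}
+	if json.Unmarshal(e.Parameters, &meta) != nil {
+		return false // assumption: undecodable payloads count as unrecognized
+	}
+	return meta.APIVersion == "resource.k8s.io/v1alpha1" && meta.Kind == "StructuredParameters"
+}
+
+// targets reports whether the entry applies to the named request.
+func targets(e configEntry, request string) bool {
+	if len(e.Requests) == 0 {
+		return true // an empty list applies to all requests
+	}
+	for _, r := range e.Requests {
+		if r == request {
+			return true
+		}
+	}
+	return false
+}
+
+// findForRequest returns the single recognized entry for the request. It
+// returns nil when none exists (rule 7) and errDuplicate when more than one
+// does (rule 2). Driver-specific entries are skipped (rule 3).
+func findForRequest(entries []configEntry, request string) (*configEntry, error) {
+	var found *configEntry
+	for i := range entries {
+		e := &entries[i]
+		if !targets(*e, request) || !isStructuredParameters(*e) {
+			continue
+		}
+		if found != nil {
+			return nil, errDuplicate
+		}
+		found = e
+	}
+	return found, nil
+}
+
+func main() {
+	sp := []byte(`{"apiVersion":"resource.k8s.io/v1alpha1","kind":"StructuredParameters"}`)
+	entries := []configEntry{
+		{Requests: []string{"data-nic"}, Driver: "resource.k8s.io", Parameters: sp},
+		{Requests: []string{"data-nic"}, Driver: "nic.example.com", Parameters: []byte(`{"mtu":9000}`)},
+	}
+	e, err := findForRequest(entries, "data-nic")
+	fmt.Println(e != nil, err)
+}
+```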
+
+For alpha, `StructuredParameters` is a scheduler-recognized sub-protocol
+defined by this KEP. The scheduler interprets only payloads explicitly
+recognized as `opaque.driver: resource.k8s.io` together with the embedded
+`apiVersion`/`kind` for `StructuredParameters`; all vendor-defined opaque
+payloads remain opaque. The sub-protocol is versioned via the embedded
+`apiVersion` and future revisions must define compatibility and upgrade
+behavior explicitly.
+
+**Alpha scope**
+
+Alpha fully resolves the design around driver-side placement and the
+structured-parameters approach described above. Claims do not control
+lock-setting behavior in alpha: any compatible claim may establish the initial
+lock on a clean device. Claim-side lock-setting policy (for example,
+`CanSetLock`/`NeverSetLock`) is deferred to [Future
+Enhancements](#future-enhancements).
+
+In other words, alpha standardizes driver-declared compatibility keys, a
+scheduler-recognized structured-parameters contract, and correct lock
+enforcement / packing behavior — but intentionally stops short of
+lock-breaking preemption semantics.
+
+**Alpha limitations**
+
+Alpha provides correct lock enforcement and better packing, but it does
+not provide lock-breaking preemption. A lower-priority
+Pod may continue holding a device lock even when the device still has nominal
+capacity and a higher-priority Pod needs the same device with a different
+affinity value. In that case the higher-priority Pod may remain unschedulable
+until a compatible alternative appears or the lock-holder exits. This is an
+expected alpha limitation, not a correctness bug, and is addressed later under
+[Future Enhancements: Priority-based Lock Preemption](#priority-based-lock-preemption).
+
+### User Stories
+
+#### Story 1: RDMA Partition Key Alignment
+
+A user runs a distributed training job where every Pod must share the same
+RDMA Partition Key (PKey) to communicate. The NIC supports 16 VFs. The driver
+sets `sharingAffinity.parameterKeys: ["networking.example.com/pkey"]`. The scheduler finds a node where
+a NIC has enough capacity and is either "unlocked" or already locked to that
+specific PKey.
+
+- Pod A (pkey-0x8001) gets allocated to mlx5_0 → mlx5_0 is now locked to pkey-0x8001
+- Pod B (pkey-0x8001) arrives → matches affinity, shares mlx5_0
+- Pod C (pkey-0x8002) arrives → affinity mismatch, gets mlx5_1 instead
+
+#### Story 2: FPGA Bitstream Sharing
+
+An inference service uses FPGAs to accelerate a specific model. Loading a
+bitstream takes several seconds. The driver sets
+`sharingAffinity.parameterKeys: ["fpga.example.com/bitstream"]`. The scheduler ensures new Pods
+for this model are scheduled onto FPGAs that already have the bitstream loaded,
+even if other "fresh" FPGAs are available.
+
+- Pod A (bitstream-ml-v2) gets the FPGA → locks to bitstream-ml-v2
+- Pod B (bitstream-ml-v2) shares the same FPGA
+- Pod C (bitstream-crypto-v1) must wait or use a different FPGA
+
+#### Story 3: Single-subnet NIC Sharing
+
+A network DRA driver advertises NICs that can be shared across up to 16 pods,
+but only if pods belong to the same subnet. The driver sets
+`sharingAffinity.parameterKeys: ["networking.example.com/subnet"]`.
+
+- Pod A (subnet-X) gets allocated to eth1 → eth1 is now locked to subnet-X
+- Pod B (subnet-X) arrives → matches affinity, shares eth1
+- Pod C (subnet-Y) arrives → affinity mismatch, gets eth2 instead
+
+### Notes/Constraints/Caveats
+
+- **Affinity is set by the first compatible claim on a clean device**: Once a
+  device is allocated with an affinity value, that value is locked until all
+  claims release the device.
+- **Attribute keys must be declared**: The device's
+  `sharingAffinity.parameterKeys` lists which attribute keys constrain sharing;
+  claims must provide values for all of these keys in the well-known structured
+  parameters or the device is filtered out.
+- **Multiple keys**: If multiple attribute keys are specified, ALL must match
+  (both presence and value).
+- **Extra keys in claim**: If a claim's structured parameters contain keys beyond
+  what the device declares in `parameterKeys`, the extra keys are ignored
+  for that device. Only the device's declared keys are evaluated. This allows
+  "generic" claims to work across devices with different sharing requirements
+  (e.g., a claim with both `subnet` and `vlan` can match a device that only
+  constrains on `subnet`).
+- **String-only matching in alpha**: For keys referenced by
+  `sharingAffinity.parameterKeys`, the scheduler only matches `string` values
+  in alpha. If a required key is present with a non-string value, the device is
+  filtered out for that claim.
+- **Missing keys in claim**: If the claim does not provide a value for a key
+  the device declares in `parameterKeys`, the device is filtered out (see
+  Filter phase).
+- **Malformed structured parameters**: If the scheduler-recognized
+  `StructuredParameters` entry is malformed, undecodable, or uses the wrong
+  schema, it is treated as invalid and the claim cannot use devices that rely
+  on `sharingAffinity`.
+- **Duplicate structured parameters for one request**: If more than one
+  recognized `StructuredParameters` config entry targets the same request, the
+  claim is treated as invalid for `sharingAffinity` scheduling until corrected.
+- **Multi-request claims (per-request scoping)**: If a claim requests multiple
+  devices (e.g., `mgmt-nic` and `data-nic`), each `DeviceClaimConfiguration`
+  block targets specific requests via its `requests` slice. Different config
+  blocks can specify different structured parameters for different requests. This
+  means `mgmt-nic` can be locked to Subnet-A while `data-nic` is locked to
+  Subnet-B within the same claim — there is no cross-talk between requests.
+- **Empty affinity**: Devices without `sharingAffinity` behave as before — any
+  claim can share them regardless of whether it provides structured parameters.
+- **Legacy allocations with unknown affinity are conservative in alpha**:
+  If a device has active allocations for which the scheduler cannot reconstruct
+  the required affinity values (for example, claims created before the feature
+  was enabled or invalid persisted claims), that device is treated as having
+  unknown affinity state and is filtered out for new `sharingAffinity`
+  scheduling until it becomes fully clean.
+
+#### Handling Legacy Claims with Unknown Affinity
+
+| Device State | New Claim | Result |
+|---|---|---|
+| 5 legacy claims, affinity unknown | Claim with `subnet: A` | **Filtered out**. Existing allocations have unknown affinity, so no new `sharingAffinity` lock may be established yet. |
+| 5 legacy claims, affinity unknown | Claim without structured parameters | **Filtered out**. Missing required scheduler-readable affinity information. |
+| Legacy claims drained; device now clean | Claim with `subnet: A` | Lock set to `subnet: A`; device now locked. |
+| Device locked to `subnet: A` | Claim with `subnet: A` | Allowed (values match). |
+| Device locked to `subnet: A` | Claim with `subnet: B` | **Filtered out** (mismatch with lock); the scheduler tries another device. |
+| All claims released | — | Device fully clean and eligible to establish a new lock. |
+
+Legacy claims continue to run and are not evicted. However, until all unknown
+allocations on a `sharingAffinity` device are released, the scheduler does not
+assume it knows the device's effective modal state.
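+
+The table can be read as a small decision function over reconstructed device
+state. The sketch below is illustrative only: `affinityState`, `rebuild`, and
+`evaluate` are hypothetical names, and a `nil` map stands for "affinity values
+could not be reconstructed" (for a legacy allocation) or "no structured
+parameters provided" (for a new claim).
+
+```go
+package main
+
+import "fmt"
+
+// affinityState is a hypothetical per-device view of the lock.
+type affinityState struct {
+	Unknown bool              // some active allocation has unreconstructable affinity
+	Lock    map[string]string // nil when no lock is established
+}
+
+type verdict string
+
+const (
+	filteredOut verdict = "filtered out"
+	setsLock    verdict = "lock set"
+	allowed     verdict = "allowed"
+)
+
+// rebuild derives the state from the active allocations on a device. Each
+// element holds the affinity values decoded from one claim, or nil when they
+// cannot be reconstructed (a legacy or invalid claim).
+func rebuild(active []map[string]string) affinityState {
+	var s affinityState
+	for _, values := range active {
+		if values == nil {
+			s.Unknown = true
+			continue
+		}
+		if s.Lock == nil {
+			s.Lock = values
+		}
+	}
+	return s
+}
+
+// evaluate decides how a new claim interacts with the device state.
+func evaluate(s affinityState, claim map[string]string) verdict {
+	switch {
+	case claim == nil, s.Unknown:
+		return filteredOut // missing affinity information or unknown device state
+	case s.Lock == nil:
+		return setsLock // clean device: this claim establishes the lock
+	}
+	for k, v := range s.Lock {
+		if got, ok := claim[k]; !ok || got != v {
+			return filteredOut // mismatch with the existing lock
+		}
+	}
+	return allowed
+}
+
+func main() {
+	subnetA := map[string]string{"subnet": "A"}
+	subnetB := map[string]string{"subnet": "B"}
+
+	legacy := rebuild([]map[string]string{nil, nil, nil, nil, nil})
+	fmt.Println(evaluate(legacy, subnetA))      // legacy claims still active
+	fmt.Println(evaluate(rebuild(nil), subnetA)) // device drained clean
+
+	locked := rebuild([]map[string]string{subnetA})
+	fmt.Println(evaluate(locked, subnetA))
+	fmt.Println(evaluate(locked, subnetB))
+}
+```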
+
+#### Compatibility Matrix
+
+To clarify the interaction between claims and devices, the following matrix
+outlines how the scheduler and driver evaluate candidates based on whether
+`SharingAffinity` (SA) is declared on the device and whether
+`StructuredParameters` (SP) are provided in the claim:
+
+| Scenario | Device SA | Claim SP | Scheduler Outcome | Driver Outcome |
+|---|---|---|---|---|
+| **Standard Feature Use** | Yes | Yes | **Match enforced.** Values match lock + capacity available → scheduled. | **Validates** hardware mode matches claim config at `NodePrepareResources`. Rejects if stale or inconsistent. |
+| **Strict Gating** | Yes | No | **Filtered out.** Device excluded — requires affinity signal the claim does not provide. | **N/A** — claim never reaches the driver for this device. |
+| **Legacy Device Transition** | Yes (newly added) | Yes | **Filtered out** while legacy claims are active (`Unknown: true`). Allowed once device drains clean. | **Validates** as normal once claim reaches the driver. During transition, driver continues serving legacy claims. |
+| **Permissive Sharing** | No | Yes | **Allowed.** Device has no `sharingAffinity`; structured parameters are not evaluated for affinity. Standard capacity matching applies. | **Must enforce** hardware compatibility independently. Scheduler provides no affinity gating for this device. |
+| **Legacy/Basic** | No | No | **Allowed.** Standard DRA capacity and attribute matching. | **Must enforce** hardware compatibility independently. This is the pre-KEP-5981 behavior. |
+
+The first two rows show the scheduler as the primary enforcer with the driver
+as a backstop. The last two rows show the driver as the sole enforcer with
+the scheduler being permissive. The **Legacy Device Transition** row shows the
+scheduler being conservative (filtering) while the driver continues serving
+existing workloads.
+
+### Risks and Mitigations
+
+#### Fragmentation (Poisoning)
+
+**Risk**: One Pod with a unique affinity value could "lock" a high-capacity
+device, preventing other, more common workloads from using the remaining
+capacity (for example, 15 of 16 slots on a NIC).
+
+**Mitigation**: A scoring plugin will prioritize packing compatible workloads
+onto already-locked devices before consuming "clean" (unlocked) devices. This
+minimizes the number of devices locked to a single affinity group.
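+
+The packing preference can be sketched as a simple ordering of feasible
+devices. This is only an illustration of the intent: `candidate` and `rank`
+are hypothetical names, and the actual scoring function and weights are not
+specified in this section.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// candidate is a hypothetical feasible device for a claim.
+type candidate struct {
+	Name   string
+	Locked bool // already locked to the claim's affinity values
+}
+
+// rank returns the candidates with already-locked compatible devices first
+// and clean devices last. The sort is stable, so the original order is kept
+// within each group.
+func rank(in []candidate) []candidate {
+	out := append([]candidate(nil), in...)
+	sort.SliceStable(out, func(i, j int) bool {
+		return out[i].Locked && !out[j].Locked
+	})
+	return out
+}
+
+func main() {
+	ranked := rank([]candidate{
+		{Name: "eth2", Locked: false},
+		{Name: "eth1", Locked: true},
+		{Name: "eth3", Locked: false},
+	})
+	for _, c := range ranked {
+		fmt.Println(c.Name, c.Locked)
+	}
+}
+```
+
+Incompatible locked devices never appear in this list because they are
+removed earlier, during filtering.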
+
+#### Priority Inversion (Preemption Blindness)
+
+**Risk**: Standard Kubernetes preemption is blind to affinity locks. It triggers
+on *resource shortage*, not affinity mismatch. If a NIC has 15/16 slots
+available but is locked to the wrong subnet, the scheduler sees plenty of
+capacity and never enters the preemption path. A single low-priority Pod can
+hold a high-capacity device hostage for as long as it runs by setting a lock
+that no high-priority Pod can break.
+
+Even if preemption were triggered by an unrelated shortage, victim selection
+asks "which Pods free up slots?" — not "which Pods clear the lock?" The
+scheduler might preempt an unrelated Pod, freeing a slot on a device still
+locked to the wrong value.
+
+**Mitigation (Alpha)**: The scoring plugin reduces the probability by packing
+compatible workloads and preserving clean devices. However, this is a soft
+mitigation — it does not guarantee that a clean device will always be available,
+and alpha does not provide lock-breaking preemption.
+
+**Alpha limitation**: In alpha, a lower-priority Pod may continue to hold a
+lock that blocks a higher-priority incompatible Pod even when nominal capacity
+remains on the device. This is an expected limitation of the alpha scope rather
+than a correctness bug.
+
+**Mitigation (Beta)**: Lock-aware preemption (see [Beta graduation criteria](#beta))
+will teach the scheduler's PostFilter phase to detect affinity mismatch as a
+preemption-solvable problem and identify lock-holder Pods as preemption victims.
+
+#### Cache Staleness and Delayed Release Visibility
+
+**Risk**: Like other informer-based scheduler state, sharing-affinity lock state may
+briefly lag external claim release, pod deletion, eviction, or ResourceSlice updates.
+Because the scheduler maintains derived lock state in its internal cache, there can be
+a short propagation window in which a device is still observed as locked or in unknown-affinity
+state after the underlying API state has changed. During that window, the scheduler may
+conservatively skip the device for a scheduling cycle. This is not unique to sharing affinity;
+it is the feature-specific manifestation of normal cache propagation delay in scheduler-managed
+state. The result is a temporary loss of placement optimality rather than a correctness violation.
+
+**Mitigation**: For scheduler-driven transitions such as Reserve / Unreserve, the cache is updated
+immediately. For externally driven transitions, informer reconciliation eventually converges the
+state. This matches the existing consistency model used elsewhere in scheduler and DRA cache-based decisions.
+
+#### Unexpected Affinity Values
+
+**Risk**: A claim specifies an unexpected or unique affinity value (e.g., an
+arbitrary subnet GUID or name), further fragmenting devices by locking them to
+rare values.
+
+**Mitigation**: In many cases, affinity values are externally defined (subnet
+names, partition keys) and cannot be validated by the driver. The primary
+mitigation is the scoring plugin: by packing compatible workloads onto
+already-locked devices before consuming clean ones, the scheduler naturally
+limits fragmentation even when affinity values are unpredictable. Additionally,
+cluster administrators can use `DeviceClass` CEL selectors to restrict which
+attribute values are accepted where domain-specific validation is feasible.
+
+#### Memory Overhead
+
+**Risk**: Affinity values accumulate in `AllocatedState`, increasing memory usage.
+
+**Mitigation**: In alpha, affinity values are stored as small strings (for
+example subnet or PKey identifiers), capped at 8 attribute keys per device.
+Per-device overhead is bounded at 8 key-value pairs in
+`AllocatedState.AffinityStates`, and entries are cleared when all claims release
+the device. The total overhead is proportional to active shared allocations,
+not total devices.
+
+## Design Details
+
+### API Enhancement
+
+#### ResourceSlice Device Spec
+
+```go
+type Device struct {
+    // ... existing fields (Name, Attributes, Capacity,
+    // AllowMultipleAllocations, Taints, etc.) ...
+
+    // SharingAffinity specifies constraints for sharing this device across
+    // multiple allocations. If set, only claims with matching affinity values
+    // for the specified attribute keys can share this device.
+    //
+    // This field is only meaningful when AllowMultipleAllocations is true.
+    //
+    // +optional
+    // +featureGate=DRASharingAffinity
+    SharingAffinity *DeviceSharingAffinity `json:"sharingAffinity,omitempty"`
+}
+
+// DeviceSharingAffinity defines which device attribute keys constrain
+// sharing across multiple claims.
+type DeviceSharingAffinity struct {
+    // parameterKeys lists the fully-qualified device attribute names that
+    // must have matching values across all claims sharing this device.
+    //
+    // In alpha, the corresponding values must be provided as strings in the
+    // recognized StructuredParameters entry. Support for additional value types
+    // is deferred.
+    //
+    // When the first claim is allocated to this device, the affinity values
+    // for these keys are recorded in AllocatedState. Subsequent claims can
+    // only share the device if their affinity values match exactly.
+    //
+    // The maximum number of attribute keys is 8.
+    //
+    // +required
+    // +listType=atomic
+    // +k8s:maxItems=8
+    ParameterKeys []FullyQualifiedName `json:"parameterKeys"`
+}
+
+const SharingAffinityParameterKeysMaxSize = 8
+```
+
+#### Scheduler Enhancement
+
+##### Source of Truth for Affinity Locks
+
+The scheduler derives affinity locks **solely from active claims' structured
+parameters** (decoded from the well-known JSON schema in opaque config) — not
+from device attributes on the ResourceSlice. The driver is NOT required to write
+locked affinity values back to the ResourceSlice.
+
+- The ResourceSlice declares *which* keys constrain sharing (`parameterKeys`)
+- The claims declare *what* values they need (via well-known structured parameters)
+- The scheduler combines these to maintain the lock in `AllocatedState`
+
+This avoids two sources of truth that could diverge, eliminates ResourceSlice
+churn (no update every time a lock is set/cleared), and keeps driver
+implementation simple. Drivers MAY optionally publish current locked values as
+regular device attributes for observability (e.g., visible via `kubectl`), but
+the scheduler does not depend on them.
+
+When the last claim on a device is released, the scheduler clears the lock. The
+driver is responsible for device lifecycle — tearing down the old configuration
+(via `NodeUnprepareResources`) and reconfiguring for new claims (via
+`NodePrepareResources`). The scheduler does not track hardware reconfiguration
+state.
+
+##### Safety Model and Responsibility Split
+
+This feature intentionally keeps placement knowledge and hardware
+enforcement separate:
+
+- **Scheduler guarantee**: when it has recognized structured parameters for all
+  active allocations on a `sharingAffinity` device, it will not intentionally
+  co-place claims with incompatible affinity values on that device.
+- **Conservative fallback**: if the scheduler cannot reconstruct the effective
+  affinity state of a device (for example, due to legacy or invalid persisted
+  claims), it treats that device as unknown and filters it out for new
+  `sharingAffinity` placements until the device becomes clean.
+- **Driver guarantee**: the driver remains the final authority for programming
+  and validating the actual hardware mode during `NodePrepareResources`.
+- **Failure handling**: stale scheduler state or races may still cause
+  prepare-time rejection, and that rejection remains the final safety backstop.
+
+##### Cache Extension: Effective Device State
+
+To prevent race conditions during high-volume scheduling, the scheduler
+maintains affinity locks in its internal cache rather than relying on API server
+round-trips. This is consistent with how DRA already handles capacity tracking
+via `inFlightAllocations`.
+
+The scheduler's `AllocatedState` is extended to track affinity values alongside
+consumed capacity:
+
+```go
+type AffinityState struct {
+    // Unknown indicates that one or more active claims on the device do not
+    // expose reconstructable affinity values. When true, the device is filtered
+    // for new sharing-affinity placements until fully clean.
+    Unknown bool
+
+    // LockedAffinity stores the known lock for a device when Unknown is false.
+    // Empty means the device is clean/unlocked.
+    LockedAffinity map[string]string
+}
+
+type AllocatedState struct {
+    AllocatedDevices         sets.Set[DeviceID]
+    AllocatedSharedDeviceIDs sets.Set[SharedDeviceID]
+    AggregatedCapacity       ConsumedCapacityCollection
+
+    // +featureGate=DRASharingAffinity
+    AffinityStates map[DeviceID]AffinityState
+}
+```
+
+
+##### Filter and Score Phases
+
+**Filter phase**: For a given node, the scheduler evaluates each device. A
+device with `sharingAffinity` is a candidate ONLY if:
+
+1. It has sufficient consumable capacity (KEP-5075)
+2. The device's `AffinityStates[deviceID].Unknown` is not true
+3. The claim has exactly one scheduler-recognized `StructuredParameters`
+   config entry targeting the relevant request
+4. That entry can be decoded successfully using the well-known schema
+5. The claim provides values for ALL keys in `sharingAffinity.parameterKeys`
+   (missing key → device is not a candidate)
+6. For each required affinity key, the recognized entry provides a string
+   value (non-string values are invalid in alpha)
+7. The device's `AffinityStates[deviceID].LockedAffinity` is either empty
+   (unlocked) OR matches the claim's affinity values for ALL keys
+
+The scheduler identifies the structured-parameters entry by `opaque.driver:
+resource.k8s.io` plus `apiVersion: resource.k8s.io/v1alpha1` and
+`kind: StructuredParameters` in the embedded payload. Driver-specific config
+entries are ignored by the scheduler.
+
+If a device has `AffinityStates[deviceID].Unknown` set, or if a required request has no
+recognized structured-parameters entry, more than one recognized entry, an
+entry that fails schema/decoding checks, or a required affinity key with a
+non-string value, the device is filtered out for `sharingAffinity`
+scheduling. This is the safe default: the driver declared that sharing
+requires specific scheduler-readable parameters, and a scheduler that cannot
+reconstruct the current or requested affinity state cannot evaluate placement
+safely. Claims that do not need sharing-constrained devices should target
+devices without `sharingAffinity`.
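+
+As a rough illustration of conditions 2, 5, and 7 above, the check could look
+like the following Go sketch. The helper `isCandidate` and the `claimValues`
+type are hypothetical names introduced here, not part of the proposed API. The
+sketch assumes the claim's recognized entry has already been decoded into
+string values (conditions 3, 4, and 6) and leaves out the capacity check
+(condition 1).
+
+```go
+package main
+
+import "fmt"
+
+// AffinityState mirrors the cache struct proposed in this KEP.
+type AffinityState struct {
+    Unknown        bool
+    LockedAffinity map[string]string
+}
+
+// claimValues is a hypothetical decoded view of a claim's recognized
+// StructuredParameters entry: attribute key -> string value.
+type claimValues map[string]string
+
+// isCandidate (hypothetical) reports whether a device with the given
+// parameterKeys and cached state may be considered for the claim.
+func isCandidate(parameterKeys []string, state AffinityState, claim claimValues) bool {
+    if state.Unknown {
+        return false // condition 2: affinity state cannot be reconstructed
+    }
+    for _, key := range parameterKeys {
+        want, ok := claim[key]
+        if !ok {
+            return false // condition 5: every declared key is mandatory
+        }
+        if len(state.LockedAffinity) == 0 {
+            continue // clean device: nothing to compare against yet
+        }
+        if locked, ok := state.LockedAffinity[key]; !ok || locked != want {
+            return false // condition 7: locked value differs from the claim
+        }
+    }
+    return true
+}
+
+func main() {
+    const subnet = "networking.example.com/subnet"
+    keys := []string{subnet}
+    locked := AffinityState{LockedAffinity: map[string]string{subnet: "subnet-X"}}
+
+    fmt.Println(isCandidate(keys, locked, claimValues{subnet: "subnet-X"}))
+    fmt.Println(isCandidate(keys, locked, claimValues{subnet: "subnet-Y"}))
+    fmt.Println(isCandidate(keys, AffinityState{}, claimValues{subnet: "subnet-Y"}))
+    fmt.Println(isCandidate(keys, AffinityState{Unknown: true}, claimValues{subnet: "subnet-X"}))
+    fmt.Println(isCandidate(keys, locked, claimValues{}))
+}
+```
+
+Under these assumptions the five calls print `true`, `false`, `true`, `false`,
+and `false`: a matching claim joins the lock, a mismatching claim is filtered,
+a clean device accepts any complete claim, and an unknown device or a claim
+missing a declared key is always filtered.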
+
+**Score phase**: The normative ordering in alpha is:
+
+1. A device already locked to a compatible affinity value scores highest.
+2. A clean (unlocked) device with `sharingAffinity` scores next — it can
+   establish a new lock and enable packing for future claims.
+3. A device without `sharingAffinity` scores lowest among otherwise equivalent
+   candidates — the scheduler has no affinity enforcement for this device, so
+   packing benefits are lost.
+4. An incompatible locked device, or a device with `AffinityStates[deviceID].Unknown` set, is
+   not scored because it was already filtered out.
+
+This preserves unlocked devices for future workloads with different affinity
+values, minimizing fragmentation. During mixed rollouts (some devices with
+`sharingAffinity`, some without), this naturally steers affinity-aware claims
+toward upgraded devices. The exact score weights are implementation-defined
+in alpha; the required behavior is that a compatible locked device is preferred
+over an otherwise equivalent clean device.
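+
+The ordering can be pictured with a small Go sketch. The tier constants and the
+`scoreTier` helper are hypothetical; the actual weights are
+implementation-defined in alpha, and the sketch only preserves the relative
+order of tiers 1 to 3. It assumes the device has already passed Filter, so any
+existing lock is known to be compatible.
+
+```go
+package main
+
+import "fmt"
+
+// AffinityState mirrors the cache struct proposed in this KEP.
+type AffinityState struct {
+    Unknown        bool
+    LockedAffinity map[string]string
+}
+
+// Illustrative tiers only; real score weights are implementation-defined.
+const (
+    tierNoSharingAffinity = iota // device does not declare sharingAffinity
+    tierClean                    // declares sharingAffinity, currently unlocked
+    tierLockedCompatible         // already locked to the claim's values
+)
+
+// scoreTier (hypothetical) ranks a device that has already passed Filter.
+func scoreTier(hasSharingAffinity bool, state AffinityState) int {
+    switch {
+    case !hasSharingAffinity:
+        return tierNoSharingAffinity
+    case len(state.LockedAffinity) == 0:
+        return tierClean
+    default:
+        return tierLockedCompatible
+    }
+}
+
+func main() {
+    locked := AffinityState{LockedAffinity: map[string]string{
+        "networking.example.com/subnet": "subnet-X",
+    }}
+    clean := AffinityState{}
+
+    // A compatible locked device outranks a clean one, which in turn
+    // outranks a device without sharingAffinity.
+    fmt.Println(scoreTier(true, locked) > scoreTier(true, clean))
+    fmt.Println(scoreTier(true, clean) > scoreTier(false, clean))
+}
+```
+
+Both comparisons print `true`, which is the only property the alpha ordering
+requires of an implementation.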
+
+
+##### Reserve Phase: Tentative Locking
+
+Once a node/device is selected, the Reserve plugin establishes a "tentative
+lock" in the scheduler cache before the Binding phase:
+
+1. Scheduler evaluates a multi-allocatable device with `sharingAffinity`
+2. If device has no existing allocations (unlocked):
+   - Extract affinity values for `sharingAffinity.parameterKeys` from the claim's
+     structured parameters (decoded from opaque config)
+   - Record values in `AllocatedState.AffinityStates[deviceID].LockedAffinity`
+   - Proceed with allocation (device is now tentatively locked)
+3. If device has existing allocations (locked):
+   - Compare claim's affinity values against `AllocatedState.AffinityStates[deviceID].LockedAffinity`
+   - If all keys match: proceed with allocation (pack onto locked device)
+   - If any key mismatches: skip this device, try next candidate
+
+This tentative lock is immediately visible to subsequent scheduling cycles. If
+Pod-B is evaluated milliseconds after Pod-A's Reserve (before Pod-A's bind
+reaches the API server), Pod-B's Filter phase will see Pod-A's tentative lock
+and either join it or skip the device. This follows the same pattern used by
+`SignalClaimPendingAllocation()` for capacity tracking.
+
+##### State Transitions
+
+| Event | Cache Action | Result |
+|-------|-------------|--------|
+| Pod scheduled (Reserve) | Set `AffinityStates[deviceID].LockedAffinity` | Device locked; subsequent claims must match |
+| Scheduling failure (Unreserve) | Remove tentative lock if no other claims share it | Device may become unlocked |
+| All claims released | Clear `AffinityStates[deviceID]` | Device becomes unlocked |
+| Driver adds `sharingAffinity` to in-use device | Mark `AffinityStates[deviceID].Unknown` if active claims are non-reconstructable | Device blocked for new sharing workloads until legacy claims drain |
+
+##### Implementation Note: Snapshot Consistency
+
+Since the scheduler works on a snapshot of the cache for each Pod, the Reserve
+phase must update the primary cache so that snapshots taken by subsequent
+scheduling cycles reflect the new lock. This aligns with how VolumeBinding and
+PodAffinity currently handle "assumed" states.
+
+**Parallel scheduling**: In clusters with parallel scheduling enabled, multiple
+pods may reach the Filter phase concurrently. Without protection, two pods with
+*different* affinities could both pass Filter for the same clean device in the
+same millisecond. To prevent this, all reads and writes to
+`AllocatedState.AffinityStates` must be protected by the `AllocatedState` mutex.
+The Filter phase acquires a read lock to check the current affinity state; the
+Reserve phase acquires a write lock, re-checks the claim's values against the
+current lock as described in
+[Reserve Phase: Tentative Locking](#reserve-phase-tentative-locking), and only
+then sets the tentative lock atomically. If both pods passed Filter before
+either reached Reserve, the second pod's Reserve fails on the mismatch and the
+pod is requeued. Once one pod's Reserve completes, the next pod's Filter sees
+the updated lock.
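+
+A minimal Go sketch of this locking discipline follows. The `affinityCache`
+type and its `Filter` and `Reserve` methods are hypothetical stand-ins for the
+relevant part of `AllocatedState`, not the actual plugin code. The sketch
+assumes that the claim's values are already restricted to the device's
+`parameterKeys` and that Reserve repeats the compatibility check while holding
+the write lock, so a pod that passed Filter against a stale view is still
+rejected.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "sync"
+)
+
+type DeviceID string
+
+// AffinityState mirrors the cache struct proposed in this KEP.
+type AffinityState struct {
+    Unknown        bool
+    LockedAffinity map[string]string
+}
+
+// affinityCache is a hypothetical stand-in for AllocatedState.AffinityStates
+// together with the mutex that guards it.
+type affinityCache struct {
+    mu     sync.RWMutex
+    states map[DeviceID]AffinityState
+}
+
+var errIncompatible = errors.New("device locked to incompatible affinity values")
+
+// compatible reports whether want can join the lock; an empty lock is clean.
+func compatible(locked, want map[string]string) bool {
+    for key, value := range locked {
+        if got, ok := want[key]; !ok || got != value {
+            return false
+        }
+    }
+    return true
+}
+
+// Filter only reads the state, so a read lock is sufficient.
+func (c *affinityCache) Filter(id DeviceID, want map[string]string) bool {
+    c.mu.RLock()
+    defer c.mu.RUnlock()
+    st := c.states[id]
+    return !st.Unknown && compatible(st.LockedAffinity, want)
+}
+
+// Reserve re-checks under the write lock and then sets the tentative lock.
+func (c *affinityCache) Reserve(id DeviceID, want map[string]string) error {
+    c.mu.Lock()
+    defer c.mu.Unlock()
+    st := c.states[id]
+    if st.Unknown || !compatible(st.LockedAffinity, want) {
+        return errIncompatible
+    }
+    if len(st.LockedAffinity) == 0 {
+        st.LockedAffinity = make(map[string]string, len(want))
+        for key, value := range want {
+            st.LockedAffinity[key] = value
+        }
+        c.states[id] = st
+    }
+    return nil
+}
+
+func main() {
+    cache := &affinityCache{states: map[DeviceID]AffinityState{}}
+    podA := map[string]string{"networking.example.com/subnet": "subnet-a"}
+    podB := map[string]string{"networking.example.com/subnet": "subnet-b"}
+
+    // Both pods pass Filter while the device is still clean.
+    fmt.Println(cache.Filter("eth1", podA), cache.Filter("eth1", podB))
+    // Pod A reserves first; Pod B's Reserve re-checks and is rejected.
+    fmt.Println(cache.Reserve("eth1", podA))
+    fmt.Println(cache.Reserve("eth1", podB))
+    // Later Filter calls observe Pod A's tentative lock.
+    fmt.Println(cache.Filter("eth1", podB))
+}
+```
+
+In this sketch the read lock alone does not close the race; the re-check inside
+`Reserve` is what keeps two pods with different values from both locking the
+same clean device.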
+
+##### Scheduler Restart: State Reconstruction
+
+On scheduler restart, the in-memory `AffinityStates` map is empty. The scheduler
+must reconstruct affinity locks from persisted state before the first scheduling
+cycle begins.
+
+**Reconstruction algorithm**:
+
+1. On startup, the scheduler iterates all bound ResourceClaims, meaning claims
+   with `status.allocation` set (same path as existing
+   `GatherAllocatedState()` for capacity reconstruction).
+2. For each bound claim, check if the allocated device has `SharingAffinity`
+   defined in the corresponding ResourceSlice.
+3. If yes, attempt to decode the claim's opaque config using the well-known JSON
+   schema and extract the required structured parameters.
+4. If decoding succeeds and all required affinity keys are present as strings,
+   populate `AffinityStates[deviceID].LockedAffinity` with those values.
+5. If the claim has no recognized `StructuredParameters` entry, malformed
+   structured parameters, non-string values for a required affinity key, or
+   multiple recognized structured-parameters entries for the same request,
+   set `AffinityStates[deviceID].Unknown = true` and log a warning. The scheduler
+   must not infer lock state from ambiguous or invalid data.
+6. If multiple claims share the same device and any one of them causes the
+   device to become unknown, the device remains with `AffinityStates[deviceID].Unknown` set
+   until all claims on that device are released.
+7. If multiple reconstructable claims share the same device, verify their values
+   are consistent (they must be, by construction—but log a warning if not).
+
+This follows the same pattern used to reconstruct `AggregatedCapacity` from
+bound claims on startup. No new API calls are needed; the data is already
+available from the ResourceClaim spec and ResourceSlice spec cached by the
+scheduler's informers.
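+
+The reconstruction steps can be sketched in Go as follows. The `boundClaim`
+type and the `reconstruct` and `sameValues` helpers are hypothetical
+simplifications: each `boundClaim` is assumed to refer to a device that
+declares `sharingAffinity`, with `Values` already decoded for that device's
+`parameterKeys` (or `nil` when the claim is not reconstructable). Warnings are
+returned rather than logged to keep the sketch self-contained.
+
+```go
+package main
+
+import "fmt"
+
+type DeviceID string
+
+// AffinityState mirrors the cache struct proposed in this KEP.
+type AffinityState struct {
+    Unknown        bool
+    LockedAffinity map[string]string
+}
+
+// boundClaim is a hypothetical, simplified view of one bound claim.
+// Values == nil means the affinity values could not be reconstructed.
+type boundClaim struct {
+    Device DeviceID
+    Values map[string]string
+}
+
+func sameValues(a, b map[string]string) bool {
+    if len(a) != len(b) {
+        return false
+    }
+    for key, value := range a {
+        if got, ok := b[key]; !ok || got != value {
+            return false
+        }
+    }
+    return true
+}
+
+// reconstruct (hypothetical) rebuilds the AffinityStates map from bound claims.
+func reconstruct(claims []boundClaim) (map[DeviceID]AffinityState, []string) {
+    states := make(map[DeviceID]AffinityState)
+    var warnings []string
+    for _, claim := range claims {
+        st := states[claim.Device]
+        switch {
+        case st.Unknown:
+            // Step 6: the device stays unknown until all claims are released.
+        case claim.Values == nil:
+            // Step 5: never infer a lock from ambiguous or invalid data.
+            st = AffinityState{Unknown: true}
+            warnings = append(warnings,
+                fmt.Sprintf("device %s: non-reconstructable claim", claim.Device))
+        case len(st.LockedAffinity) == 0:
+            // Step 4: the first reconstructable claim establishes the lock.
+            st.LockedAffinity = claim.Values
+        case !sameValues(st.LockedAffinity, claim.Values):
+            // Step 7: should not happen by construction; warn and keep the lock.
+            warnings = append(warnings,
+                fmt.Sprintf("device %s: inconsistent affinity values", claim.Device))
+        }
+        states[claim.Device] = st
+    }
+    return states, warnings
+}
+
+func main() {
+    const subnet = "networking.example.com/subnet"
+    states, warnings := reconstruct([]boundClaim{
+        {Device: "eth1", Values: map[string]string{subnet: "subnet-X"}},
+        {Device: "eth1", Values: map[string]string{subnet: "subnet-X"}},
+        {Device: "eth2", Values: map[string]string{subnet: "subnet-Y"}},
+        {Device: "eth2", Values: nil}, // legacy claim without structured parameters
+    })
+    fmt.Println(states["eth1"].LockedAffinity[subnet], states["eth1"].Unknown)
+    fmt.Println(states["eth2"].Unknown, len(warnings))
+}
+```
+
+Here `eth1` ends up locked to `subnet-X`, while `eth2` is marked unknown
+because one of its claims cannot be reconstructed, regardless of the order in
+which the two `eth2` claims are visited.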
+
+
+### Examples
+
+#### ResourceSlice with Sharing Affinity
+
+```yaml
+apiVersion: resource.k8s.io/v1
+kind: ResourceSlice
+metadata:
+  name: node1-nics
+spec:
+  driver: networking.example.com
+  nodeName: node1
+  devices:
+    - name: eth1
+      allowMultipleAllocations: true
+      sharingAffinity:
+        parameterKeys: ["networking.example.com/subnet"]
+      attributes:
+        networking.example.com/type:
+          string: "sriov-vf"
+      capacity:
+        networking.example.com/slots:
+          value: "16"
+    - name: eth2
+      allowMultipleAllocations: true
+      sharingAffinity:
+        parameterKeys: ["networking.example.com/subnet"]
+      attributes:
+        networking.example.com/type:
+          string: "sriov-vf"
+      capacity:
+        networking.example.com/slots:
+          value: "16"
+```
+
+#### ResourceClaim with Affinity Value
+
+```yaml
+apiVersion: resource.k8s.io/v1
+kind: ResourceClaim
+metadata:
+  name: pod-a-nic
+spec:
+  devices:
+    requests:
+      - name: nic
+        exactly:
+          deviceClassName: shared-nic
+    config:
+      # Well-known structured parameters (scheduler decodes for affinity matching)
+      - requests: ["nic"]
+        opaque:
+          driver: resource.k8s.io
+          parameters:
+            apiVersion: resource.k8s.io/v1alpha1
+            kind: StructuredParameters
+            attributes:
+              networking.example.com/subnet:
+                string: "subnet-X"
+      # Driver-specific opaque config (scheduler ignores this)
+      - requests: ["nic"]
+        opaque:
+          driver: networking.example.com
+          parameters:
+            apiVersion: networking.example.com/v1
+            kind: NICConfig
+            vlanId: 100
+```
+
+> **Note**: The first config block uses the well-known `StructuredParameters`
+> schema with `driver: resource.k8s.io`, which the scheduler recognizes and
+> decodes for affinity matching. The second config block is standard opaque
+> driver config that only the driver reads. For simple cases where the driver
+> can read both, a single well-known config block may be sufficient.
+
+#### Multi-key SharingAffinity Example
+
+This example illustrates the alpha semantics when a device constrains sharing on
+multiple keys.
+
+A driver advertises a shared RDMA-capable NIC where both subnet and PKey
+must match for pods to share the same device:
+
+```yaml
+apiVersion: resource.k8s.io/v1
+kind: ResourceSlice
+spec:
+  devices:
+    - name: mlx5_0
+      allowMultipleAllocations: true
+      sharingAffinity:
+        parameterKeys:
+          - networking.example.com/subnet
+          - networking.example.com/pkey
+      capacity:
+        networking.example.com/slots:
+          value: "16"
+```
+
+A matching claim provides both values in the scheduler-recognized structured
+parameters:
+
+```json
+{
+  "apiVersion": "resource.k8s.io/v1alpha1",
+  "kind": "StructuredParameters",
+  "attributes": {
+    "networking.example.com/subnet": {"string": "subnet-a"},
+    "networking.example.com/pkey": {"string": "0x8001"},
+    "networking.example.com/vlan": {"string": "100"}
+  }
+}
+```
+
+Alpha matching behavior:
+
+- If the device is clean, the first compatible claim sets the lock to:
+  - `subnet = subnet-a`
+  - `pkey = 0x8001`
+- A later claim with the same `subnet` and same `pkey` may share the
+  device.
+- A claim with `subnet = subnet-a` but `pkey = 0x8002` is rejected for that
+  device because all declared keys must match.
+- A claim that provides only `subnet` but omits `pkey` is rejected for that
+  device because missing declared keys are invalid.
+- The extra `vlan` key is ignored for this device because the driver did not
+  declare `networking.example.com/vlan` in `parameterKeys`.
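+
+To make the decoding side of these rules concrete, the following Go sketch
+extracts the affinity values from the JSON payload shown above. The
+`structuredParameters` struct and the `extractAffinity` helper are hypothetical
+and model only the fields that appear in this example; the real well-known
+schema may carry more. The sketch treats a missing declared key and a
+non-string value as errors and ignores undeclared keys such as `vlan`.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "errors"
+    "fmt"
+)
+
+// structuredParameters models only the fields used in the example payload.
+type structuredParameters struct {
+    APIVersion string `json:"apiVersion"`
+    Kind       string `json:"kind"`
+    Attributes map[string]struct {
+        String *string `json:"string"`
+    } `json:"attributes"`
+}
+
+// extractAffinity (hypothetical) returns the string values for parameterKeys.
+func extractAffinity(raw []byte, parameterKeys []string) (map[string]string, error) {
+    var sp structuredParameters
+    if err := json.Unmarshal(raw, &sp); err != nil {
+        return nil, fmt.Errorf("malformed payload: %w", err)
+    }
+    if sp.APIVersion != "resource.k8s.io/v1alpha1" || sp.Kind != "StructuredParameters" {
+        return nil, errors.New("not a recognized StructuredParameters entry")
+    }
+    values := make(map[string]string, len(parameterKeys))
+    for _, key := range parameterKeys {
+        attr, ok := sp.Attributes[key]
+        if !ok {
+            return nil, fmt.Errorf("missing required key %q", key)
+        }
+        if attr.String == nil {
+            return nil, fmt.Errorf("required key %q has a non-string value", key)
+        }
+        values[key] = *attr.String
+    }
+    return values, nil
+}
+
+func main() {
+    raw := []byte(`{
+      "apiVersion": "resource.k8s.io/v1alpha1",
+      "kind": "StructuredParameters",
+      "attributes": {
+        "networking.example.com/subnet": {"string": "subnet-a"},
+        "networking.example.com/pkey": {"string": "0x8001"},
+        "networking.example.com/vlan": {"string": "100"}
+      }
+    }`)
+    keys := []string{"networking.example.com/subnet", "networking.example.com/pkey"}
+
+    values, err := extractAffinity(raw, keys)
+    fmt.Println(len(values), values["networking.example.com/pkey"], err)
+
+    // A key the claim does not provide is an error, not an empty value.
+    _, err = extractAffinity(raw, []string{"networking.example.com/mode"})
+    fmt.Println(err)
+}
+```
+
+The returned map contains only `subnet` and `pkey`, which is the value set the
+scheduler would compare against `LockedAffinity` for `mlx5_0`.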
+
+### Test Plan
+
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+Existing DRA scheduling tests should pass before adding sharing affinity tests.
+
+##### Unit tests
+
+- `pkg/scheduler/framework/plugins/dynamicresources`: Coverage for affinity matching
+  logic, including:
+  - Filter: device with matching lock passes
+  - Filter: device with conflicting lock is excluded
+  - Filter: unlocked device with sufficient capacity passes
+  - Filter: claim missing a required `parameterKey` → device filtered out
+  - Filter: claim with extra keys beyond device's declared `parameterKeys` → extra
+    keys ignored, device passes if declared keys match
+  - Filter: no recognized `StructuredParameters` entry for a sharing-constrained
+    request → device filtered out
+  - Filter: malformed recognized `StructuredParameters` payload → device filtered out
+  - Filter: duplicate recognized `StructuredParameters` entries for one request →
+    device filtered out
+  - Filter: non-string value for a required affinity key → device filtered out
+  - Filter: device with `AffinityStates[deviceID].Unknown` set is excluded for new
+    `sharingAffinity` scheduling
+  - Score: locked-compatible device scores higher than clean device
+  - Reserve: first claim sets lock; second claim with same values succeeds
+  - Reserve: second claim with conflicting values fails
+  - Unreserve: tentative lock is rolled back
+  - Legacy claims with non-reconstructable affinity cause the device to be marked
+    unknown rather than establishing or joining a lock
+  - Legacy-claim handling: all scenarios from the `Handling Legacy Claims with
+    Unknown Affinity` table
+  - Compatibility matrix: device without `sharingAffinity` is unaffected —
+    claims with or without `StructuredParameters` both pass (Legacy/Basic and
+    Permissive Sharing rows)
+  - Strict Gating: device has `sharingAffinity` but claim provides zero
+    `StructuredParameters` config entries → device filtered out
+- `staging/src/k8s.io/api/resource/v1`: Coverage for new API types and the
+  recognized structured-parameters contract, including:
+  - Validation: `parameterKeys` exceeding the limit of 8 entries is rejected
+  - Validation: structured parameters with more than 8 attributes are rejected
+  - Validation: duplicate recognized structured-parameters entries for the same
+    request are rejected when validation can detect them
+  - Validation: non-string values for keys referenced by `sharingAffinity`
+    are rejected when validation can detect them
+  - Round-trip serialization of `SharingAffinity` and well-known schema
+
+##### Integration tests
+
+- Affinity matching with multiple claims to same device
+- Affinity mismatch causing allocation to different device
+- Affinity lock clearing when all claims release a device
+- Interaction with consumable capacity constraints (KEP-5075)
+- Scheduler restart: `AffinityStates` correctly reconstructed from existing
+  bound ResourceClaims, and devices with non-reconstructable active claims have
+  `AffinityStates[deviceID].Unknown` set
+- Parallel scheduling: two Pods with conflicting affinity values targeting the
+  same device — one wins Reserve, the other is requeued
+- Feature gate disabled: `sharingAffinity` fields are ignored; devices are
+  treated as unconditionally shareable
+- Feature gate toggled: enabling after claims exist does not disrupt already-bound
+  workloads, and legacy in-use devices are conservatively filtered until clean
+- Invalid structured parameters: malformed payload or duplicate recognized
+  entries for one request do not crash scheduling and deterministically exclude
+  sharing-constrained devices
+- Invalid value type: a required affinity key encoded as a non-string value is
+  rejected for `sharingAffinity` scheduling and does not populate lock state
+- Ghost Lock: Pod is Assumed (tentative lock set) but Bind fails — verify
+  the lock is cleared immediately and the next Pod in the queue can claim the
+  device with a different affinity value
+- Legacy Device Migration: 5 Pods are already running on a NIC; the driver
+  updates `ResourceSlice` to add `sharingAffinity`; a 6th Pod arrives with
+  structured parameters — verify the device has `AffinityStates[deviceID].Unknown` set
+  and the 6th Pod is filtered from that device until all legacy claims drain
+- Partial Key: Device requires `subnet` and `pkey` in `parameterKeys`;
+  claim provides only `subnet` — verify the device is filtered out
+- Score Packing: Two devices available, one already locked to subnet-X;
+  new claim for subnet-X → verify the claim is placed on the locked device,
+  not the clean one (full Filter→Score→Reserve pipeline)
+- Permissive Sharing (no `sharingAffinity`): Device without `sharingAffinity`,
+  claim with `StructuredParameters`; verify the scheduler allows the allocation
+  and the structured parameters are not evaluated for affinity
+- Driver Backstop: Device without `sharingAffinity`, two claims with
+  incompatible config land on the same device — verify scheduler allows both
+  (permissive), and `NodePrepareResources` rejects the incompatible claim
+- NodePrepareResources failure does not clear lock: Claim is bound and
+  lock is persisted, but `NodePrepareResources` fails on the node — verify
+  the affinity lock remains in the scheduler cache
+
+##### e2e tests
+
+- End-to-end test with mock DRA driver using sharing affinity
+- Multi-pod scheduling: Pods with matching affinity values share the same device
+- Multi-pod scheduling: Pods with conflicting affinity values are placed on
+  different devices
+- Lock lifecycle: last Pod deleted → lock cleared → new Pod with different
+  affinity value can claim the device
+- Rollout scenario: existing Pods running without `sharingAffinity`; driver
+  adds `sharingAffinity` to ResourceSlice; verify existing Pods continue
+  running and new Pods respect the new constraint after legacy claims drain
+
+### Graduation Criteria
+
+#### Alpha
+
+- Feature implemented behind `DRASharingAffinity` feature gate
+- API fields added to ResourceSlice (`SharingAffinity` on `Device`)
+- Well-known `StructuredParameters` JSON schema defined for opaque config
+- Scheduler decodes well-known schema from opaque config for affinity matching
+- Scheduler Filter plugin enforces affinity matching
+- Scheduler Score plugin prefers locked-compatible devices over clean devices
+- Scheduler tracks affinity in AllocatedState
+- Unit and integration tests
+- Documentation for driver authors
+- Alpha documentation explicitly calls out the lack of lock-breaking preemption
+  semantics for incompatible locks
+- Alpha documentation explicitly calls out string-only affinity matching and
+  the rejection of non-string values for `sharingAffinity` keys
+
+#### Beta
+
+- Gather feedback from DRA driver developers
+- Address any issues found in alpha
+- **Lock-aware preemption**: PostFilter detects affinity mismatch as a
+  preemption-solvable problem; identifies lock-holder Pods as victims when a
+  higher-priority Pod needs a device locked to an incompatible value
+- E2e tests stable
+- Performance validation with high pod churn
+
+#### GA
+
+- At least 2 production drivers using sharing affinity
+- No significant issues reported
+- Conformance tests if applicable
+
+### Upgrade / Downgrade Strategy
+
+**Upgrade**: Existing ResourceSlices without `sharingAffinity` continue to work.
+New field is additive. See the [Compatibility Matrix](#compatibility-matrix) for
+how the scheduler and driver behave across all combinations of device
+`sharingAffinity` and claim `StructuredParameters` presence.
+
+**Recommended rollout sequence**: To minimize capacity stranding, drivers should
+ideally:
+
+1. Wait for a device to be idle (clean).
+2. Update the ResourceSlice to include the `sharingAffinity` field.
+3. Allow the scheduler to establish the first known lock with a new claim.
+
+During mixed rollouts (some devices with `sharingAffinity`, some without), the
+scoring preference for `sharingAffinity` devices (see
+[Score](#filter-and-score-phases)) naturally steers affinity-aware claims toward
+upgraded devices.
+
+**Adding `sharingAffinity` to an in-use device**: A driver may add or update
+`sharingAffinity` on a device that already has active (bound) ResourceClaims.
+This can happen during driver upgrades or when enabling the feature on existing
+hardware. The scheduler handles this as follows:
+
+- **Pre-existing claims continue to run** and are not evicted.
+- If any active claim on that device does not provide reconstructable
+  affinity values for the required keys, the scheduler marks the device as
+  `AffinityStates[deviceID].Unknown = true`.
+- A device with `AffinityStates[deviceID].Unknown` set is not eligible for new
+  `sharingAffinity` placements, even if it still has nominal shared capacity.
+- **Once all active claims on that device are released**, the device becomes
+  clean and subsequent allocations can establish and enforce affinity normally.
+- Drivers enabling this feature on existing hardware should prefer doing so on
+  clean devices, because alpha intentionally chooses conservative correctness
+  over mid-flight reuse of devices whose effective modal state is unknown.
+
+> **Note**: The API server does not cross-validate ResourceSlice updates against
+> active ResourceClaims. Enforcing "no `sharingAffinity` changes while claims
+> are active" would require a new admission controller with cross-object
+> validation, which is fragile and out of scope for this KEP. Drivers should
+> avoid adding `sharingAffinity` mid-flight when possible, but the scheduler
+> must handle it safely when it occurs.
+
+**Handling missing or malformed parameters**: The scheduler treats a device
+with `sharingAffinity` as a protected resource. If a device requires both
+`subnet` and `pkey` but a claim only provides `subnet`, the device is filtered
+out — all declared keys are mandatory. If a claim's `StructuredParameters`
+entry is malformed or contains non-string values for required keys in alpha,
+the device is excluded.
+
+**Downgrade**: If the feature gate is disabled:
+
+- The `sharingAffinity` field is not persisted on new writes.
+- The scheduler ignores the field — all devices return to unconditional sharing
+  (pre-KEP-5981 behavior).
+- The DRA driver becomes the sole authority for enforcing hardware compatibility
+  at `NodePrepareResources`.
+
+### Version Skew Strategy
+
+- **kube-apiserver**: Must be upgraded first to accept the new
+  `sharingAffinity` API field on `ResourceSlice`.
+- **kube-scheduler**:
+  - A scheduler that understands this feature enforces `sharingAffinity`, tracks
+    `AffinityStates`, and may conservatively mark devices with
+    `AffinityStates[deviceID].Unknown` set when effective affinity cannot be reconstructed.
+  - An older scheduler ignores `sharingAffinity`. In that skew case, placement
+    may be overly permissive and the DRA driver remains the final safety
+    backstop during `NodePrepareResources`.
+- **kubelet**: No changes required; kubelet does not interpret
+  `sharingAffinity`.
+- **DRA driver**:
+  - Drivers publish ResourceSlices with the `sharingAffinity` field on devices.
+  - Drivers must continue validating actual hardware compatibility at prepare
+    time, especially during skew where an older scheduler may not enforce the
+    affinity constraints.
+
+During version skew, the main outcomes are permissive scheduling by an older
+scheduler or conservative filtering by a newer scheduler when affinity state
+cannot be reconstructed. Both are operationally safe as long as the driver
+continues rejecting incompatible prepare-time configurations.
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [x] Feature gate
+  - Feature gate name: `DRASharingAffinity`
+  - Components depending on the feature gate: kube-apiserver, kube-scheduler
+
+###### Does enabling the feature change any default behavior?
+
+No. Devices without `sharingAffinity` behave exactly as before. The feature only affects devices that explicitly opt in via the new field.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes. Disabling the feature gate causes:
+
+- the API server to strip the `sharingAffinity` field from new or updated
+  ResourceSlices before persisting (writes succeed, field is not stored)
+- the scheduler to ignore existing `sharingAffinity` fields for future
+  placement decisions
+
+Existing allocations continue to work. New allocations may become more
+permissive, so the driver must continue validating compatibility at prepare
+time.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+The scheduler resumes enforcing `sharingAffinity` for future placement
+decisions. Existing allocations are not evicted. However, ResourceSlices that
+were created or updated while the gate was disabled will not have the
+`sharingAffinity` field (it was stripped by the API server). Drivers must
+republish their ResourceSlices with `sharingAffinity` for the feature to take
+effect. If there are active allocations whose affinity cannot be reconstructed
+at that point, the corresponding devices are treated conservatively until they
+become clean.
+
+###### Are there any tests for feature enablement/disablement?
+
+Yes, unit tests will cover the feature gate behavior for API validation and scheduler logic.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+Rollout failure modes include:
+
+- **Older scheduler after API enablement**: the scheduler ignores `sharingAffinity`
+  and placement may be overly permissive. The driver remains the safety backstop
+  at prepare time.
+- **Newer scheduler enabling conservative handling on legacy in-use devices**:
+  devices with non-reconstructable active claims may be filtered until they are
+  clean, which can temporarily reduce effective schedulable capacity.
+
+Rollback failure mode: if the scheduler is rolled back while the API server
+still serves the field, placement returns to permissive behavior for new
+scheduling decisions.
+
+Running workloads are not evicted by this feature; the impact is on future
+placement decisions, not on already-running pods.
+
+###### What specific metrics should inform a rollback?
+
+Illustrative rollback signals include:
+
+- `sharing_affinity_filter_mismatch_total` increasing unexpectedly,
+- `sharing_affinity_unknown_device_total` remaining elevated after rollout,
+- spikes in affinity-related scheduler events or unschedulable pods,
+- driver prepare failures due to incompatible or stale configurations.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+Will be tested before beta.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No.
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+Operators can determine usage by inspecting `ResourceSlice` objects that set
+`sharingAffinity` and by observing `ResourceClaim`s that include recognized
+`StructuredParameters` for requests targeting those devices.
+
+###### How can someone using this feature know that it is working for their instance?
+
+A user should be able to observe that:
+
+- compatible claims preferentially pack onto already-locked devices,
+- incompatible claims are filtered before bind/prepare when the scheduler has
+  reconstructable affinity state,
+- devices with unknown legacy affinity state are conservatively excluded until
+  they become clean.
+
+In practice, this should be visible through scheduler logs, scheduler events,
+and (where implemented) scheduler metrics.
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+This enhancement should not materially regress baseline DRA scheduling latency
+for clusters that do not use `sharingAffinity`.
+
+For clusters that do use the feature, the primary objective is **correctness of
+compatibility-aware placement** with bounded incremental scheduling overhead.
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+Useful SLIs include:
+
+- rate of scheduling attempts filtered due to `sharingAffinity` mismatch,
+- rate of devices with `AffinityStates[deviceID].Unknown` set,
+- rate of malformed or duplicate recognized `StructuredParameters` payloads,
+- share of successful placements that pack onto already-locked compatible
+  devices,
+- prepare-time rejections by the DRA driver caused by incompatible hardware
+  configuration.
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+This feature would benefit from scheduler-observable counters and/or events for:
+
+- `sharing_affinity_filter_mismatch_total`
+- `sharing_affinity_filter_missing_parameters_total`
+- `sharing_affinity_filter_invalid_parameters_total`
+- `sharing_affinity_unknown_device_total`
+- `sharing_affinity_packed_allocation_total`
+
+Exact metric names are illustrative and implementation-specific, but
+equivalent observability is strongly recommended.
+
+In addition, user-facing diagnostics should make the reason for filtering clear,
+for example:
+
+- missing required structured parameters for request `<name>`,
+- duplicate recognized `StructuredParameters` entries for request `<name>`,
+- required key `<key>` has a non-string value in alpha,
+- device `<id>` is locked to incompatible affinity values,
+- device `<id>` has unknown affinity state due to legacy or invalid active
+  claims.
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+- DRA must be enabled (GA in 1.34)
+- KEP-5075 (Consumable Capacity) for multi-allocatable devices
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+No new API calls. Affinity data is extracted from ResourceSlice and ResourceClaim
+objects already fetched by existing informers.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No. Only new fields on existing types.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+- ResourceSlice: Small increase per device with `sharingAffinity` — the
+  `parameterKeys` field adds up to 8 fully-qualified key names (capped by
+  `SharingAffinityParameterKeysMaxSize = 8`)
+- ResourceClaim: Small increase when claims include a `StructuredParameters`
+  opaque config entry with affinity values (up to 8 string-valued attributes,
+  matching the device-side cap)
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+Negligible. The Filter phase decodes the `StructuredParameters` opaque config
+payload once per candidate device (bounded by payload size and 8-key cap).
+The affinity comparison itself is O(k) where k ≤ 8 — a map lookup per key.
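+
+A minimal sketch of that comparison, assuming the device lock and the claim's
+decoded values are both held as `map[string]string`. The helper name
+`affinityMatches` and the key `example.com/subnet` used in the note below are
+hypothetical; this is not the scheduler's actual code, only an illustration of
+why the cost is one map lookup per declared key:
+
+```go
+package main
+
+// maxParameterKeys mirrors SharingAffinityParameterKeysMaxSize from this KEP.
+const maxParameterKeys = 8
+
+// affinityMatches is a hypothetical helper, not the scheduler's actual code.
+// keys are the device's declared parameterKeys, lock holds the values the
+// device is currently locked to (nil or empty for a clean device), and claim
+// holds the decoded string values supplied by the claim.
+func affinityMatches(keys []string, lock, claim map[string]string) bool {
+    if len(keys) > maxParameterKeys {
+        return false // exceeds the declared cap; treat as invalid
+    }
+    for _, k := range keys {
+        want, ok := claim[k]
+        if !ok {
+            return false // claim is missing a required key
+        }
+        if locked, isLocked := lock[k]; isLocked && locked != want {
+            return false // device is locked to a different value
+        }
+    }
+    return true
+}
+```
+
+With `keys = ["example.com/subnet"]`, a clean device (nil lock) accepts any
+claim that supplies the key, while a device locked to `subnet-a` rejects a
+claim that asks for `subnet-b`.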
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No. The per-component impact is bounded:
+
+- **Scheduler RAM**: `AffinityStates` adds one `map[string]string` (up to 8
+  entries) per device with active affinity locks — proportional to active shared
+  allocations, not total devices.
+- **Scheduler CPU**: JSON decoding of the `StructuredParameters` opaque config
+  entry during Filter adds a small per-candidate cost, bounded by the 8-key cap.
+- **etcd disk**: Slightly larger ResourceSlice and ResourceClaim objects (see
+  API size answer above), bounded by the same caps.
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+Like existing scheduler-driven DRA logic, this feature depends on informer state
+and cached API data. Temporary API server or etcd unavailability does not by
+itself invalidate already-computed in-memory lock state, but new pods will not
+be scheduled during unavailability. Sustained control-plane unavailability may
+delay reconciliation of claim release, slice updates, or restart reconstruction.
+
+The driver remains the final enforcement authority at prepare time.
+
+###### What are other known failure modes?
+
+Known failure modes include:
+
+- **Malformed structured parameters**: the scheduler cannot decode the
+  recognized payload and filters the device.
+- **Duplicate recognized entries for one request**: the scheduler treats the
+  request as invalid for `sharingAffinity` scheduling.
+- **Missing required keys**: the claim cannot be matched against a device that
+  declares those keys.
+- **Non-string values for required keys in alpha**: the device is filtered for
+  that claim.
+- **Unknown affinity state**: the device has active allocations whose affinity
+  cannot be reconstructed, so it is conservatively filtered until clean.
+- **Prepare-time driver rejection**: despite scheduler filtering, the driver may
+  still reject an incompatible or stale placement and that rejection is the
+  final safety backstop.
+- **Partial feature gate enablement**: if the feature gate is enabled on the API
+  server but not the scheduler (or vice versa), the `sharingAffinity` field may
+  be persisted but not enforced, or enforced but not accepted on writes. Ensure
+  the gate is enabled on both `kube-apiserver` and `kube-scheduler`.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+Recommended debugging flow:
+
+1. Inspect the relevant `ResourceSlice` and confirm the device declares the
+   expected `sharingAffinity.parameterKeys`.
+2. Inspect the `ResourceClaim` and confirm there is exactly one recognized
+   `StructuredParameters` entry for the relevant request.
+3. Verify that every required affinity key is present and string-valued.
+4. Check whether the target device is already locked to incompatible values
+   (lock state is in the scheduler's in-memory cache — check scheduler logs
+   for filter reasons mentioning affinity mismatch).
+5. Check whether the device is being treated as having **unknown affinity
+   state** because of legacy or invalid active claims.
+6. Review scheduler logs/events for explicit filter reasons.
+7. If the scheduler allowed placement but the driver rejected prepare, inspect
+   driver logs to determine whether the issue was stale scheduler state,
+   unsupported config, or an actual device-level incompatibility.
+
+User-facing diagnostics should prefer concrete messages over generic
+unschedulable errors whenever possible.
+
+## Implementation History
+
+- 2026-03-27: Initial KEP issue created
+- 2026-03-30: KEP document drafted
+
+## Drawbacks
+
+- Adds a new cache dimension (`AffinityStates`) to the scheduler's allocation
+  tracking, increasing the surface area for reconstruction bugs on restart
+- Once a device is locked, its effective affinity cannot change until all
+  claims on that device are released
+- Fragmentation risk remains if affinity values are too fine-grained
+- Conservative handling of legacy in-use devices can temporarily strand
+  schedulable capacity during rollout or migration
+- If a driver declares `sharingAffinity` on a device but no claims ever provide
+  `StructuredParameters`, that device becomes effectively unschedulable for
+  sharing workloads — all claims are filtered out by the "Strict Gating" rule.
+  Drivers should coordinate with workload teams to ensure claims include
+  `StructuredParameters` before enabling `sharingAffinity` on devices.
+- The scheduler now depends on decoding a well-known JSON schema from opaque
+  config — a new coupling that didn't exist before. If the schema evolves,
+  backward compatibility must be maintained across scheduler versions.
+- Affinity locks are purely in-memory with no API or status field to inspect
+  which devices are locked to which values. Debugging lock state requires
+  scheduler logs.
+
+## Alternatives
+
+### Claim-side SharingAffinity (on DeviceRequest)
+
+Instead of using StructuredParameters in opaque config to supply constraint
+values, an alternative design adds a dedicated `SharingAffinity` field on
+`DeviceRequest` within ResourceClaim:
+
+```go
+type DeviceRequest struct {
+    // ... existing fields ...
+    SharingAffinity *SharingAffinity
+}
+
+type SharingAffinity struct {
+    AttributeName string          // e.g., "networking.k8s.io/pkey"
+    Value         string          // e.g., "0x8001"
+    Strategy      SharingStrategy // e.g., "LockOnFirstUse"
+}
+```
+
+**Rejected because**:
+- **Requires an API change to ResourceClaim**: Adding a typed `SharingAffinity`
+  field to `DeviceRequest` introduces a new API field, whereas supplying the
+  same constraint values via StructuredParameters in existing opaque config
+  requires no API change at all. The opaque config path is the preferred
+  approach per @pohly's guidance—use well-known schemas inside existing opaque
+  config rather than adding new structured fields to ResourceClaim.
+
+### Object Reference-based Affinity Matching
+
+An alternative approach replaces inline affinity values with external object
+references. Instead of embedding values in opaque config, the claim would
+reference a CRD (e.g., `NetworkConfiguration`) by name, and the device would
+declare which object kinds constrain sharing:
+
+```yaml
+# External CRD
+kind: NetworkConfiguration
+metadata:
+  name: subnet-a
+spec:
+  subnet: 10.0.1.0/24
+
+# ResourceClaim
+config:
+  objectRefs:           # new field
+  - kind: NetworkConfiguration
+    name: subnet-a
+
+# Device
+commonConfigKind:       # new field
+- NetworkConfiguration
+```
+
+**Rejected because**:
+- Requires new fields on both ResourceClaim (`objectRefs`) and Device
+  (`commonConfigKind`), whereas the chosen approach adds a field only to
+  Device and uses existing opaque config for claim-side values
+- Requires external CRD definitions, adding operational burden for cluster
+  administrators
+- Does not scale to multi-dimensional affinity: a device may need affinity on
+  multiple independent axes (e.g., subnet + VLAN), and with object references
+  each axis would need its own CRD
+- Conflicts with the direction from @pohly to avoid new API fields on claims
+  and use well-known schemas inside existing opaque config
+
+### Placeholder Pattern Workaround
+
+Without this KEP, drivers must use a "placeholder pattern" today:
+
+1. Publish devices with `capacity: 1` initially
+2. Wait for first claim to determine affinity value
+3. Update ResourceSlice with actual capacity and affinity as attribute
+4. Use CEL selector to match affinity attribute
+
+**Problems**:
+- Race condition: Second pod may go to different device before expansion
+- ResourceSlice churn: Constant updates as pods come and go
+- Driver complexity: State machine for expand/contract lifecycle
+
+### CEL-based Affinity Matching
+
+An alternative approach uses CEL expressions to evaluate affinity compatibility,
+rather than introducing new structured API fields. Two variants were considered:
+
+**Variant A: Claim-to-claim CEL matching**
+
+Allow CEL expressions in a ResourceClaim to reference other claims' allocations
+on the same device. For example:
+
+```yaml
+constraints:
+  - cel:
+      expression: >
+        device.allocations.all(a,
+          !has(a.config.subnet) || a.config.subnet == "subnet-X")
+```
+
+**Rejected because**:
+- Creates a circular dependency: Claim A's eligibility depends on Claim B's
+  allocation, and vice versa. The scheduler cannot evaluate both simultaneously.
+- CEL evaluation order becomes undefined—the result depends on which claim is
+  evaluated first, making scheduling non-deterministic.
+- The CEL environment would need to expose `device.allocations`, a runtime
+  collection of other claims' configs. This is a fundamentally different
+  evaluation model from today's single-device CEL selectors.
+
+**Variant B: Driver-published CEL lock expressions on ResourceSlice**
+
+The driver publishes a CEL expression on the ResourceSlice that evaluates
+whether a claim is compatible with the device's current lock state:
+
+```yaml
+devices:
+  - name: eth1
+    sharingAffinity:
+      lockExpression: >
+        device.affinityLock['subnet'] == '' ||
+        device.affinityLock['subnet'] == claim.AffinityValues['subnet']
+```
+
+**Rejected because**:
+- `device.affinityLock` is runtime scheduler state, not a static device
+  attribute. Exposing it in CEL requires extending the evaluation context to
+  include the scheduler's in-memory `AllocatedState`, which breaks the current
+  model where CEL only evaluates against the ResourceSlice snapshot.
+- `claim.AffinityValues` is not currently part of the CEL evaluation context
+  either. Adding it requires changes to the CEL environment definition, the
+  scheduler's expression compiler, and the cost estimator.
+- CEL expressions are powerful but opaque to the scheduler—it cannot extract
+  *which* keys constrain sharing or *what* values to record in `AllocatedState`.
+  The scheduler would need to both evaluate the expression AND separately track
+  lock state, duplicating logic.
+- While Kubernetes is adopting CEL broadly (ValidatingAdmissionPolicy, DRA
+  selectors), those use cases evaluate static data. Affinity matching requires
+  reasoning about mutable runtime state, which is a qualitatively different
+  problem better served by a purpose-built mechanism.
+
+## Future Enhancements
+
+The following ideas are out of scope for alpha but are worth exploring in
+beta/GA based on real-world feedback:
+
+### Priority-based Lock Preemption
+
+This section addresses a deliberate alpha limitation: alpha enforces lock
+compatibility, but does not provide any mechanism for a
+higher-priority Pod to break an incompatible lock.
+
+Standard Kubernetes preemption is blind to affinity locks. It triggers on
+*resource shortage* (insufficient CPU, memory, or device slots), not on
+qualitative state mismatch. This creates three related gaps:
+
+1. **Invisible shortage**: A NIC has 15/16 slots available but is locked to
+   Subnet-X. A high-priority Pod needs Subnet-Y. The scheduler sees plenty of
+   capacity → preemption is never triggered. The Pod is simply unschedulable.
+
+2. **Wrong victim selection**: Even if preemption were triggered by an unrelated
+   shortage, victim selection asks "which Pods free up slots?" not "which Pods
+   clear the lock?" The scheduler might preempt an unrelated Pod, freeing a
+   slot on a device still locked to the wrong subnet.
+
+3. **Permanent poisoning**: Without lock-aware preemption, a single low-priority
+   Pod can hold a high-capacity device hostage indefinitely.
+
+**Lock-aware preemption** (targeted for Beta) extends the scheduler's PostFilter
+phase:
+
+1. **Detection**: When a Pod fails Filter specifically due to
+   `SharingAffinityMismatch`, the PostFilter identifies the device and its
+   current lock-holder claims.
+2. **Evaluation**: It determines the highest priority among all Pods holding
+   claims that share the lock. If the incoming Pod's priority exceeds that
+   maximum, preemption is viable.
+3. **Action**: The scheduler preempts all lock-holder Pods on the device,
+   releasing their claims and clearing the affinity lock. The device returns
+   to a clean state for the high-priority Pod.
+
+This is scoped for Beta because the core Filter/Reserve/Score mechanism must
+be proven in Alpha first, and lock-aware preemption requires careful
+integration with the existing DRA preemption path. Key design considerations
+include:
+
+- **Victim minimization**: When multiple devices could satisfy the incoming Pod,
+  the preemption logic should prefer the device with the fewest lock-holding
+  Pods to minimize disruption.
+- **Atomicity**: Preemption in Kubernetes is asynchronous—victim Pods are
+  deleted but do not disappear instantly. During the eviction window the old
+  lock is still active, so a newly arriving compatible Pod could land on the
+  device and re-establish the lock, creating a preemption cascade. Standard
+  preemption solves the analogous problem with NominatedNode; lock-breaking
+  would need a similar mechanism (e.g., marking the device's lock as
+  "transitioning to the new value for the preempting Pod") so that future
+  scheduling cycles treat the device as locked to the new value, filtering
+  out Pods compatible only with the old lock.
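+
+The evaluation and victim-minimization rules above could be sketched as
+follows. The `lockedDevice` type and both function names are hypothetical, and
+the sketch deliberately ignores the atomicity problem; it only shows the
+"outrank every lock holder, then prefer the fewest holders" selection:
+
+```go
+package main
+
+// lockedDevice is a hypothetical view of one device and the priorities of the
+// Pods whose claims currently hold its affinity lock.
+type lockedDevice struct {
+    name             string
+    holderPriorities []int32
+}
+
+// preemptionViable reports whether the incoming Pod outranks every lock
+// holder, i.e. its priority exceeds the group's maximum priority.
+func preemptionViable(incoming int32, d lockedDevice) bool {
+    for _, p := range d.holderPriorities {
+        if p >= incoming {
+            return false
+        }
+    }
+    return true
+}
+
+// pickDevice returns the viable device with the fewest lock-holding Pods,
+// or false if no device's lock can be broken by the incoming Pod.
+func pickDevice(incoming int32, devices []lockedDevice) (string, bool) {
+    best := -1
+    for i, d := range devices {
+        if !preemptionViable(incoming, d) {
+            continue
+        }
+        if best < 0 || len(d.holderPriorities) < len(devices[best].holderPriorities) {
+            best = i
+        }
+    }
+    if best < 0 {
+        return "", false
+    }
+    return devices[best].name, true
+}
+```
+
+For an incoming Pod of priority 100, a device held by one Pod of priority 50
+is preferred over a device held by three lower-priority Pods, and a device
+held by a Pod of priority 200 is never selected.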
+
+### SharingStrategy (`CanSetLock` / `NeverSetLock`)
+
+Alpha intentionally does not let claims control whether they may establish
+a new lock on a clean device. Any compatible claim can set the initial lock,
+and the scheduler then packs subsequent compatible claims onto that device.
+
+A future enhancement could add an explicit **SharingStrategy** on the claim
+side to control lock-setting behavior. Two candidate strategies are:
+
+- **`CanSetLock`** (default): The claim may land on a clean device and
+  establish the lock. This matches the alpha behavior.
+- **`NeverSetLock`**: The claim may only be allocated to a device that already
+  has a matching lock established by another claim. This is useful for
+  background or batch jobs that should never consume a clean device and
+  potentially fragment capacity. **Caveat**: `NeverSetLock` is a follower-only
+  strategy — it requires at least one `CanSetLock` claim to establish the lock
+  first. If no device is locked to the requested value, a `NeverSetLock` pod
+  will remain unschedulable indefinitely. Implementations should document this
+  dependency clearly and consider surfacing a scheduling event when a pod is
+  blocked waiting for a lock that no leader has established.
+
+If introduced in beta or later, the scheduler would evaluate this policy before
+capacity and key matching for unlocked devices. A claim with `NeverSetLock`
+would reject an unlocked device immediately, then continue searching for an
+already-locked compatible device.
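+
+A minimal sketch of that pre-check, assuming the strategy is carried as a
+string-typed enum on the claim. The type, constants, and `passesStrategy`
+helper are hypothetical and not part of the alpha API:
+
+```go
+package main
+
+// SharingStrategy is the hypothetical claim-side policy discussed above.
+type SharingStrategy string
+
+const (
+    CanSetLock   SharingStrategy = "CanSetLock"
+    NeverSetLock SharingStrategy = "NeverSetLock"
+)
+
+// passesStrategy runs before capacity and key matching. deviceLocked reports
+// whether the device already carries an affinity lock. An unset strategy is
+// treated like CanSetLock, matching the alpha behavior.
+func passesStrategy(s SharingStrategy, deviceLocked bool) bool {
+    if s == NeverSetLock && !deviceLocked {
+        return false // follower-only claims never take a clean device
+    }
+    return true // locked devices still go through normal key matching
+}
+```
+
+A `true` result only means the device stays a candidate; the claim's values
+must still match the existing lock in the subsequent key comparison.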
+
+This is deferred from alpha to keep the initial scope focused on the core
+problem: driver-declared sharing constraints plus scheduler-enforced lock
+tracking via structured parameters.
+
+### Soft / Preferred Affinity Keys
+
+The Alpha design enforces hard all-or-nothing matching: all declared
+`parameterKeys` must match or the device is filtered out. Real-world hardware
+may have hierarchical constraints where some keys are strict sharing
+requirements (e.g., Subnet) and others are scheduling preferences (e.g.,
+Traffic-Class or bandwidth profile).
+
+A future enhancement could add a `required` vs `preferred` flag on individual
+entries in `parameterKeys`:
+
+- **`required`** (default): Mismatch → device filtered out (current behavior)
+- **`preferred`**: Mismatch → device passes Filter but receives a lower score
+
+This would allow the Score phase to optimize for Traffic-Class alignment while
+only enforcing hard locks on Subnet. The lock itself would only be set for
+`required` keys — `preferred` keys would remain advisory and never block
+scheduling. This avoids complicating the atomic lock model while still
+enabling soft optimization.
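+
+One way the split between Filter and Score could look, as a sketch only. The
+`parameterKey` type, the separate `advisory` map (values observed from
+existing allocations for `preferred` keys, which are never part of the lock),
+and the `evaluate` helper are all assumptions made for illustration:
+
+```go
+package main
+
+// parameterKey is a hypothetical shape for a parameterKeys entry that carries
+// the required/preferred flag; alpha has only plain key names.
+type parameterKey struct {
+    name      string
+    preferred bool
+}
+
+// evaluate returns whether the device passes Filter and how many preferred
+// keys matched (a stand-in for a Score contribution). lock holds values for
+// required keys only; advisory holds observed values for preferred keys.
+func evaluate(keys []parameterKey, lock, advisory, claim map[string]string) (bool, int) {
+    matchedPreferred := 0
+    for _, k := range keys {
+        want, hasValue := claim[k.name]
+        if k.preferred {
+            if seen, ok := advisory[k.name]; ok && hasValue && seen == want {
+                matchedPreferred++
+            }
+            continue // preferred keys never filter a device
+        }
+        if !hasValue {
+            return false, 0 // required key missing from the claim
+        }
+        if locked, isLocked := lock[k.name]; isLocked && locked != want {
+            return false, 0 // required key mismatch filters the device
+        }
+    }
+    return true, matchedPreferred
+}
+```
+
+A claim that matches the required subnet but differs on traffic class still
+passes Filter with a preferred-match count of zero, whereas a subnet mismatch
+filters the device regardless of the preferred keys.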
+
+### Typed Affinity Values Beyond Strings
+
+Alpha limits affinity matching to string equality (`map[string]string`), which
+covers all known use cases (subnets, bitstreams, partition keys). Non-string
+types could be added in the future if concrete use cases arise.
+
+## Infrastructure Needed
+
+None
diff --git a/keps/sig-scheduling/5981-dra-sharing-affinity/kep.yaml b/keps/sig-scheduling/5981-dra-sharing-affinity/kep.yaml
new file mode 100644
index 000000000000..938b7caee368
--- /dev/null
+++ b/keps/sig-scheduling/5981-dra-sharing-affinity/kep.yaml
@@ -0,0 +1,46 @@
+title: "DRA Sharing Affinity for Conditional Fungibility"
+kep-number: 5981
+authors:
+  - "@ashvindeodhar"
+owning-sig: "sig-scheduling"
+participating-sigs:
+  - "sig-node"
+status: "provisional"
+creation-date: "2026-03-30"
+reviewers:
+  - "@pohly"
+  - "@johnbelamaric"
+  - "@sunya-ch"
+  - "@ritazh"
+  - "@LionelJouin"
+approvers:
+  - TBD
+
+see-also:
+  - "/keps/sig-scheduling/5075-dra-consumable-capacity"
+  - "/keps/sig-node/4381-dra-structured-parameters"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been done.
+latest-milestone: "v1.37"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v1.37"
+  beta: "v1.38"
+  stable: "v1.40"
+
+# The following PRR answers are required at alpha release
+feature-gates:
+  - name: DRASharingAffinity
+    components:
+      - kube-apiserver
+      - kube-scheduler
+      - kubelet
+disable-supported: true
+
+# Metrics required for beta release; can be placeholders for now.
+metrics:
+  - dra_scheduling_attempts_affinity_mismatch_total
\ No newline at end of file