cuda: sanitize invalid Blackwell sharedMemPerBlockOptin #24991
Open
wgu9 wants to merge 1 commit into
Contributor
@ggml-org/nvidia there have been multiple PRs which attempt to "fix" this issue. I'm now wondering if this is a real issue.
Author
Thanks for calling that out. I agree this should not merge unless the device-property issue is real and this PR is not just another speculative Blackwell workaround. What I verified before opening this:
If the NVIDIA maintainers think the driver/device report should be treated as impossible or fixed lower in the stack, I am fine closing this. The intent here is only to add a narrow defensive guard around an invalid device property, not to mask unrelated Blackwell issues.
Some Blackwell CUDA driver/device combinations can report an invalid `sharedMemPerBlockOptin` value. Sanitize that value during CUDA device initialization and fall back to `sharedMemPerBlock` when the opt-in value is zero or larger than `sharedMemPerMultiprocessor`.

Validation:
test-backend-ops CUDA0 MUL_MAT passed 1134/1134
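
For reviewers who want the guard spelled out, here is a minimal host-side sketch of the fallback rule described above. It is not the diff from this PR. `SharedMemProps` and `sanitized_smpbo` are hypothetical names introduced for illustration, and the struct only mirrors the three `cudaDeviceProp` fields involved so the logic can be compiled and checked without a GPU.

```cpp
#include <cstddef>

// Hypothetical stand-in for the three cudaDeviceProp fields the guard reads.
// In real code these values would come from cudaGetDeviceProperties().
struct SharedMemProps {
    std::size_t sharedMemPerBlock;           // default per-block limit
    std::size_t sharedMemPerBlockOptin;      // opt-in per-block limit (may be bogus)
    std::size_t sharedMemPerMultiprocessor;  // total shared memory per SM
};

// Returns the opt-in limit to use. The reported value is treated as invalid
// when it is zero or exceeds the per-SM total, in which case the default
// per-block limit is used instead.
inline std::size_t sanitized_smpbo(const SharedMemProps & p) {
    const bool invalid = p.sharedMemPerBlockOptin == 0 ||
                         p.sharedMemPerBlockOptin > p.sharedMemPerMultiprocessor;
    return invalid ? p.sharedMemPerBlock : p.sharedMemPerBlockOptin;
}
```

An opt-in value equal to `sharedMemPerMultiprocessor` is accepted here, since the description only rejects values that are zero or strictly larger. Whether that boundary matches the actual change is an assumption.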