Skip to content

fix: correct subtyping for sparse matrices in cuSPARSE#3105

Closed
gdalle wants to merge 1 commit intoJuliaGPU:masterfrom
gdalle:gd/subtyping
Closed

fix: correct subtyping for sparse matrices in cuSPARSE#3105
gdalle wants to merge 1 commit intoJuliaGPU:masterfrom
gdalle:gd/subtyping

Conversation

@gdalle
Copy link
Copy Markdown
Contributor

@gdalle gdalle commented Apr 16, 2026

Fixes #3104

maleadt
maleadt previously approved these changes Apr 16, 2026
@gdalle
Copy link
Copy Markdown
Contributor Author

gdalle commented Apr 16, 2026

Should I add tests?

@kshyatt
Copy link
Copy Markdown
Member

kshyatt commented Apr 16, 2026

Is this going to break the broadcasting tests, which rely on things being subtypes of the abstract GPUArrays types?

@gdalle
Copy link
Copy Markdown
Contributor Author

gdalle commented Apr 16, 2026

No, this shouldn't be breaking because AbstractCuSparseMatrix itself subtypes AbstractGPUSparseMatrix

@kshyatt
Copy link
Copy Markdown
Member

kshyatt commented Apr 16, 2026

That wasn't what I asked. Is it going to break this logic?

@gdalle
Copy link
Copy Markdown
Contributor Author

gdalle commented Apr 16, 2026

If I understand correctly, this line still applies to CuSparseMatrixCSC/CSR so all should be good?

https://github.com/JuliaGPU/GPUArrays.jl/blob/21d6bc6f3c3c8dc2a1d2dbd07c5eb177fdaf179d/src/host/sparse.jl#L290

@kshyatt
Copy link
Copy Markdown
Member

kshyatt commented Apr 16, 2026

I don't think everything will be fine because of this

@gdalle gdalle marked this pull request as draft April 16, 2026 11:50
@gdalle
Copy link
Copy Markdown
Contributor Author

gdalle commented Apr 16, 2026

Oh right I understand now. Yeah, this will break.
I opened #3104 because a recent version of CUDA broke SparseMatrixColorings, where I assumed that all CUDA sparse matrices are subtypes of AbstractCuSparseMatrix (some even have the subtyping in their docstring by mistake). If that is no longer true because the CSC/CSR distinction happens at the GPUArrays level and we prefer safeguarding that hierarchy, what is AbstractCuSparseMatrix even for?

@maleadt maleadt dismissed their stale review April 16, 2026 11:51

Missed that AbstractCuSparseMatrix doesn't have AbstractGPUSparseMatrixCSR in its hierarchy.

@kshyatt
Copy link
Copy Markdown
Member

kshyatt commented Apr 16, 2026

Right now it's kind of functioning as a dumping ground for things that aren't CSC and CSR 😅 . Ideally what we'd do is add broadcasting support for COO and BSR so we can fold them in to the wider AbstractGPUSparseMatrix*** system

@gdalle
Copy link
Copy Markdown
Contributor Author

gdalle commented Apr 16, 2026

Okay then feel free to close this PR, and I'll make a note to not use AbstractCuSparseMatrix anymore (see JuliaDiff/SparseMatrixColorings.jl#314)

@kshyatt
Copy link
Copy Markdown
Member

kshyatt commented Apr 16, 2026

Sorry for the confusion here! It's kind of an ongoing process, as you can see...

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA.jl Benchmarks

Details
Benchmark suite Current: 2c11d9a Previous: af6961a Ratio
array/accumulate/Float32/1d 101885 ns 101502.5 ns 1.00
array/accumulate/Float32/dims=1 77180 ns 76629.5 ns 1.01
array/accumulate/Float32/dims=1L 1586558 ns 1585691.5 ns 1.00
array/accumulate/Float32/dims=2 144533.5 ns 143840 ns 1.00
array/accumulate/Float32/dims=2L 658693.5 ns 657866.5 ns 1.00
array/accumulate/Int64/1d 119291 ns 119085 ns 1.00
array/accumulate/Int64/dims=1 80352 ns 79723 ns 1.01
array/accumulate/Int64/dims=1L 1695194.5 ns 1694532 ns 1.00
array/accumulate/Int64/dims=2 156647 ns 156114 ns 1.00
array/accumulate/Int64/dims=2L 962826 ns 961892 ns 1.00
array/broadcast 20872 ns 20629 ns 1.01
array/construct 1267 ns 1256.5 ns 1.01
array/copy 18257 ns 17886 ns 1.02
array/copyto!/cpu_to_gpu 218082 ns 212623 ns 1.03
array/copyto!/gpu_to_cpu 283779.5 ns 283142 ns 1.00
array/copyto!/gpu_to_gpu 10733 ns 10702 ns 1.00
array/iteration/findall/bool 135007 ns 134713 ns 1.00
array/iteration/findall/int 150383.5 ns 149574 ns 1.01
array/iteration/findfirst/bool 82540 ns 81484 ns 1.01
array/iteration/findfirst/int 84719 ns 83536 ns 1.01
array/iteration/findmin/1d 88840 ns 85967.5 ns 1.03
array/iteration/findmin/2d 117471 ns 116482 ns 1.01
array/iteration/logical 200391 ns 199753 ns 1.00
array/iteration/scalar 67244 ns 67105.5 ns 1.00
array/permutedims/2d 52813 ns 52568 ns 1.00
array/permutedims/3d 53243 ns 53057 ns 1.00
array/permutedims/4d 52187 ns 51868.5 ns 1.01
array/random/rand/Float32 13794 ns 12842 ns 1.07
array/random/rand/Int64 26019 ns 25396 ns 1.02
array/random/rand!/Float32 8899.666666666666 ns 8414.666666666666 ns 1.06
array/random/rand!/Int64 22129 ns 21981 ns 1.01
array/random/randn/Float32 37987 ns 37382.5 ns 1.02
array/random/randn!/Float32 30899 ns 30580 ns 1.01
array/reductions/mapreduce/Float32/1d 34669 ns 34346 ns 1.01
array/reductions/mapreduce/Float32/dims=1 39945 ns 39918.5 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 51321 ns 51509 ns 1.00
array/reductions/mapreduce/Float32/dims=2 56515 ns 56563 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69450.5 ns 69236 ns 1.00
array/reductions/mapreduce/Int64/1d 43066 ns 42783 ns 1.01
array/reductions/mapreduce/Int64/dims=1 51270 ns 42869 ns 1.20
array/reductions/mapreduce/Int64/dims=1L 87296 ns 87545 ns 1.00
array/reductions/mapreduce/Int64/dims=2 59402 ns 59325 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 84700 ns 85131 ns 0.99
array/reductions/reduce/Float32/1d 34994 ns 34338 ns 1.02
array/reductions/reduce/Float32/dims=1 49664 ns 48895 ns 1.02
array/reductions/reduce/Float32/dims=1L 51712 ns 51567 ns 1.00
array/reductions/reduce/Float32/dims=2 56672 ns 56622 ns 1.00
array/reductions/reduce/Float32/dims=2L 69967 ns 69625 ns 1.00
array/reductions/reduce/Int64/1d 42893 ns 42697 ns 1.00
array/reductions/reduce/Int64/dims=1 42295 ns 47176.5 ns 0.90
array/reductions/reduce/Int64/dims=1L 87293 ns 87361 ns 1.00
array/reductions/reduce/Int64/dims=2 59586 ns 59337 ns 1.00
array/reductions/reduce/Int64/dims=2L 84606 ns 84717 ns 1.00
array/reverse/1d 17873 ns 18051 ns 0.99
array/reverse/1dL 68581 ns 68628 ns 1.00
array/reverse/1dL_inplace 65762 ns 65835 ns 1.00
array/reverse/1d_inplace 10334 ns 8676.333333333334 ns 1.19
array/reverse/2d 20742 ns 20541 ns 1.01
array/reverse/2dL 72870 ns 72496 ns 1.01
array/reverse/2dL_inplace 65853 ns 65804 ns 1.00
array/reverse/2d_inplace 9986 ns 10104 ns 0.99
array/sorting/1d 2736964 ns 2734890.5 ns 1.00
array/sorting/2d 1070069 ns 1069333 ns 1.00
array/sorting/by 3305529.5 ns 3305442 ns 1.00
cuda/synchronization/context/auto 1143.3 ns 1142.4 ns 1.00
cuda/synchronization/context/blocking 923.5405405405405 ns 918.9428571428572 ns 1.01
cuda/synchronization/context/nonblocking 7128.8 ns 7080.9 ns 1.01
cuda/synchronization/stream/auto 1001.5714285714286 ns 983.3846153846154 ns 1.02
cuda/synchronization/stream/blocking 823.6436781609195 ns 819.060606060606 ns 1.01
cuda/synchronization/stream/nonblocking 7053.8 ns 7951.1 ns 0.89
integration/byval/reference 143725 ns 143938 ns 1.00
integration/byval/slices=1 145783 ns 145713 ns 1.00
integration/byval/slices=2 284767 ns 284653 ns 1.00
integration/byval/slices=3 422924 ns 422978 ns 1.00
integration/cudadevrt 102332 ns 102487 ns 1.00
integration/volumerhs 23447510 ns 23490453 ns 1.00
kernel/indexing 13255 ns 13409 ns 0.99
kernel/indexing_checked 13922 ns 14040 ns 0.99
kernel/launch 2139.4444444444443 ns 2267.4444444444443 ns 0.94
kernel/occupancy 706.8611111111111 ns 828.1463414634146 ns 0.85
kernel/rand 14503 ns 14466 ns 1.00
latency/import 3805709559 ns 3809615751 ns 1.00
latency/precompile 4569868964 ns 4560767959.5 ns 1.00
latency/ttfp 4380765383 ns 4401798424 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@gdalle gdalle closed this Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wrong matrix subtyping in cuSPARSE

3 participants