fix: correct subtyping for sparse matrices in cuSPARSE#3105
fix: correct subtyping for sparse matrices in cuSPARSE#3105gdalle wants to merge 1 commit intoJuliaGPU:masterfrom
Conversation
|
Should I add tests? |
|
Is this going to break the broadcasting tests, which rely on things being subtypes of the abstract GPUArrays types? |
|
No, this shouldn't be breaking because |
|
That wasn't what I asked. Is it going to break this logic? |
|
If I understand correctly, this line still applies to |
|
I don't think everything will be fine because of this |
|
Oh right I understand now. Yeah, this will break. |
Missed that AbstractCuSparseMatrix doesn't have AbstractGPUSparseMatrixCSR in its hierarchy.
|
Right now it's kind of functioning as a dumping ground for things that aren't CSC and CSR 😅 . Ideally what we'd do is add broadcasting support for COO and BSR so we can fold them in to the wider AbstractGPUSparseMatrix*** system |
|
Okay then feel free to close this PR, and I'll make a note to not use |
|
Sorry for the confusion here! It's kind of an ongoing process, as you can see... |
There was a problem hiding this comment.
CUDA.jl Benchmarks
Details
| Benchmark suite | Current: 2c11d9a | Previous: af6961a | Ratio |
|---|---|---|---|
array/accumulate/Float32/1d |
101885 ns |
101502.5 ns |
1.00 |
array/accumulate/Float32/dims=1 |
77180 ns |
76629.5 ns |
1.01 |
array/accumulate/Float32/dims=1L |
1586558 ns |
1585691.5 ns |
1.00 |
array/accumulate/Float32/dims=2 |
144533.5 ns |
143840 ns |
1.00 |
array/accumulate/Float32/dims=2L |
658693.5 ns |
657866.5 ns |
1.00 |
array/accumulate/Int64/1d |
119291 ns |
119085 ns |
1.00 |
array/accumulate/Int64/dims=1 |
80352 ns |
79723 ns |
1.01 |
array/accumulate/Int64/dims=1L |
1695194.5 ns |
1694532 ns |
1.00 |
array/accumulate/Int64/dims=2 |
156647 ns |
156114 ns |
1.00 |
array/accumulate/Int64/dims=2L |
962826 ns |
961892 ns |
1.00 |
array/broadcast |
20872 ns |
20629 ns |
1.01 |
array/construct |
1267 ns |
1256.5 ns |
1.01 |
array/copy |
18257 ns |
17886 ns |
1.02 |
array/copyto!/cpu_to_gpu |
218082 ns |
212623 ns |
1.03 |
array/copyto!/gpu_to_cpu |
283779.5 ns |
283142 ns |
1.00 |
array/copyto!/gpu_to_gpu |
10733 ns |
10702 ns |
1.00 |
array/iteration/findall/bool |
135007 ns |
134713 ns |
1.00 |
array/iteration/findall/int |
150383.5 ns |
149574 ns |
1.01 |
array/iteration/findfirst/bool |
82540 ns |
81484 ns |
1.01 |
array/iteration/findfirst/int |
84719 ns |
83536 ns |
1.01 |
array/iteration/findmin/1d |
88840 ns |
85967.5 ns |
1.03 |
array/iteration/findmin/2d |
117471 ns |
116482 ns |
1.01 |
array/iteration/logical |
200391 ns |
199753 ns |
1.00 |
array/iteration/scalar |
67244 ns |
67105.5 ns |
1.00 |
array/permutedims/2d |
52813 ns |
52568 ns |
1.00 |
array/permutedims/3d |
53243 ns |
53057 ns |
1.00 |
array/permutedims/4d |
52187 ns |
51868.5 ns |
1.01 |
array/random/rand/Float32 |
13794 ns |
12842 ns |
1.07 |
array/random/rand/Int64 |
26019 ns |
25396 ns |
1.02 |
array/random/rand!/Float32 |
8899.666666666666 ns |
8414.666666666666 ns |
1.06 |
array/random/rand!/Int64 |
22129 ns |
21981 ns |
1.01 |
array/random/randn/Float32 |
37987 ns |
37382.5 ns |
1.02 |
array/random/randn!/Float32 |
30899 ns |
30580 ns |
1.01 |
array/reductions/mapreduce/Float32/1d |
34669 ns |
34346 ns |
1.01 |
array/reductions/mapreduce/Float32/dims=1 |
39945 ns |
39918.5 ns |
1.00 |
array/reductions/mapreduce/Float32/dims=1L |
51321 ns |
51509 ns |
1.00 |
array/reductions/mapreduce/Float32/dims=2 |
56515 ns |
56563 ns |
1.00 |
array/reductions/mapreduce/Float32/dims=2L |
69450.5 ns |
69236 ns |
1.00 |
array/reductions/mapreduce/Int64/1d |
43066 ns |
42783 ns |
1.01 |
array/reductions/mapreduce/Int64/dims=1 |
51270 ns |
42869 ns |
1.20 |
array/reductions/mapreduce/Int64/dims=1L |
87296 ns |
87545 ns |
1.00 |
array/reductions/mapreduce/Int64/dims=2 |
59402 ns |
59325 ns |
1.00 |
array/reductions/mapreduce/Int64/dims=2L |
84700 ns |
85131 ns |
0.99 |
array/reductions/reduce/Float32/1d |
34994 ns |
34338 ns |
1.02 |
array/reductions/reduce/Float32/dims=1 |
49664 ns |
48895 ns |
1.02 |
array/reductions/reduce/Float32/dims=1L |
51712 ns |
51567 ns |
1.00 |
array/reductions/reduce/Float32/dims=2 |
56672 ns |
56622 ns |
1.00 |
array/reductions/reduce/Float32/dims=2L |
69967 ns |
69625 ns |
1.00 |
array/reductions/reduce/Int64/1d |
42893 ns |
42697 ns |
1.00 |
array/reductions/reduce/Int64/dims=1 |
42295 ns |
47176.5 ns |
0.90 |
array/reductions/reduce/Int64/dims=1L |
87293 ns |
87361 ns |
1.00 |
array/reductions/reduce/Int64/dims=2 |
59586 ns |
59337 ns |
1.00 |
array/reductions/reduce/Int64/dims=2L |
84606 ns |
84717 ns |
1.00 |
array/reverse/1d |
17873 ns |
18051 ns |
0.99 |
array/reverse/1dL |
68581 ns |
68628 ns |
1.00 |
array/reverse/1dL_inplace |
65762 ns |
65835 ns |
1.00 |
array/reverse/1d_inplace |
10334 ns |
8676.333333333334 ns |
1.19 |
array/reverse/2d |
20742 ns |
20541 ns |
1.01 |
array/reverse/2dL |
72870 ns |
72496 ns |
1.01 |
array/reverse/2dL_inplace |
65853 ns |
65804 ns |
1.00 |
array/reverse/2d_inplace |
9986 ns |
10104 ns |
0.99 |
array/sorting/1d |
2736964 ns |
2734890.5 ns |
1.00 |
array/sorting/2d |
1070069 ns |
1069333 ns |
1.00 |
array/sorting/by |
3305529.5 ns |
3305442 ns |
1.00 |
cuda/synchronization/context/auto |
1143.3 ns |
1142.4 ns |
1.00 |
cuda/synchronization/context/blocking |
923.5405405405405 ns |
918.9428571428572 ns |
1.01 |
cuda/synchronization/context/nonblocking |
7128.8 ns |
7080.9 ns |
1.01 |
cuda/synchronization/stream/auto |
1001.5714285714286 ns |
983.3846153846154 ns |
1.02 |
cuda/synchronization/stream/blocking |
823.6436781609195 ns |
819.060606060606 ns |
1.01 |
cuda/synchronization/stream/nonblocking |
7053.8 ns |
7951.1 ns |
0.89 |
integration/byval/reference |
143725 ns |
143938 ns |
1.00 |
integration/byval/slices=1 |
145783 ns |
145713 ns |
1.00 |
integration/byval/slices=2 |
284767 ns |
284653 ns |
1.00 |
integration/byval/slices=3 |
422924 ns |
422978 ns |
1.00 |
integration/cudadevrt |
102332 ns |
102487 ns |
1.00 |
integration/volumerhs |
23447510 ns |
23490453 ns |
1.00 |
kernel/indexing |
13255 ns |
13409 ns |
0.99 |
kernel/indexing_checked |
13922 ns |
14040 ns |
0.99 |
kernel/launch |
2139.4444444444443 ns |
2267.4444444444443 ns |
0.94 |
kernel/occupancy |
706.8611111111111 ns |
828.1463414634146 ns |
0.85 |
kernel/rand |
14503 ns |
14466 ns |
1.00 |
latency/import |
3805709559 ns |
3809615751 ns |
1.00 |
latency/precompile |
4569868964 ns |
4560767959.5 ns |
1.00 |
latency/ttfp |
4380765383 ns |
4401798424 ns |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Fixes #3104