Profile: lzyrapx on LeetGPU | Challenges: LeetGPU Challenges
Progress Summary: Actively working through GPU programming challenges across multiple frameworks. The current focus is on CUDA and PyTorch, with ongoing exploration of newer languages and frameworks such as Triton, Mojo, and TinyGrad.
Core BLAS operations, matrix manipulation, and quantized variations.
| Problems | CUDA | PyTorch | Triton | Mojo | TinyGrad | Cute DSL |
|---|---|---|---|---|---|---|
| Batched Matrix Multiplication | ✅ | ✅ | | | | |
| Dot Product | ✅ | ✅ | | | | |
| FP16 Batched Matrix Multiplication | ✅ | | | | | |
| FP16 Dot Product | ✅ | | | | | |
| GEMM (FP16) | ✅ | ✅ | | | | |
| INT8 Quantized MatMul | ✅ | ✅ | | | | |
| Matrix Addition | ✅ | | | | | |
| Matrix Copy | ✅ | ✅ | ✅ | | | |
| Matrix Multiplication | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Matrix Power | ✅ | | | | | |
| Matrix Transpose | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Sparse Matrix-Vector Multiplication | ✅ | ✅ | | | | |
Attention mechanisms, normalizations, activations, and modern LLM kernels.
| Problems | CUDA | PyTorch | Triton | Mojo | TinyGrad | Cute DSL |
|---|---|---|---|---|---|---|
| Attention with Linear Biases | ✅ | | | | | |
| Batch Normalization | ✅ | | | | | |
| Categorical Cross Entropy Loss | ✅ | ✅ | | | | |
| Gaussian Error Gated Linear Unit | ✅ | | | | | |
| Leaky ReLU | ✅ | ✅ | ✅ | | | |
| Linear Self-Attention | ✅ | | | | | |
| LoRA Linear | ✅ | | | | | |
| Mean Squared Error | ✅ | ✅ | | | | |
| Multi-Head Self-Attention | ✅ | | | | | |
| ReLU | ✅ | ✅ | ✅ | | | |
| RMS Normalization | ✅ | | | | | |
| Rotary Positional Embedding | ✅ | | | | | |
| Sigmoid Activation | ✅ | | | | | |
| Sigmoid Linear Unit | ✅ | | | | | |
| Simple Inference | ✅ | | | | | |
| Sliding Window Self-Attention | ✅ | | | | | |
| Softmax | ✅ | ✅ | | | | |
| Softmax Attention | ✅ | ✅ | | | | |
| Swish-Gated Linear Unit | ✅ | | | | | |
| Weight Dequantization | ✅ | | | | | |
Filtering, FFT, max pooling, and spatial transformations.
| Problems | CUDA | PyTorch | Triton | Mojo | TinyGrad | Cute DSL |
|---|---|---|---|---|---|---|
| 1D Convolution | ✅ | ✅ | ✅ | ✅ | ✅ | |
| 2D Convolution | ✅ | ✅ | | | | |
| 2D Max Pooling | ✅ | | | | | |
| 3D Convolution | ✅ | | | | | |
| Color Inversion | ✅ | ✅ | ✅ | ✅ | | |
| Fast Fourier Transform | ✅ | | | | | |
| Gaussian Blur | ✅ | ✅ | | | | |
| RGB to Grayscale | ✅ | | | | | |
Parallel reductions, prefix sums, sorting, and array manipulations.
| Problems | CUDA | PyTorch | Triton | Mojo | TinyGrad | Cute DSL |
|---|---|---|---|---|---|---|
| 2D Subarray Sum | ✅ | | | | | |
| 3D Subarray Sum | ✅ | | | | | |
| Count Array Element | ✅ | ✅ | | | | |
| Count 2D Array Element | ✅ | ✅ | | | | |
| Count 3D Array Element | ✅ | | | | | |
| Histogramming | ✅ | ✅ | | | | |
| Interleave Arrays | ✅ | | | | | |
| Max Subarray Sum | ✅ | | | | | |
| Merge Sorted Arrays | ✅ | | | | | |
| Parallel Merge | ✅ | | | | | |
| Prefix Sum | ✅ | ✅ | | | | |
| Radix Sort | ✅ | ✅ | | | | |
| Reduction | ✅ | ✅ | | | | |
| Reverse Array | ✅ | ✅ | ✅ | | | |
| Sorting | ✅ | ✅ | | | | |
| Subarray Sum | ✅ | | | | | |
| Top-K Selection | ✅ | ✅ | | | | |
| Value Clipping | ✅ | | | | | |
| Vector Addition | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Stencils, regressions, graph traversal, and simulation algorithms.
| Problems | CUDA | PyTorch | Triton | Mojo | TinyGrad | Cute DSL |
|---|---|---|---|---|---|---|
| 2D Jacobi Stencil | ✅ | | | | | |
| All-Pairs Shortest Paths | ✅ | | | | | |
| BFS Shortest Path | ✅ | | | | | |
| K-Means Clustering | ✅ | | | | | |
| Linear Recurrence | ✅ | | | | | |
| Logistic Regression | ✅ | ✅ | | | | |
| Monte Carlo Integration | ✅ | ✅ | ✅ | | | |
| Multi-Agent Simulation | ✅ | | | | | |
| Nearest Neighbor | ✅ | | | | | |
| Ordinary Least Squares | ✅ | ✅ | | | | |
| Password Cracking | ✅ | | | | | |
| Rainbow Table | ✅ | ✅ | ✅ | | | |