From c041defba84b5addcdb7a87ab3a51860ba9d932d Mon Sep 17 00:00:00 2001
From: Simone Gasparini
Date: Tue, 26 Aug 2025 18:36:10 +0200
Subject: [PATCH 1/4] first draft for AI coding conventions

---
 .github/copilot-instructions.md |   1 +
 AI_DEVELOPMENT_GUIDE.md         | 107 ++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 .github/copilot-instructions.md
 create mode 100644 AI_DEVELOPMENT_GUIDE.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 00000000..3f010848
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1 @@
+See [AI_DEVELOPMENT_GUIDE.md](../AI_DEVELOPMENT_GUIDE.md) for full coding conventions.
\ No newline at end of file
diff --git a/AI_DEVELOPMENT_GUIDE.md b/AI_DEVELOPMENT_GUIDE.md
new file mode 100644
index 00000000..29854d4b
--- /dev/null
+++ b/AI_DEVELOPMENT_GUIDE.md
@@ -0,0 +1,107 @@
+# AI Development Guide for PopSift
+
+This guide defines how AI-assisted code generation should be done in this repository.
+It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follow a **consistent, modern, and maintainable style**.
+
+---
+
+## General Principles
+
+- Always prioritize **readability** and **clarity** over micro-optimizations.
+- Follow **modern C++17 best practices**.
+- Keep host-side C++ and CUDA device code **cleanly separated**.
+- Prefer **modularity**: each class or major component should live in its own file.
+- Code should be **self-documenting** whenever possible, with clear naming and structure.
+
+---
+
+## C++ Guidelines
+
+- **Standard**: Use **C++17**. Prefer `constexpr`, `auto`, `enum class`, range-based for loops, and smart pointers (`std::unique_ptr`, `std::shared_ptr`).
+- **Memory Management**: Use RAII. Avoid raw `new`/`delete` except in CUDA contexts where unavoidable.
+- **Error Handling**:
+  - Use exceptions in host C++ code.
+  - In CUDA, check and propagate error codes using helper utilities/macros. Never ignore errors.
+- **Namespaces**: Group related functions/classes logically. Avoid polluting the global namespace.
+- **Headers**:
+  - Keep headers minimal; forward declare instead of including heavy dependencies.
+  - Each header should be guarded with `#pragma once`.
+- **Style**:
+  - `snake_case` for variables and functions.
+  - `CamelCase` for class and struct names.
+  - `ALL_CAPS` for macros and compile-time constants.
+
+---
+
+## CUDA Guidelines
+
+- Separate **kernels** from host orchestration code.
+- Name kernels descriptively, e.g. `compute_gradient_kernel`.
+- Document assumptions about:
+  - Thread/block layout
+  - Shared memory usage
+  - Synchronization requirements
+- Use `__restrict__` and `constexpr` where appropriate for performance and clarity.
+- Prefer small, focused kernels over overly complex ones.
+- Always validate CUDA API calls.
+
+---
+
+## Threading Guidelines
+
+- **Host Threading**: Use `std::thread` and synchronization primitives from ``.
+- **CUDA Streams**: Use multiple streams for concurrent kernel execution.
+- **Thread Safety**: Document thread safety guarantees for all public APIs.
+- **Avoid**: Raw pthreads or platform-specific threading APIs.
+
+---
+
+## Modularity and Organization
+
+- Keep code **organized by functionality** (e.g., detection, description, GPU utilities).
+- Avoid very long functions (>50 lines); refactor into helpers when possible.
+- Prefer **free functions** in namespaces over singletons or unnecessary wrapper classes.
+- Keep algorithms and data structures reusable when possible.
+
+---
+
+## Performance Guidelines
+
+- **Memory Access Patterns**: Prefer coalesced memory access in CUDA kernels. Document stride patterns.
+- **Shared Memory**: Use shared memory for data reuse within thread blocks. Document bank conflicts.
+- **Register Usage**: Monitor register pressure in kernels. Aim for high occupancy.
+- **Asynchronous Operations**: Use CUDA streams for overlapping computation and memory transfers.
+- **Profiling**: Profile with `nvprof` or Nsight before optimizing. Document performance assumptions.
+- **Memory Bandwidth**: Consider memory bandwidth as the primary bottleneck for most kernels.
+
+---
+
+## Documentation
+
+- Use **Doxygen-style comments** for public APIs, classes, and CUDA kernels.
+- Document algorithm choices and any CUDA-specific design tradeoffs.
+- Update examples and README when new features are introduced.
+- At each update ensure that the changelog is also updated following the [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
+  - for each new feature, bug fix, or breaking change, add a corresponding entry in the changelog.
+  - the description should be short but informative, followed by the relevant PR link.
+
+---
+
+## Git Guidelines
+
+- **Branch Names**: `feature/description`, `fix/issue-number`, `refactor/component`
+- **Commit Messages**: Use conventional commits format: `[feat]`, `[fix]`, `[refactor]`, `[doc]` etc.
+- **File Organization**: Keep related files in logical directories
+- **Ignore Patterns**: Update `.gitignore` for build artifacts and IDE files
+
+---
+
+## Commit & PR Guidelines
+
+- Keep commits small and focused (one feature or fix per commit).
+- Do not commit untracked files that are not relevant.
+- PRs should include:
+  - Clear description of changes
+  - Explanations for algorithmic choices or CUDA-specific design decisions
+  - Updated tests or examples if applicable
+- Code must pass existing CI checks before merging.
From e38c5afd18e4f8bfe2d341c4014477b7457e59fa Mon Sep 17 00:00:00 2001
From: Simone Gasparini
Date: Wed, 27 Aug 2025 09:02:41 +0200
Subject: [PATCH 2/4] Add markdownlint configuration file

---
 .markdownlint.json | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 .markdownlint.json

diff --git a/.markdownlint.json b/.markdownlint.json
new file mode 100644
index 00000000..95b2714e
--- /dev/null
+++ b/.markdownlint.json
@@ -0,0 +1,6 @@
+{
+  "default": true,
+  "MD013": false,
+  "MD024": false,
+  "MD033": false
+}
From 936bcdeb3b00a05938cdfd273b3d8f71cf505633 Mon Sep 17 00:00:00 2001
From: Carsten Griwodz
Date: Fri, 29 Aug 2025 14:40:05 +0200
Subject: [PATCH 3/4] some initial thoughts on the AI guide

---
 AI_DEVELOPMENT_GUIDE.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/AI_DEVELOPMENT_GUIDE.md b/AI_DEVELOPMENT_GUIDE.md
index 29854d4b..db113654 100644
--- a/AI_DEVELOPMENT_GUIDE.md
+++ b/AI_DEVELOPMENT_GUIDE.md
@@ -9,7 +9,8 @@ It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follo
 
 - Always prioritize **readability** and **clarity** over micro-optimizations.
 - Follow **modern C++17 best practices**.
-- Keep host-side C++ and CUDA device code **cleanly separated**.
+- Keep device-side __global__ functions in the same source file as the host-side C++ code that starts them.
+- Always compile __device__ functions with the functions that call them. Preferably declare them static inline.
 - Prefer **modularity**: each class or major component should live in its own file.
 - Code should be **self-documenting** whenever possible, with clear naming and structure.
 
@@ -17,7 +18,12 @@ It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follo
 
 ## C++ Guidelines
 
-- **Standard**: Use **C++17**. Prefer `constexpr`, `auto`, `enum class`, range-based for loops, and smart pointers (`std::unique_ptr`, `std::shared_ptr`).
+- **Standard**:
+  - Use **C++17**. Prefer `constexpr`, `auto` and `enum class`.
+  - Use range-based for loops on the host side.
+  - Use smart pointers (`std::unique_ptr`, `std::shared_ptr`) on the host side.
+  - Never pass smart pointers as parameters to __global__ functions.
+  - Avoid dynamic memory allocation on the device side.
 - **Memory Management**: Use RAII. Avoid raw `new`/`delete` except in CUDA contexts where unavoidable.
 - **Error Handling**:
   - Use exceptions in host C++ code.
From 0c5fdf251a110090fa12eeb181ae2ae010794bae Mon Sep 17 00:00:00 2001
From: Carsten Griwodz
Date: Mon, 1 Sep 2025 17:00:12 +0200
Subject: [PATCH 4/4] some more CUDA calling considerations

---
 AI_DEVELOPMENT_GUIDE.md | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/AI_DEVELOPMENT_GUIDE.md b/AI_DEVELOPMENT_GUIDE.md
index db113654..5416d3f8 100644
--- a/AI_DEVELOPMENT_GUIDE.md
+++ b/AI_DEVELOPMENT_GUIDE.md
@@ -22,16 +22,23 @@ It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follo
   - Use **C++17**. Prefer `constexpr`, `auto` and `enum class`.
   - Use range-based for loops on the host side.
   - Use smart pointers (`std::unique_ptr`, `std::shared_ptr`) on the host side.
+  - Dynamic memory allocation on the device side is strongly discouraged.
   - Never pass smart pointers as parameters to __global__ functions.
-  - Avoid dynamic memory allocation on the device side.
-- **Memory Management**: Use RAII. Avoid raw `new`/`delete` except in CUDA contexts where unavoidable.
+- **Memory Management**:
+  - Use RAII on the host side.
+  - Avoid all dynamic memory allocation on the device side.
+  - Understand that reference-counting smart pointers cannot be kept consistent between
+    host and device, and that kernels run asynchronously from host code.
 - **Error Handling**:
   - Use exceptions in host C++ code.
   - In CUDA, check and propagate error codes using helper utilities/macros. Never ignore errors.
 - **Namespaces**: Group related functions/classes logically. Avoid polluting the global namespace.
 - **Headers**:
   - Keep headers minimal; forward declare instead of including heavy dependencies.
-  - Each header should be guarded with `#pragma once`.
+    However, small helper functions declared `static inline __device__` that are used several times should be
+    included instead of copying the code.
+  - Each header should be guarded with `#pragma once`. ifndef/endif guards should be used in special
+    circumstances only.
 - **Style**:
   - `snake_case` for variables and functions.
   - `CamelCase` for class and struct names.
@@ -41,14 +48,19 @@ It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follo
 
 ## CUDA Guidelines
 
-- Separate **kernels** from host orchestration code.
+- Separate **kernels** (`__global__` functions) from host orchestration code, but keep
+  them in the same module as the host code that starts them.
 - Name kernels descriptively, e.g. `compute_gradient_kernel`.
 - Document assumptions about:
   - Thread/block layout
   - Shared memory usage
   - Synchronization requirements
 - Use `__restrict__` and `constexpr` where appropriate for performance and clarity.
-- Prefer small, focused kernels over overly complex ones.
+- Avoid writing kernels that use `local memory`; limit variables to registers and shared
+  memory as much as possible. To achieve this, prefer focused kernels over complex ones.
+- To structure larger kernels, use `__device__` functions that are declared
+  `static inline __device__`. Ensure that caller and device functions are compiled together.
+- Avoid dynamic parallelism.
 - Always validate CUDA API calls.
 
 ---
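
The error-handling rules in the guide ("Use exceptions in host C++ code", "check and propagate error codes using helper utilities/macros", "Always validate CUDA API calls") could be sketched roughly as below. This is only an illustration under stated assumptions: it is plain C++17 so that it builds without the CUDA toolkit, and `status_t`, `status_name`, `check_status`, `CHECK_STATUS`, and `fake_alloc` are hypothetical names that stand in for a real status type and API call. None of them are taken from PopSift or from the CUDA runtime.

```cpp
// Sketch only: every identifier below is a hypothetical stand-in,
// not a PopSift or CUDA runtime identifier.
#include <stdexcept>
#include <string>

// Stand-in for an API status type (the role cudaError_t plays in real code).
enum class status_t { success = 0, invalid_value = 1, out_of_memory = 2 };

// Human-readable name for a status code.
inline const char* status_name(status_t s)
{
    switch (s) {
        case status_t::success:       return "success";
        case status_t::invalid_value: return "invalid value";
        case status_t::out_of_memory: return "out of memory";
    }
    return "unknown";
}

// Turn a failing status into a host-side exception that records the call site.
inline void check_status(status_t s, const char* expr, const char* file, int line)
{
    if (s != status_t::success) {
        throw std::runtime_error(std::string(file) + ":" + std::to_string(line) +
                                 ": " + expr + " failed: " + status_name(s));
    }
}

// ALL_CAPS name, following the guide's naming rule for macros.
#define CHECK_STATUS(call) check_status((call), #call, __FILE__, __LINE__)

// Hypothetical API call used only to exercise the macro.
inline status_t fake_alloc(bool ok)
{
    return ok ? status_t::success : status_t::out_of_memory;
}
```

Wrapping each call as `CHECK_STATUS(fake_alloc(true));` means a returned status is never silently dropped, and a failure surfaces as a `std::runtime_error` carrying the file, line, and failing expression. In real CUDA code the same pattern would wrap runtime API calls and compare against the runtime's own success value.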