Review checklist
This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.
Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community.
From |
I need to investigate why I am getting:
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2908 +/- ##
==========================================
- Coverage 97.11% 97.07% -0.05%
==========================================
Files 622 624 +2
Lines 48235 48264 +29
==========================================
+ Hits 46843 46848 +5
- Misses 1392 1416 +24
ranocha left a comment
Can you please add a (brief) section to the documentation, e.g., https://trixi-framework.github.io/TrixiDocumentation/stable/performance/, describing how to use these tools for benchmarking (or at least mentioning that Trixi.jl supports them and linking to other docs for further information)?
Please request my review when you've finished this PR.
It would be good to add at least a rudimentary comment on the purpose of this extension.
# TODO: move to KernelAbstractions
"""
    trixi_range_active(backend)
I'd call these three functions profiling_range_xxx instead of trixi_range_xxx to make them more self-descriptive.
… VTune
Delay init of domain
fixup: formatting
add color
Co-authored-by: Michael Schlottke-Lakemper <michael@sloede.com>
- Rename `trixi_range_*` to `profiling_range_*`
- Add descriptive comments to TrixiIntelITTExt and TrixiNVTXExt
- Fix formatting at EOF in TrixiIntelITTExt and TrixiNVTXExt
Co-authored-by: Antigravity <antigravity@gemini.google.com>
Co-authored-by: Gemini 3.1 Pro (High) <gemini@google.com>
@benegee and @efaulhaber, since you have done some profiling sessions with me recently, maybe you can give the new docs a read.
    return ncalls_first
end

# TODO: move to KernelAbstractions
You can also just use `CUDA.@profile` (see [Integrated Profiler](https://cuda.juliagpu.org/stable/development/profiling/#Integrated-profiler)) to obtain profiler results that include the NVTX ranges.

#### Known limitation
Nsight Systems can also be used for CPU and in particular MPI codes. The Trixi.jl extension will only be enabled when a GPU backend is being used.
Just to make sure I understand this correctly: When you load CUDA and NVTX, it will trigger loading the TrixiNVTXExt extension. (And you can also use `CUDA.@profile`.) But you will not get range annotations unless you are actually running on the CUDA backend.
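The behavior described in this comment can be illustrated with a minimal, self-contained sketch. It does not reproduce the PR's code: the backend types, `with_profiling_range`, and the `RECORDED` log are hypothetical stand-ins, and the body of `profiling_range_active` is an assumption based only on the name used in this thread. No CUDA or NVTX package is needed to run it.

```julia
# Hypothetical stand-ins for backends; not taken from Trixi.jl or KernelAbstractions.
abstract type AbstractBackend end
struct CPUBackend <: AbstractBackend end
struct GPUBackend <: AbstractBackend end

# Assumed default: no profiling ranges are emitted for any backend.
profiling_range_active(::AbstractBackend) = false

# What an extension could add once the profiler package is loaded:
# ranges are enabled for the GPU backend only.
profiling_range_active(::GPUBackend) = true

# Records range events in place of real profiler calls.
const RECORDED = String[]

# Hypothetical helper: wrap `f` in a named range if ranges are active.
function with_profiling_range(f, backend::AbstractBackend, name::AbstractString)
    if profiling_range_active(backend)
        push!(RECORDED, "start " * name)
        try
            return f()
        finally
            # Close the range even if `f` throws.
            push!(RECORDED, "end " * name)
        end
    else
        return f()
    end
end

cpu_result = with_profiling_range(() -> sum(1:10), CPUBackend(), "rhs!")  # not annotated
gpu_result = with_profiling_range(() -> sum(1:10), GPUBackend(), "rhs!")  # annotated
```

In this sketch both calls return the same value, but only the call on the GPU backend leaves a start/end pair in `RECORDED`, which mirrors the question above: loading the extension alone is not enough, the active backend decides whether a range is emitted.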
efaulhaber left a comment
Why is this not part of TrixiBase? Why is @trixi_timeit_ext not part of TrixiBase either? We are currently using @trixi_timeit with manual synchronization of all kernels in TrixiParticles, so it would be very nice for us to have this in TrixiBase.
Why is this not part of (an extension of) TrixiBase.jl?
It should definitely live somewhere in a shared package. If not in TrixiBase, then we should create a new package for advanced timings. |
TrixiBase currently has no semantics for what a backend is or what that argument could mean. There was an explicit wish, for now, not to have a dependency on KernelAbstractions, and as I said in #2892 (comment), making a macro dependent on an extension seems very weird and fragile to me. So I am not currently pursuing an inclusion in TrixiBase: we are still figuring out the right way to do backend selection and parallelism, and that is much harder in a package where I have to guarantee an API than in the internal implementation of Trixi.jl.
For GPU-accelerated development, we often use external profilers such as Nsight Systems.
With this PR, we automatically annotate the code and then get the following inside Nsight Systems: