Prevent init_mpi from being automatically called during precompilation. #2993

Open: vchuravy wants to merge 1 commit into main from vc/guard_init_mpi

vchuravy (Member) commented May 4, 2026

Ideally we would avoid all precompilation under MPI, and that is our
recommendation to users. Yet we have seen the situation in CI where
an extension of Trixi gets recompiled due to a difference in flags,
leading to crashes and hangs.

See #2910 (comment) for an example
of that.
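
For context, here is a minimal sketch of the kind of guard the title describes. The diff itself is not reproduced in this conversation, so this is an assumption about its general shape, not the actual change: `is_generating_output`, `init_mpi_sketch`, and `guarded_init` are hypothetical names, and the MPI call is replaced by a stand-in so the snippet runs without MPI.jl. Only the `jl_generating_output` runtime query is taken from the discussion.

```julia
# Hypothetical sketch; every name except `jl_generating_output` is illustrative.

# True while Julia is generating output such as a precompile cache.
is_generating_output() = ccall(:jl_generating_output, Cint, ()) == 1

# Stand-in for the real MPI setup, so the sketch has no MPI.jl dependency.
const MPI_INITIALIZED = Ref(false)
function init_mpi_sketch()
    MPI_INITIALIZED[] = true
    return nothing
end

# Analogue of a module's `__init__`: skip MPI setup while precompiling.
function guarded_init()
    if is_generating_output()
        return false   # precompilation process: leave MPI untouched
    end
    init_mpi_sketch()  # actual run: initialize as usual
    return true
end

guarded_init()
```

In a regular Julia session the check is false, so the stand-in initialization runs. Inside a precompilation process it would be skipped.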

github-actions bot (Contributor) commented May 4, 2026

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

sloede (Member) left a comment

:jl_generating_output is essentially for free, right? Meaning, this will not cost us anything in terms of I/O, compute time etc. that would be annoying when done on 10k ranks in parallel.

Also, is a (reasonable) situation possible where only a subset of ranks might trigger "precompiling = yes", subsequently causing hangs because global (in the MPI sense) init operations are not executed on all ranks? Or would these situations already be causes for other types of crashes?

In practice, crashing is not so bad (just annoying) - really bad is 10k ranks job running for 12 hours and being stuck in initialization...

codecov bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.13%. Comparing base (c5a93b1) to head (5e01bd9).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2993   +/-   ##
=======================================
  Coverage   97.13%   97.13%           
=======================================
  Files         625      625           
  Lines       48514    48516    +2     
=======================================
+ Hits        47122    47124    +2     
  Misses       1392     1392           
Flag Coverage Δ
unittests 97.13% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

vchuravy (Member, Author) commented May 5, 2026

> essentially for free, right?

Yeah it's a cheap check.

> Also, is a (reasonable) situation possible where only a subset of ranks might trigger "precompiling = yes", subsequently causing hangs because global (in the MPI sense) init operations are not executed on all ranks? Or would these situations already be causes for other types of crashes?

That's not possible. The point is that we want to initialize MPI only during the actual run, not in the precompilation process that we happened to create.

> In practice, crashing is not so bad (just annoying) - really bad is 10k ranks job running for 12 hours and being stuck in initialization...

In practice, the current state leads to hangs, and it may lead to crashes. On CI we have seen it cause hangs on Ubuntu and Windows; only on macOS did we get a crash.
