ENH limit threads for C-Libraries dynamically#135
|
After investigating, I could not find a way to dynamically set the number of threads used by |
|
There might be some dynamic API to control the Grand Central Dispatch runtime of macOS but I am not sure. Anyway I think it's much less used than MKL, OpenBLAS and OpenMP in our cross-platform ecosystem. Let's keep macOS specific things for later PRs if users request it. |
ogrisel
left a comment
Some comments but otherwise LGTM!
|
We might also want to add a CI entry (on travis) that This does not require importing numpy. The MKL download is 220MB though, so I would not make it a default test dependency, only a dependency for one of the build matrix entries. |
|
Also, a general style nitpick: I don't like the mixed-case OpenBLAS / openMP / MKL spellings in Python variables, function names and constants. I would rather use openblas / openmp / mkl (or the all-uppercase versions) consistently in our code :) |
|
I am starting to think we should not support non-POSIX platforms for this feature: it is a nightmare on OSX and Windows, as ctypes does not find the library and it is not fun to debug... We could state that we do not support dynamic scaling for these platforms, and fall back to |
That's an option, at least for a start. In this case |
A potential limitation of this code is that we can only call the *_s/get_num_threads function once the Python package that relies on those C-runtimes has been imported in the Python process.
For instance in OpenBLAS, you can observe the following:
$ python -c "from loky.backend.utils import get_thread_limits; print(get_thread_limits())"
{'openblas': None, 'openmp_intel': None, 'openmp_gnu': 12, 'mkl': None}
$ python -c "import numpy; from loky.backend.utils import get_thread_limits; print(get_thread_limits())"
{'openblas': 12, 'openmp_intel': None, 'openmp_gnu': 12, 'mkl': None}
Both in loky and joblib it's hard to tell which Python packages should actually be imported in the process workers ahead of time: it depends on the tasks being scheduled for execution.
This means that the best time to call the runtime helper function is after the task has been unpickled on the worker (triggering compiled module imports) but before actually executing the function.
What I do not understand in the above output is why GNU OpenMP could be detected in the first case: I did not import any compiled extension built with the -fopenmp gcc compiler flag, and I doubt that the Python interpreter itself is built with this flag.
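To make the ordering concrete, here is a minimal sketch of a worker-side helper. The names run_task and apply_limits are hypothetical and are not part of loky or joblib; the sketch only shows that the limits are applied after unpickling (which triggers the imports) and before the call.

```python
import pickle


def run_task(pickled_task, apply_limits):
    """Hypothetical worker-side helper, not the loky implementation.

    pickled_task: bytes holding a pickled (func, args) tuple.
    apply_limits: callable that sets the thread limits on the C runtimes
                  that are loaded at the time it is called.
    """
    # Unpickling imports the modules that define func, which in turn loads
    # the compiled extensions (and their C runtimes) into the process.
    func, args = pickle.loads(pickled_task)
    # Only now can the runtimes used by the task be found and limited.
    apply_limits()
    return func(*args)
```

The cost of this design is one extra runtime scan per task, since each task may pull in a library that was not loaded for the previous one.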
ogrisel
left a comment
Here is another batch of comments and questions:
I think that if we do not find the loaded library, we can fall back to setting the env variable that controls the maximal number of threads. So the initializer should be something like: This way, we would have the correct behavior on all platforms, without needing to predict which library will be used. The rest of the API is for rescaling the threadpool and in this case, it only needs to change the number of threads for the loaded libraries. |
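A rough sketch of such a fallback, under assumptions: the function name initializer, the loaded_setters argument and the ENV_VAR_FOR_LIBRARY mapping are hypothetical and only illustrate the idea. The three environment variable names are the ones read by OpenMP, OpenBLAS and MKL when they are loaded.

```python
import os

# Environment variable read by each runtime when it is first loaded.
ENV_VAR_FOR_LIBRARY = {
    "openmp": "OMP_NUM_THREADS",
    "openblas": "OPENBLAS_NUM_THREADS",
    "mkl": "MKL_NUM_THREADS",
}


def initializer(max_threads, loaded_setters=None, environ=None):
    """Hypothetical worker initializer.

    loaded_setters: dict mapping a library name to a callable that sets its
                    thread count, for the libraries already found in the
                    process (assumed to be produced by the detection code).
    environ: mapping to update, defaults to os.environ.
    """
    loaded_setters = loaded_setters or {}
    environ = os.environ if environ is None else environ
    for name, env_var in ENV_VAR_FOR_LIBRARY.items():
        setter = loaded_setters.get(name)
        if setter is not None:
            # Library already loaded: change its limit dynamically.
            setter(max_threads)
        else:
            # Library not found: the env variable is picked up if it gets
            # loaded later in this process.
            environ[env_var] = str(max_threads)
```

The env variable only has an effect when it is set before the library is loaded, which is why it is used as the fallback for libraries that were not found.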
|
Indeed, good point. |
|
Could you try to add a travis CI entry with anaconda numpy + MKL and another with conda-forge numpy + openblas? |
|
Ok the dynamic library loading is now working in OSX. |
|
Ok I just found this SO answer which shows how to use |
ogrisel
left a comment
Some more refactoring suggestions:
for name, info in SUPPORTED_IMPLEMENTATION.items():
    if self.starts_with_any(module_name, info['filename_prefixes']):
        return name, info['library']
return None, None
_is_supported_implementation could be renamed to _get_library_info_for_path and just return the full info dict with the name and the APIs info in it or None.
To make this simpler we could rewrite:
SUPPORTED_IMPLEMENTATION = {
    "openmp_intel": {
        "filename_prefixes": ("libiomp",),
        "internal_api": "openmp",
        "user_api": "openmp",
    },
    ...
}
to:
SUPPORTED_IMPLEMENTATION = [
    {
        "name": "openmp_intel",
        "filename_prefixes": ("libiomp",),
        "internal_api": "openmp",
        "user_api": "openmp",
    },
    ...
]
Should we keep the different names openmp_msvc, openmp_gnu,... or merge them as openmp with multiple filename_prefixes?
I think the latter makes more sense with our current implementation.
I thought about that. I am fine with both as long as we are not able to retrieve the version number.
Seems good. In case of nested parallelism BLAS inside prange, setting the number of threads for the inner BLAS can be done through the "blas" user-api, even if BLAS is linked to OpenMP. So there should never be a case where we need to explicitly restrict one OpenMP and not the other.
> I am fine with both as long as we are not able to retrieve the version number.
OpenMP does not expose its version so I would not worry about that.
OpenMP no, but the individual OpenMP runtime libraries might have a version introspection API.
None of them does. Retrieving the version is a real pain :)
What you have to do is use the _OPENMP preprocessor macro in a compiled program to get its value. This value is a date, not a version number. Finally you have to use a date-to-version map to look up the version that matches the date...
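For illustration, the last step could be sketched as below. The function name is hypothetical, and the value of _OPENMP still has to come from a small compiled program, which is not shown here. The dates listed are the yyyymm values published with the corresponding OpenMP specifications.

```python
# _OPENMP macro values (yyyymm) and the matching OpenMP specification.
OPENMP_DATE_TO_VERSION = {
    200505: "2.5",
    200805: "3.0",
    201107: "3.1",
    201307: "4.0",
    201511: "4.5",
    201811: "5.0",
}


def openmp_version_from_date(date):
    """Map the value of the _OPENMP macro to a version string.

    Returns None for a date that is not in the table.
    """
    return OPENMP_DATE_TO_VERSION.get(int(date))
```

Note that this gives the version of the specification the compiler targets, not the version of the runtime library itself.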
ogrisel
left a comment
As discussed IRL, I agree we should further get rid of the wrapper class itself and just use a bunch of stateless functions to get and set the limits + the context manager.
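A rough sketch of what that shape could look like. Everything here is a stand-in: _FAKE_RUNTIME_LIMITS replaces the real calls into the C runtimes, and set_thread_limits and thread_limits are hypothetical names (only get_thread_limits appears earlier in this thread).

```python
from contextlib import contextmanager

# Stand-in for the state held by the C runtimes. Real functions would call
# into the loaded libraries instead of reading a dict.
_FAKE_RUNTIME_LIMITS = {"openblas": 12, "openmp": 12}


def get_thread_limits():
    """Return the current limit of each known library."""
    return dict(_FAKE_RUNTIME_LIMITS)


def set_thread_limits(limits):
    """Set the limit of each library named in the limits dict."""
    _FAKE_RUNTIME_LIMITS.update(limits)


@contextmanager
def thread_limits(limit):
    """Temporarily apply the same limit to every library."""
    previous = get_thread_limits()
    set_thread_limits({name: limit for name in previous})
    try:
        yield previous
    finally:
        # Restore the original limits even if the body raised.
        set_thread_limits(previous)
```

Usage would be `with thread_limits(1): ...` around the code that calls into BLAS or OpenMP; no object has to be kept around between calls.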
|
FYI here is a recently opened PR that adds a similar get/set number of threads API to numpy: numpy/numpy#13136 |
ogrisel
left a comment
+1 for extraction as https://github.com/joblib/threadpoolctl and closing this PR :)