Rocm pixi env #175
Conversation
* Add initial pixi environment all tests pass, predictions seem to be correct corresponds to a modernized conda environment following best practices * Reorder dependencies for easier read * Add openfold3 as an editable dependency * Sync cuda-python pin between pypi package and the conda environment * Comments Comments Overcommenting issues * Add explicitly a conda yml version of the pixi environment * Improve some wordings * Update pixi lockfile * Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026 * Swap ninja verification with pytorch's * Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026 * Use vendored deepspeed evoformer builder Use vendored deepspeed in the attention primitives * Add symlink to vendored deepspeed as in upstream * Vendor also op_builder.__init__ from deepspeed * Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic * Add a ignore mechanism for cutlass detection in vendored deepspeed * Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment * Remove nvidia-cutlass from openfold-3 dependencies (fix later) * Remove pypi ninja dependency in pixi workspace * No need for cutlass hacks * Add pixi config to .gitattributes * Remove deepspeed hacks for good * Update pixi lockfile * Update pixi conda environment * Remove MKL from pypi dependencies, as it is unused * Remove aria2 from pypi dependencies, unused and not so much of a convenience * Update lockfile Update lockfile * Re-enable pure PyPI install * Disable hack when conda is active * More comments on cutlass python API deprecation and pytorch * Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms) * Increase LMDB map size to make test pass in osx-arm64 * Better comments of TODOs in pixi.toml Better comments of TODOs in pixi.toml Better comments of 
TODOs in pixi.toml * Pin cuequivariance until test failure is investigated * Move deepspeed to optional dependency also in pyproject * Pyproject: extend python version support * Pyproject: move dependencies table together with optional-dependencies * Pyproject: document future decision on dependency-groups * Pyproject: reformat to consolidate indent to 4 spaces * Pyproject: reorder dependencies for easier read * Pixi: add scipy * Pixi: add comment on CUDA13 * Pixi: make cuequivariance CUDA generic for its conda packages * Pixi: add reminder about devel install * Pyproject: fix and improve readability, add URLs * pixi.toml: make more readable by showing first envs, then base, then variants * pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed * pixi.toml: fully enable aarch64 and cuda13, revamp docs * pixi.lock: update * pixi.toml: add triton to cuequivariance dependencies for CUDA13 * pixi.lock: update * pixi.toml: include pip to allow users to play * pixi.toml: formatting for better readability * pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8 * pixi.toml: formatting for better readability * pixi.toml: make pytorch-gpu an isolated environment feature in this way we can more easily express when a package is not ready yet in CF * pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda * pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14 * pixi.toml: brief documentation of the pypi-dominant environments * pixi.toml: add also the dev optional dependency group to openfold3-full * pyproject.toml: pin cuequivariance to <0.8 until we adapt tests * pixi.toml: add kalign to required non-pypi dependencies * pixi.toml: add more bioinformatics tools to non-pypi * pixi.toml: make env setup be part of the deepspeed-build feature * pixi.toml: simplify management of pypi features * pixi.lock: update, all tests pass A100,B300 x CUDA12,CUDA13 * pixi.toml: add 
table of what works and what needs test * pixi.toml: add tasks for exporting to regular conda environment yamls * conda environments: delete outdated modernized conda env, use new tasks instead * pixi.toml: bump min pixi version * pixi.toml: remove unnecessary comments * pixi.toml: remove unnecessary envvar definition for isolating extension builds * pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment * pixi.toml: add simple task to run test and save rsults to an environment-specific dir * of3: enable pickling regardless of forking strategy and platform * of3: enable multiple data loader workers in osx mps backed * Vendor improved deepspeed builder from upstream PR See: deepspeedai/DeepSpeed#7760 * pixi.lock: update * pixi.toml: remove some comment noise * of3: fix multiprocessing configuration corner case in osx * docker: move outdated example dockerfiles to docker/pixi-examples * examples: add example runner for osx inference * pixi.toml: ensure we get the right pytorch from pypi something smilar should actually be supported in pyproject.toml * pixi.lock: update, fixed torch cuda missmatch in pypi environments * pixi.toml: fix lock export + make default environment be maintenance * pixi.toml: use a more consitent name for environment arg * pixi.lock: update * pixi.toml: workaround for no-default-feature breaking the test task (pixi bug) * pixi.toml: issue with pixi pypi resolution seems solved * Revert "pixi.toml: issue with pixi pypi resolution seems solved" This reverts commit ded3482. 
* pixi.toml: better document problem and workaround * pixi.toml: make the test task present in all relevant environments this I feel makes less surprising its use, as opposed to passing the environment as an arg to a dependent task * pixi.toml: let CUDA13 flow freely * pixi.lock: update for initial pytorch 2.10, cuda 13.1 support * pixi.toml: add safe cuda environments (no accelerators) * of3: remove deepspeed hacks note that there are still some in __init__.py * of3: unvendor deepspeed * pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi * pixi.toml: remove safe environments as we are not maintaining them * pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release * pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation * Add awscrt to dependencies, missing from recent PR * pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0 * pixi.toml: add -safe environments, at the moment just without cuequivariance these are also conda-pure environments * pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13) * pixi.toml: update outdated comments * updates with GB10 tests (#2) * updates with GB10 tests * cleanup * harmonize * linting data_module.py * speculative changes * pixi.toml: remove safe environments * pixi.lock: update after removal of safe environments * Remove pixi docker examples, to rework * Comment-out workaround for hard to reproduce ABI mismatch problem * pixi.toml: bump pixi, improve conda export by including all env variables * pixi.toml: unpin biotite * pixi.toml: python has its own feature * pixi.toml: bump deepspeed * pyproject.toml: bump deepspeed to version without Evoformer build bug * pixi.toml: detail on workaround * pixi.lock: update * pixi.toml: add example task to update safely the lockfile * pixi.toml: remove kalign2 * tests: fix test depending on unspecified glob return order * pixi.toml: better metadata * docs: wip * pixi.lock: update * Allow to configure 
multiprocessing start and set safe defaults We would still need to document this for users * Fix capitalization error * Fix capitalization error * Fix typo * pixi.lock: update --------- Co-authored-by: Tim Adler <tim.adler@bayer.com> Co-authored-by: Jan Domański <jan.domanski@omsf.io>
|
@Emrys-Merlin great contribution Tim :-) |
|
Getting some test failures with this env on AMD. It could all be expected numerics, unclear. This is the chip update. Looking in more detail, the test_triangular_multiplicative_update.py update seems fine, with minimal drift. The other two (both for test_triangular_attention.py) look more severe: |
|
Thanks a lot for testing this @jandom! I really appreciate it :-) I think I count it as a win that the tests ran at all :-D I agree that some of the numerical differences warrant deeper inspection. I'm open to support here, but I am a bit handicapped without access to AMD GPUs. If it is easy for you to share limited access with me to debug this, that could speed up things a bit. I will continue looking for an internal solution. I will be on vacation next week. So, I won't be very responsive. If we don't find a solution until Barcelona, I'm happy to chat there :-) |
|
No worries, I've shared this ticket with Gagan already – he might be able to come in and help |
…icative update Floating point arithmetic is not associative: different hardware parallelizes reductions (e.g. matrix multiplications, attention softmax) in different orders, accumulating rounding errors differently. CUDA and ROCm therefore produce results that diverge by up to ~2e-6 even on identical inputs. Snapshot comparisons are now routed to nvidia/ or rocm/ subdirectories based on torch.version.hip, so each platform validates consistency with itself across code changes.
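A minimal sketch of the two ideas in the commit message above, written under assumptions: the first part shows non-associativity with plain Python floats, and the second shows how a snapshot directory could be chosen from the value of torch.version.hip (a version string on ROCm builds of PyTorch, None otherwise). The helper name snapshot_dir and the base path are hypothetical; the actual routing code in this PR is not shown here and may differ.

```python
from pathlib import Path
from typing import Optional

# Floating point addition is not associative: regrouping the same three
# numbers changes where rounding happens, so the results differ slightly.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def snapshot_dir(base: Path, hip_version: Optional[str]) -> Path:
    """Pick a per-platform snapshot directory (hypothetical helper).

    `hip_version` is meant to be the value of `torch.version.hip`,
    passed in by the caller so this function does not import torch.
    Any non-empty value routes to rocm/, everything else to nvidia/.
    """
    return base / ("rocm" if hip_version else "nvidia")
```

With this shape, a test would call something like snapshot_dir(base, torch.version.hip). Note that in this sketch a CPU-only build would also land in nvidia/, since torch.version.hip is None there as well.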
|
Hi @Emrys-Merlin, @jandom, @sdvillal, thank you for adding ROCm support. I tested this on AMD hardware. The environment resolves and installs correctly. I ran into snapshot regression failures in |
982b1e3 to 9a5f76c
9a5f76c to 433f266
|
Hi @singagan, Thanks a lot for your help! Your changes make sense to me, so I added them to this PR. From my side this PR is now ready for testing @jandom. After everything is done, I think we should squash the commits in this PR. As I started working on this feature before the pixi PR was merged, there are a couple of commits in the history that were squashed when the pixi PR was merged. |
|
Kicked-off all the tests and let's merge as soon as green |
|
@Emrys-Merlin @singagan thanks for contributing here – updated snapshots make me happy! |
|
Could we quickly add some documentation for installing with AMD? I think we could add it to this file (and rename the title accordingly) https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/kernels.md |
|
Don't we have the install instructions already here? https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/Installation.md Why would we add those ROCm instructions to the kernels page? Sorry maybe I'm confused |
|
Oh if we have previous instructions on the installation.md that's great, but looking at this version, it doesn't seem like it includes the new |
|
I added a line to Installation.md and updated the pixi figure. |
|
Looks good to me! If there are any outstanding issues, let's do a follow-up PR |
Summary
This PR introduces a ROCm pixi environment called openfold3-rocm7, in line with the cpu/cuda12/cuda13 environments. This unifies the usage pattern of openfold3 after the migration to the pixi package manager.
Changes
Related Issues
I tried to build the environment on our HPC cluster, but our proxy interfered with the resolution of the pytorch dependency. @sdvillal thankfully already opened an issue about that with the pixi developers, so hopefully this will be resolved soon. I spun up an AWS EC2 instance where the resolution worked without any issues.
Testing
I could only test that the environment resolves as I do not have access to an AMD accelerator. @singagan if you could help me out here, that would be highly appreciated :-)
The current output of the validate-openfold3-rocm command is as follows:
$ pixi run -e openfold3-rocm7 validate-openfold3-rocm
OpenFold3 ROCm environment check
[PASS] PyTorch installed: 2.11.0+rocm7.2
[PASS] PyTorch built with ROCm (HIP): 7.2.26015
[FAIL] ROCm GPU visible: none
[PASS] Triton installed: 3.6.0
[FAIL] Triton backend is HIP: 0 active drivers ([]). There should only be one.
[PASS] Triton evoformer kernel loaded
One or more checks failed. See above for details.
Installation instructions: https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/Installation.md
Other Notes
Note, as we need to pull pytorch from PyPI, we pull almost all dependencies from PyPI and not from conda-forge. This is necessary, because if any one of our dependencies were to pull pytorch from conda-forge, this would supersede our PyPI pytorch request and we would end up with a pytorch version without ROCm support. This is a known pixi limitation. If it gets resolved, we could think about pulling more of the dependencies from conda-forge, but this is optional and not a blocker.
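One way to notice that the wrong pytorch was resolved is to look at the version string, which in the validation output above carries a +rocm tag (2.11.0+rocm7.2). The sketch below assumes that ROCm wheels from PyPI keep this tag and that a build without it is not a ROCm build; the function name rocm_version_from_torch is hypothetical and is not part of the validate-openfold3-rocm command.

```python
import re
from typing import Optional

# Matches a local version tag such as "+rocm7.2" (assumed wheel naming).
_ROCM_TAG = re.compile(r"\+rocm(\d+(?:\.\d+)*)")


def rocm_version_from_torch(version: str) -> Optional[str]:
    """Return the ROCm version encoded in a torch version string, or None.

    `version` is meant to be the value of `torch.__version__`, passed in
    as a plain string so the check itself does not need torch installed.
    """
    match = _ROCM_TAG.search(version)
    return match.group(1) if match else None
```

In an installed environment this could be called as rocm_version_from_torch(torch.__version__); a None result would suggest that a non-ROCm pytorch superseded the PyPI one.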
@sdvillal, I would love to get your feedback. The environment setup is rather complex and I'm not completely convinced I assembled the rocm environment correctly (or if I pulled in unnecessary features).
@jnwei @jandom As discussed in #166, this is the draft to enable ROCm in the pixi setup.