Rocm pixi env by Emrys-Merlin · Pull Request #175 · aqlaboratory/openfold-3

Emrys-Merlin · 2026-04-10T09:00:28Z

Summary

This PR introduces a ROCm pixi environment called openfold3-rocm7, in line with the existing cpu/cuda12/cuda13 environments. This unifies the usage pattern of openfold3 after the migration to the pixi package manager.

Changes

Added a pytorch-rocm pixi feature, which pulls pytorch and triton with ROCm support from the pytorch PyPI mirror. (Please note that we cannot pull pytorch-rocm dependencies from conda-forge yet.)

Related Issues

I tried to build the environment on our HPC cluster, but our proxy interfered with the resolution of the pytorch dependency. @sdvillal thankfully already opened an issue about that with the pixi developers, so hopefully this will be resolved soon. I spun up an AWS EC2 instance where the resolution worked without any issues.

Testing

I could only test that the environment resolves as I do not have access to an AMD accelerator. @singagan if you could help me out here, that would be highly appreciated :-)

The current output of the validate-openfold3-rocm command is as follows:

$ pixi run -e openfold3-rocm7 validate-openfold3-rocm
OpenFold3 ROCm environment check

  [PASS] PyTorch installed: 2.11.0+rocm7.2
  [PASS] PyTorch built with ROCm (HIP): 7.2.26015
  [FAIL] ROCm GPU visible: none
  [PASS] Triton installed: 3.6.0
  [FAIL] Triton backend is HIP: 0 active drivers ([]). There should only be one.
  [PASS] Triton evoformer kernel loaded

One or more checks failed. See above for details.
Installation instructions: https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/Installation.md
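
For readers who want to reproduce the first three checks by hand, here is a minimal sketch. It is not the implementation behind validate-openfold3-rocm; the function names `check_rocm_env` and `format_report` are made up for illustration. It only relies on `torch.__version__`, `torch.version.hip` (which is `None` on non-ROCm builds), `torch.cuda.is_available()` and `torch.cuda.device_count()` (ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` namespace). The function takes a torch-like module as an argument so that it can be tried without PyTorch installed.

```python
from types import SimpleNamespace


def check_rocm_env(torch_module):
    """Return (label, passed, detail) tuples for a torch-like module.

    Hypothetical helper, not part of openfold3.
    """
    hip = getattr(torch_module.version, "hip", None)
    n_gpus = torch_module.cuda.device_count() if torch_module.cuda.is_available() else 0
    return [
        ("PyTorch installed", True, torch_module.__version__),
        ("PyTorch built with ROCm (HIP)", hip is not None, str(hip)),
        ("ROCm GPU visible", n_gpus > 0, f"{n_gpus} device(s)" if n_gpus else "none"),
    ]


def format_report(results):
    """Render the results in the same [PASS]/[FAIL] style as the output above."""
    return "\n".join(
        f"  [{'PASS' if ok else 'FAIL'}] {label}: {detail}"
        for label, ok, detail in results
    )


# Stand-in object mimicking a ROCm wheel on a machine without an AMD GPU.
# With PyTorch installed, pass the real module instead: check_rocm_env(torch)
fake_torch = SimpleNamespace(
    __version__="2.11.0+rocm7.2",
    version=SimpleNamespace(hip="7.2.26015"),
    cuda=SimpleNamespace(is_available=lambda: False, device_count=lambda: 0),
)
report = format_report(check_rocm_env(fake_torch))
print(report)
```

The Triton checks are left out on purpose, since how the real command inspects the active Triton backend is not shown in this PR description.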

Other Notes
Note: because we need to pull pytorch from PyPI, we pull almost all dependencies from PyPI rather than from conda-forge. This is necessary because, if any one of our dependencies were to pull pytorch from conda-forge, it would supersede our PyPI pytorch request and we would end up with a pytorch version without ROCm support. This is a known pixi limitation. If it gets resolved, we could consider pulling more of the dependencies from conda-forge, but this is optional and not a blocker.

@sdvillal, I would love to get your feedback. The environment setup is rather complex and I'm not completely convinced I assembled the rocm environment correctly (or if I pulled in unnecessary features).

@jnwei @jandom As discussed in #166, this is the draft to enable ROCm in the pixi setup.

* Add initial pixi environment all tests pass, predictions seem to be correct corresponds to a modernized conda environment following best practices
* Reorder dependencies for easier read
* Add openfold3 as an editable dependency
* Sync cuda-python pin between pypi package and the conda environment
* Comments Comments Overcommenting issues
* Add explicitly a conda yml version of the pixi environment
* Improve some wordings
* Update pixi lockfile
* Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
* Swap ninja verification with pytorch's
* Vendoring pieces of deepspeed incomplete, we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026
* Use vendored deepspeed evoformer builder Use vendored deepspeed in the attention primitives
* Add symlink to vendored deepspeed as in upstream
* Vendor also op_builder.__init__ from deepspeed
* Import explicitly EvoformerAttnBuilder, avoiding broken introspection magic
* Add a ignore mechanism for cutlass detection in vendored deepspeed
* Apply cutlass detection workaround and remove all nvidia-cutlass tricks from pixi environment
* Remove nvidia-cutlass from openfold-3 dependencies (fix later)
* Remove pypi ninja dependency in pixi workspace
* No need for cutlass hacks
* Add pixi config to .gitattributes
* Remove deepspeed hacks for good
* Update pixi lockfile
* Update pixi conda environment
* Remove MKL from pypi dependencies, as it is unused
* Remove aria2 from pypi dependencies, unused and not so much of a convenience
* Update lockfile Update lockfile
* Re-enable pure PyPI install
* Disable hack when conda is active
* More comments on cutlass python API deprecation and pytorch
* Make pixi environments (CPU, CUDA12, CUDA13, for all major platforms)
* Increase LMDB map size to make test pass in osx-arm64
* Better comments of TODOs in pixi.toml Better comments of TODOs in pixi.toml Better comments of TODOs in pixi.toml
* Pin cuequivariance until test failure is investigated
* Move deepspeed to optional dependency also in pyproject
* Pyproject: extend python version support
* Pyproject: move dependencies table together with optional-dependencies
* Pyproject: document future decision on dependency-groups
* Pyproject: reformat to consolidate indent to 4 spaces
* Pyproject: reorder dependencies for easier read
* Pixi: add scipy
* Pixi: add comment on CUDA13
* Pixi: make cuequivariance CUDA generic for its conda packages
* Pixi: add reminder about devel install
* Pyproject: fix and improve readability, add URLs
* pixi.toml: make more readable by showing first envs, then base, then variants
* pixi.toml: pin deepspeed to 0.18.3, first one with ninja detection fixed
* pixi.toml: fully enable aarch64 and cuda13, revamp docs
* pixi.lock: update
* pixi.toml: add triton to cuequivariance dependencies for CUDA13
* pixi.lock: update
* pixi.toml: include pip to allow users to play
* pixi.toml: formatting for better readability
* pixi.toml: restrict cuequivariance-cu13 to linux-64 until we unpin to >=0.8
* pixi.toml: formatting for better readability
* pixi.toml: make pytorch-gpu an isolated environment feature in this way we can more easily express when a package is not ready yet in CF
* pixi.toml: add environments that combine mostly pypi-based deps with CUDA from conda
* pixi.toml: add openfold3-editable-full and account for lack of cuequivariance for python=3.14
* pixi.toml: brief documentation of the pypi-dominant environments
* pixi.toml: add also the dev optional dependency group to openfold3-full
* pyproject.toml: pin cuequivariance to <0.8 until we adapt tests
* pixi.toml: add kalign to required non-pypi dependencies
* pixi.toml: add more bioinformatics tools to non-pypi
* pixi.toml: make env setup be part of the deepspeed-build feature
* pixi.toml: simplify management of pypi features
* pixi.lock: update, all tests pass A100,B300 x CUDA12,CUDA13
* pixi.toml: add table of what works and what needs test
* pixi.toml: add tasks for exporting to regular conda environment yamls
* conda environments: delete outdated modernized conda env, use new tasks instead
* pixi.toml: bump min pixi version
* pixi.toml: remove unnecessary comments
* pixi.toml: remove unnecessary envvar definition for isolating extension builds
* pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment pixi.toml: better definition of maintenance environment
* pixi.toml: add simple task to run test and save rsults to an environment-specific dir
* of3: enable pickling regardless of forking strategy and platform
* of3: enable multiple data loader workers in osx mps backed
* Vendor improved deepspeed builder from upstream PR See: deepspeedai/DeepSpeed#7760
* pixi.lock: update
* pixi.toml: remove some comment noise
* of3: fix multiprocessing configuration corner case in osx
* docker: move outdated example dockerfiles to docker/pixi-examples
* examples: add example runner for osx inference
* pixi.toml: ensure we get the right pytorch from pypi something smilar should actually be supported in pyproject.toml
* pixi.lock: update, fixed torch cuda missmatch in pypi environments
* pixi.toml: fix lock export + make default environment be maintenance
* pixi.toml: use a more consitent name for environment arg
* pixi.lock: update
* pixi.toml: workaround for no-default-feature breaking the test task (pixi bug)
* pixi.toml: issue with pixi pypi resolution seems solved
* Revert "pixi.toml: issue with pixi pypi resolution seems solved" This reverts commit ded3482.
* pixi.toml: better document problem and workaround
* pixi.toml: make the test task present in all relevant environments this I feel makes less surprising its use, as opposed to passing the environment as an arg to a dependent task
* pixi.toml: let CUDA13 flow freely
* pixi.lock: update for initial pytorch 2.10, cuda 13.1 support
* pixi.toml: add safe cuda environments (no accelerators)
* of3: remove deepspeed hacks note that there are still some in __init__.py
* of3: unvendor deepspeed
* pixi.toml: simplify deepspeed dependency after our changes made it to CF/pypi
* pixi.toml: remove safe environments as we are not maintaining them
* pixi.toml: enable pytorch-coda in cuda 13 env after 2.10 release
* pyproject.toml: pin deepspeed to >0.18.5, improved evoformer compilation
* Add awscrt to dependencies, missing from recent PR
* pixi.toml: setup correctly path to PTXAS_BLACKWELL for triton >=3.6.0
* pixi.toml: add -safe environments, at the moment just without cuequivariance these are also conda-pure environments
* pixi.lock: update after consolidation (no vendor, pytorch 2.10 + CF cuda13)
* pixi.toml: update outdated comments
* updates with GB10 tests (#2)
* updates with GB10 tests
* cleanup
* harmonize
* linting data_module.py
* speculative changes
* pixi.toml: remove safe environments
* pixi.lock: update after removal of safe environments
* Remove pixi docker examples, to rework
* Comment-out workaround for hard to reproduce ABI mismatch problem
* pixi.toml: bump pixi, improve conda export by including all env variables
* pixi.toml: unpin biotite
* pixi.toml: python has its own feature
* pixi.toml: bump deepspeed
* pyproject.toml: bump deepspeed to version without Evoformer build bug
* pixi.toml: detail on workaround
* pixi.lock: update
* pixi.toml: add example task to update safely the lockfile
* pixi.toml: remove kalign2
* tests: fix test depending on unspecified glob return order
* pixi.toml: better metadata
* docs: wip
* pixi.lock: update
* Allow to configure multiprocessing start and set safe defaults We would still need to document this for users
* Fix capitalization error
* Fix capitalization error
* Fix typo
* pixi.lock: update

---------

Co-authored-by: Tim Adler <tim.adler@bayer.com>
Co-authored-by: Jan Domański <jan.domanski@omsf.io>

jandom · 2026-04-14T16:17:10Z

@Emrys-Merlin great contribution Tim :-)

jandom · 2026-04-16T13:51:10Z

Getting some test failures with this env on AMD

FAILED openfold3/tests/test_triangular_attention.py::test_shape[cuda-True] - AssertionError: Values are not sufficiently close.
FAILED openfold3/tests/test_triangular_attention.py::test_shape[cuda-False] - AssertionError: Values are not sufficiently close.
FAILED openfold3/tests/test_triangular_multiplicative_update.py::test_shape[cuda] - AssertionError: Values are not sufficiently close.

It could all be expected numerics; that's unclear for now. This is the chip:

(openfold3:openfold3-rocm7) [jandom@k006-004-v2 openfold-3]$ amd-smi 
+------------------------------------------------------------------------------+
| AMD-SMI 26.2.1+fc0010cf6a    amdgpu version: 6.16.13  ROCm version: 7.2.0    |
| VBIOS version: 613661                                                        |
| Platform: Linux Guest (Passthrough)                                          |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:0c:00.0     AMD Instinct MI210 | 0 %      51 °C   0            43/300 W |
|   0       0     N/A             N/A | 0 %        N/A             10/65520 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+

update

Looking in more detail, the test_triangular_multiplicative_update.py failure seems fine, with only minimal drift:

E       
E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 27 / 123904 (0.0%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     1.8238788470625877e-06
E           Mean:    1.1477516181912506e-06
E           Median:  1.0848743841052055e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     0.003565334714949131
E           Mean:    0.0020918985828757286
E           Median:  0.001880077994428575
E         Individual errors:

The other two (both for test_triangular_attention.py) look more severe:

E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 57574 / 123904 (46.5%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     6.344435678329319e-05
E           Mean:    9.54591541812988e-06
E           Median:  8.05101626610849e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     2778.166259765625
E           Mean:    1.3626092672348022
E           Median:  0.3236933946609497

E       output:
E         Shape: (2, 22, 22, 128)
E         Number of differences: 47210 / 123904 (38.1%)
E         Statistics are computed for differing elements only.
E         Stats for abs(obtained - expected):
E           Max:     5.022007826482877e-05
E           Mean:    1.0010324331233278e-05
E           Median:  8.413369869231246e-06
E         Stats for abs(obtained - expected) / abs(expected):
E           Max:     57885.04296875
E           Mean:    3.7973642349243164
E           Median:  0.34220370650291443
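
One way to read the two reports above: the absolute errors stay in the 1e-5 range, while the maximum relative errors reach the thousands. Since the relative error is the absolute error divided by `abs(expected)`, a maximum relative error of about 2778 with a maximum absolute error of about 6.3e-05 implies that at least one expected value is of order 1e-08 or smaller. The sketch below illustrates that effect with made-up numbers (not taken from the test data); whether it fully explains the two failures is not established in this thread.

```python
def abs_and_rel_error(obtained, expected):
    """Return (absolute error, relative error) for a single element."""
    abs_err = abs(obtained - expected)
    return abs_err, abs_err / abs(expected)


# Illustrative values only: the same absolute drift of 6e-5 applied to
# an expected value of ordinary size and to one that is almost zero.
abs_small, rel_small = abs_and_rel_error(obtained=0.5 + 6e-5, expected=0.5)
abs_tiny, rel_tiny = abs_and_rel_error(obtained=2e-8 + 6e-5, expected=2e-8)

print(f"expected=0.5:  abs={abs_small:.1e}  rel={rel_small:.1e}")
print(f"expected=2e-8: abs={abs_tiny:.1e}  rel={rel_tiny:.1e}")
```

Both cases have the same absolute error, but the relative error differs by several orders of magnitude, so the absolute statistics are probably the more informative ones when many expected values sit near zero.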

Emrys-Merlin · 2026-04-17T15:22:06Z

Thanks a lot for testing this @jandom! I really appreciate it :-)

I think I count it as a win that the tests ran at all :-D

I agree that some of the numerical differences warrant deeper inspection. I'm happy to help here, but I am limited without access to AMD GPUs. If it is easy for you to share limited access with me to debug this, that could speed things up a bit. I will continue looking for an internal solution.

I will be on vacation next week, so I won't be very responsive. If we don't find a solution by Barcelona, I'm happy to chat there :-)

jandom · 2026-04-20T10:51:17Z

No worries, I've shared this ticket with Gagan already – he might be able to come in and help

…icative update Floating point arithmetic is not associative: different hardware parallelizes reductions (e.g. matrix multiplications, attention softmax) in different orders, accumulating rounding errors differently. CUDA and ROCm therefore produce results that diverge by up to ~2e-6 even on identical inputs. Snapshot comparisons are now routed to nvidia/ or rocm/ subdirectories based on torch.version.hip, so each platform validates consistency with itself across code changes.
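
The non-associativity mentioned in this commit message can be seen with plain Python floats, independent of any GPU. This is only a tiny illustration of the principle, not a reproduction of the CUDA/ROCm difference:

```python
# Summing the same three numbers in two different groupings.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one reduction order
right = a + (b + c)  # another reduction order

print(left == right)       # the two results are not bit-identical
print(abs(left - right))   # the gap is on the order of one rounding error
```

On a GPU the same effect is accumulated over many additions in a reduction, which is why two backends that schedule the reduction differently can disagree in the last digits.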

singagan · 2026-04-27T11:23:40Z

Hi @Emrys-Merlin, @jandom, @sdvillal, thank you for adding ROCm support. I tested this on AMD hardware. The environment resolves and installs correctly. I ran into snapshot regression failures in test_triangular_attention and test_triangular_multiplicative_update: the snapshots were generated on NVIDIA (added in 9879cd8e, stored under openfold3/tests/test_data/snapshots/), and since floating point arithmetic is not associative, we observe numerical differences. I added per-platform snapshot support (nvidia/ and rocm/ subdirectories) with ROCm-generated snapshots; all tests now pass. Branch with the changes: https://github.com/singagan/openfold-3/tree/rocm-pixi-env. Please feel free to pull it in if it looks good to you, or I can make a separate PR if that works better.
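
The per-platform routing could look roughly like the sketch below. This is an assumption about the shape of the logic, not the code on the branch: the helper names `platform_subdir` and `snapshot_dir` are hypothetical, and only the snapshot root, the nvidia/ and rocm/ subdirectory names, and the use of `torch.version.hip` come from this thread. `torch.version.hip` is a version string on ROCm builds of PyTorch and `None` otherwise, so the helper takes that value as an argument and can be tried without PyTorch installed.

```python
from pathlib import Path
from typing import Optional

# Snapshot root as named in the comment above.
SNAPSHOT_ROOT = Path("openfold3/tests/test_data/snapshots")


def platform_subdir(hip_version: Optional[str]) -> str:
    """Pick the snapshot subdirectory from the value of torch.version.hip."""
    return "rocm" if hip_version else "nvidia"


def snapshot_dir(hip_version: Optional[str], root: Path = SNAPSHOT_ROOT) -> Path:
    """Directory whose snapshots a test should compare against."""
    return root / platform_subdir(hip_version)


# In a test one would pass torch.version.hip; plain values are used here.
print(snapshot_dir("7.2.26015"))  # ROCm build
print(snapshot_dir(None))         # non-ROCm build
```

With this split, each platform is only compared against snapshots generated on the same platform, so the tests check consistency across code changes rather than bit-level agreement between vendors.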

Emrys-Merlin · 2026-04-28T09:39:17Z

Hi @singagan,

Thanks a lot for your help! Your changes make sense to me, so I added them to this PR.

From my side this PR is now ready for testing @jandom.

After everything is done, I think we should squash the commits in this PR. As I started working on this feature before the pixi PR was merged, there are a couple of commits in the history that were squashed when the pixi PR was merged.

jandom · 2026-04-28T12:24:38Z

Kicked-off all the tests and let's merge as soon as green

jandom · 2026-04-28T12:28:00Z

@Emrys-Merlin @singagan thanks for contributing here – updated snapshots make me happy!

jnwei · 2026-04-28T13:03:35Z

Could we quickly add some documentation for installing with AMD? I think we could add it to this file (and rename the title accordingly)

https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/kernels.md

jandom · 2026-04-28T14:30:20Z

Don't we have the install instructions already here?

https://github.com/aqlaboratory/openfold-3/blob/main/docs/source/Installation.md

Why would we add those ROCm instructions to the kernels page? Sorry maybe I'm confused

jnwei · 2026-04-28T14:40:22Z

Oh if we have previous instructions on the installation.md that's great, but looking at this version, it doesn't seem like it includes the new openfold3-rocm7 environment

Emrys-Merlin · 2026-04-28T14:43:50Z

I added a line to Installation.md and updated the pixi figure.

jandom · 2026-04-28T14:54:35Z

Looks good to me! If there are any outstanding issues, let's do a follow-up PR

sdvillal and others added 25 commits March 23, 2026 12:33

fix linter problems

ce70ebb

add pre-commit

9b1749c

Merge branch 'main' into pixi-beta

6fbef74

Merge branch 'public-main' into pixi-beta

57736cd

Merge branch 'public-main' into pixi-beta

784502b

add pixi.excalidraw to docs

f696d6c

Merge branch 'public-main' into pixi-beta

4ce0467

remove blackwell build instructions (obsolete)

c101a86

update docs to recommend pixi

cf4bfb6

better docs on pixi

4e36782

update pixi.lock

48e06e3

docker build and tests for pixi

8a4a26b

Merge branch 'main' into pixi-beta

3f9ed35

set a sensible 2mb default

888f070

more context manager plus dirty dataclass

39ddce9

unit tests

07d3454

Merge branch 'public-main' into pixi-beta

5e42337

more linting

8a745e9

missed a dep: regenerate pixi.lock

ba24b95

Merge branch 'main' into pixi-beta

68e743e

remove duplicate projects

feeaacd

First draft for rocm env

de7e331

First working install

0d6d0fc

Remove pytorch-lighting dep in pixi.toml

89815f8

jandom deleted the branch aqlaboratory:main April 23, 2026 07:27

jandom closed this Apr 23, 2026

jandom reopened this Apr 24, 2026

Emrys-Merlin added 2 commits April 28, 2026 11:06

Merge branch 'main' into rocm-pixi-env

386cc0d

Regenerate pixi.lock

f4bd129

Emrys-Merlin changed the base branch from pixi-beta to main April 28, 2026 09:20

Emrys-Merlin marked this pull request as ready for review April 28, 2026 09:21

Emrys-Merlin marked this pull request as draft April 28, 2026 09:22

Emrys-Merlin force-pushed the rocm-pixi-env branch from 982b1e3 to 9a5f76c Compare April 28, 2026 09:33

Revert accidental formatting changes

433f266

Emrys-Merlin force-pushed the rocm-pixi-env branch from 9a5f76c to 433f266 Compare April 28, 2026 09:35

Emrys-Merlin marked this pull request as ready for review April 28, 2026 09:36

jandom added the safe-to-test label (internal-only label used to indicate PRs that are ready for automated CI testing) Apr 28, 2026

Update docs

dda150d

jandom merged commit 7b8068f into aqlaboratory:main Apr 28, 2026
2 checks passed

jandom deleted the rocm-pixi-env branch April 28, 2026 14:55
