RFC-0036-hardware-accelerators-pytorch.org #63
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| # Hardware accelerators at pytorch.org | ||
|
|
||
| **Authors:** | ||
| * @bsochack | ||
| * @sujoysaraswati | ||
| * @fzhao3 | ||
| * @jgong5 | ||
|
|
||
|
|
||
| ## **Summary** | ||
| The main page of pytorch.org only shows how to get started with PyTorch on CUDA, ROCm and CPU, but hardware accelerator support is more diverse, and the ecosystem is evolving rapidly with new accelerators such as Intel Gaudi and Google TPU. | ||
| This RFC proposes to make the broad set of accelerators that support PyTorch visible to the community, and to create a space where hardware vendors can provide instructions and details about their PyTorch support. | ||
|
|
||
|
|
||
| ## **Motivation** | ||
| There are two primary motivations behind this RFC: | ||
| 1. Evangelize PyTorch as an open source framework that supports a wide variety of hardware. | ||
| * PyTorch is supported on more compute platforms than CUDA, ROCm and CPU. | ||
| * PyTorch can maintain and enhance its competitive position among machine learning frameworks by making visible the rich set of hardware accelerators on which it can run. | ||
| 2. Provide a consolidated entry point for finding more information about supported hardware accelerators. | ||
| * pytorch.org is a starting point for many users. The page can help users find more details. | ||
|
|
||
|
|
||
|
|
||
| ## **Proposed Implementation** | ||
| The proposal is to enhance pytorch.org with three changes: | ||
| * Add new compute platforms to the “Install PyTorch” section on the main page of pytorch.org. | ||
| * Add new compute platforms to https://pytorch.org/get-started/. | ||
| * Add a new “Compute Platforms” section to the main page of pytorch.org. | ||
|
|
||
| ### PyTorch integration types | ||
| There are at least two ways in which compute platforms are integrated with PyTorch: | ||
| 1. In-tree – CPU, CUDA and ROCm are developed, built and tested in the PyTorch environment. PyTorch ensures that the quality criteria are met. This approach is limited to a few compute platforms and does not scale with the number of compute platforms. | ||
| 2. Out-of-tree – Other compute platforms, such as Intel Gaudi, are integrated via an additional Python package (extension) that needs to be installed on top of the PyTorch CPU package. Development, build and testing are done outside of PyTorch. In this case: | ||
| * PyTorch ensures the quality of the PyTorch CPU package. | ||
| * The hardware provider ensures the quality of its extension against the PyTorch CPU package. | ||
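The out-of-tree model described above implies that a user environment holds the PyTorch package plus a separately installed vendor extension. A minimal sketch of how a script might check for both is shown below. It uses only the standard library; the platform name and the module name `example_vendor_torch_ext` are hypothetical placeholders, not real packages, and each hardware provider documents its own package.

```python
import importlib.util

# Hypothetical mapping of platform name -> top-level extension module name.
# These names are placeholders for illustration, not real vendor packages.
CANDIDATE_EXTENSIONS = {
    "example-accelerator": "example_vendor_torch_ext",
}


def find_installed_extensions(candidates):
    """Return the platforms whose extension module can be found in this environment."""
    found = []
    for platform, module_name in candidates.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is not None:
            found.append(platform)
    return found


def describe_environment(candidates=CANDIDATE_EXTENSIONS):
    """Summarize whether PyTorch and any known out-of-tree extensions are present."""
    return {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "extensions": find_installed_extensions(candidates),
    }
```

Calling `describe_environment()` on a machine without the placeholder package reports an empty `extensions` list, which is the case where only the in-tree platforms are available.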
|
|
||
| ### Stable vs. nightly builds | ||
| 1. Stable builds – the hardware provider shall provide the installation commands once a compute platform has been tested against a given PyTorch version and meets the quality criteria set by the PyTorch Foundation. | ||
| 2. Preview (nightly) - Similar to the stable builds, but the hardware provider must implement a method to quickly provide fixes for PyTorch nightly. Providing preview builds should be optional. | ||
|
|
||
| ### Quality criteria | ||
|
Contributor
I think we want to go a bit deeper into the details of what are going to be the criterions and expectations here. I'll take a stab at that proposal and come back to you here.
Author
@albanD waiting for the proposal from your side.
Author
My high level thoughts about quality criteria:
Author
@albanD any update?
Author
@albanD are there any details that can be shared with us?
Contributor
Hey!
This thread seems to have gone stale but IMO this is a crucial part of this proposal: the tests need to be refactored to be gated on features sets rather than specific backends or technology. (i.e no You could then use the features enabled for a given backend as part of your reporting: "This platform is stable and supports torch.compile with a Triton backend for the following dtypes. Note: non-contiguous tensors are not supported", etc.
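The feature-gating idea in the comment above could look roughly like the following sketch. It is not PyTorch's test infrastructure: the backend name, the feature names and the `requires_features` decorator are all hypothetical, and the example uses only the standard `unittest` module to show tests being skipped based on declared features instead of on a backend name.

```python
import unittest

# Hypothetical capability table: backend name -> set of declared features.
BACKEND_FEATURES = {
    "example_backend": {"float32", "compile"},
}

# Hypothetical selection of the backend under test.
ACTIVE_BACKEND = "example_backend"


def requires_features(*features):
    """Skip a test unless the active backend declares every listed feature."""
    missing = set(features) - BACKEND_FEATURES.get(ACTIVE_BACKEND, set())
    return unittest.skipIf(bool(missing), f"missing features: {sorted(missing)}")


class ExampleTests(unittest.TestCase):
    @requires_features("float32")
    def test_basic_math(self):
        self.assertEqual(1.0 + 1.0, 2.0)

    @requires_features("float32", "non_contiguous")
    def test_non_contiguous(self):
        # Skipped here because the backend does not declare "non_contiguous".
        self.fail("should have been skipped")


def run_and_summarize():
    """Run the example tests and return counts usable in a platform report."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return {
        "run": result.testsRun,
        "skipped": len(result.skipped),
        "failed": len(result.failures) + len(result.errors),
    }
```

The skip reasons collected in `result.skipped` name the missing features, which is the kind of information the comment suggests surfacing in a platform's support statement.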
||
| The PyTorch Foundation shall introduce minimum requirements for new compute platforms. | ||
| Report what kind of testing was done on a compute platform with a given PyTorch build: | ||
| * A common test report format - to be defined. | ||
| * A test report should contain: results of tests (PyTorch UT, TorchBench, model runs, additional test suites) and the tested OSes. | ||
|
Due to variations in capabilities and implementation among different hardware accelerators, how should the minimum testing suite be defined?
Author
This is a challenge. One option is to verify basic infrastructure and ops, i.e. tensor management, dispatcher, copy ops, basic math, hello world, etc.
||
| * Test requirements: | ||
| * Define must-pass tests from the PyTorch perspective, i.e. a subset of the PyTorch framework unit tests. | ||
| * Leave other tests optional, so hardware providers can decide which tests are most relevant for their compute platform. | ||
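Since the common report format is still to be defined, the following is only a sketch of what a machine-readable report with a must-pass subset might look like. The class name, the field names and the test identifiers in `MUST_PASS` are assumptions made for illustration; the RFC does not specify any of them.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical must-pass test identifiers; the real subset is left to be defined.
MUST_PASS = {"tensor_alloc", "copy_ops", "basic_math"}


@dataclass
class PlatformTestReport:
    platform: str
    pytorch_version: str
    tested_oses: list
    # Maps a test identifier to "pass", "fail" or "skip".
    results: dict = field(default_factory=dict)

    def failed_must_pass(self):
        """Return must-pass tests that are missing from the results or did not pass."""
        return sorted(t for t in MUST_PASS if self.results.get(t) != "pass")

    def meets_minimum(self):
        """True when every must-pass test is reported as passing."""
        return not self.failed_must_pass()

    def to_json(self):
        """Serialize the report, including the derived minimum-requirements flag."""
        payload = asdict(self)
        payload["meets_minimum"] = self.meets_minimum()
        return json.dumps(payload, indent=2, sort_keys=True)
```

Optional tests simply appear as extra keys in `results`; they are reported but do not affect `meets_minimum()`.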
|
|
||
|
|
||
| ### Extend the “Install PyTorch” section on pytorch.org with new compute platforms | ||
| Currently, the “Install PyTorch” section only includes information about CUDA, ROCm, and CPU. It gives the impression that these are the only compute platforms users can work with. | ||
|
|
||
|  | ||
|
|
||
| The proposal is to add buttons for new compute platforms: | ||
|
|
||
|  | ||
|
|
||
| #### Editing PyTorch installation instructions | ||
| The hardware providers should have a way to edit the installation commands via GitHub PRs. | ||
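One way to make the installation commands editable through GitHub PRs is to keep them in a plain data table that the site renders. The sketch below assumes such a table; the keys, the platform name and the commands are illustrative placeholders (text in angle brackets is not a real package name), and this is not how pytorch.org stores its commands today.

```python
# Hypothetical table: (build, platform) -> list of install commands.
# Commands are illustrative; "<vendor-extension-package>" is a placeholder.
INSTALL_COMMANDS = {
    ("stable", "cpu"): ["pip3 install torch"],
    ("preview", "cpu"): ["pip3 install --pre torch"],
    ("stable", "example-accelerator"): [
        "pip3 install torch",
        "pip3 install <vendor-extension-package>",
    ],
}


def available_builds(platform):
    """List the builds a platform provides; preview builds may be absent."""
    return sorted(build for (build, name) in INSTALL_COMMANDS if name == platform)


def install_commands(build, platform):
    """Return the install commands for a build and platform, or raise LookupError."""
    try:
        return INSTALL_COMMANDS[(build, platform)]
    except KeyError:
        raise LookupError(
            f"no {build!r} build listed for {platform!r}; "
            f"available builds: {available_builds(platform)}"
        ) from None
```

With this layout, a hardware provider's PR would only touch its own entries, and a platform that offers no preview build simply has no `("preview", ...)` key.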
|
|
||
|
|
||
| ### Extend pytorch.org/get-started with new compute platforms | ||
| As with the main page of pytorch.org, the get-started page should be extended with the new compute platforms. Each of them should get a description similar to the existing ones for CUDA and ROCm. | ||
|
|
||
| ### Add a new “Compute Platforms” section on pytorch.org | ||
| pytorch.org is a starting point for many users, where they can find initial information about all supported compute platforms and select the best solution for their needs in terms of purpose, speed and power consumption. | ||
| The proposal is to add a new section on the main pytorch.org page showing supported hardware accelerators. | ||
|
|
||
|  | ||
|
|
||
| Clicking a hardware accelerator should take the user to a PyTorch subpage with more details: | ||
| * How to prepare the machine before installing PyTorch, e.g. where to find Docker images, how to obtain drivers, etc. | ||
| * How to install PyTorch. | ||
| * How to get started with a compute platform in Python, e.g. importing PyTorch and any additional platform-specific packages. | ||
| * Where to get help in case of any issues. | ||
|
|
||
| ## Open points | ||
| ### Device plugins maintained out of the PyTorch tree | ||
| Most devices are maintained outside the PyTorch tree, and some of them require an additional PyTorch device plugin to be installed to initialize the device, register operators, and provide additional optimizations. | ||
| Can the PyTorch installation instructions include the additional package to be installed? The package would be hosted elsewhere. | ||
|
|
||
|  | ||
|
|
||
| ### Access to older stable PyTorch builds | ||
| Currently, only the latest stable PyTorch builds are present in the PyTorch install table. Can it be extended with the previous builds (N-1)? | ||
| Rationale: Compute platforms integrated out-of-tree may need more time to provide support for a given PyTorch version, because regressions on those platforms do not block PyTorch CI and releases. | ||
Does "out of tree" hardware accelerator refer to an accelerator that implements logic outside of PyTorch upstream, including devices accessed via privateuse1?
If so, could we specify "out-of-tree (including privateuse keys)" here?
Yes, "out of tree" refers to "integration logic" that is implemented outside of PyTorch upstream. It is applicable to privateuse1 and other accelerators with device specific dispatch key (i.e. Gaudi uses HPU key)