Open
134 commits
877674b
Add PyTorch NPU full test CI workflow
Apr 28, 2026
1a2f87a
Update workflow triggers and names
Apr 28, 2026
e691ad9
Use GitHub ARM runner for Docker image build
Apr 28, 2026
4c08851
Fix Docker image tag lowercase requirement
Apr 28, 2026
659aa6a
Use PyPA manylinux_2_28_aarch64 base image
Apr 28, 2026
14b76c4
Align Dockerfile with PyTorch official manywheel image
Apr 28, 2026
3908f29
Move Python PATH setup before CANN installation
Apr 28, 2026
204ab49
Source CANN environment before nnal installation
Apr 28, 2026
9445b33
Fix Docker image URL lowercase in _build.yml
Apr 28, 2026
8b13494
Add pull_request trigger to npu-full-test workflow
Apr 28, 2026
2a63377
Remove env.REGISTRY from reusable workflow _build.yml
Apr 28, 2026
50e5dd4
Add container credentials for private ghcr.io image pull
Apr 28, 2026
32cf137
Remove pull_request trigger from build-docker-image workflow
Apr 28, 2026
fceb4c6
Add step to set package visibility to public
Apr 28, 2026
9e50cdd
Restore pull_request trigger for build-docker-image workflow
Apr 28, 2026
58555da
Remove container credentials from workflows
Apr 28, 2026
c41ed0d
Remove pull_request trigger from npu-full-test workflow
Apr 28, 2026
958f4ba
Enable push and public visibility for PR events (test phase)
Apr 28, 2026
9fcb9f6
Change container registry from ghcr.io to quay.io
Apr 28, 2026
98194c2
Simplify workflow trigger to push on dev_master only
Apr 28, 2026
a6fb9e2
Change image name from pytorch-npu-builder to pytorch
Apr 28, 2026
ed1c8f5
Add environment: quay to access environment secrets
Apr 28, 2026
8f81fb7
Set environment to QUAY_USERNAME to access secrets
Apr 28, 2026
9bf616e
Change workflow trigger to pull_request and use quay.io registry
Apr 28, 2026
bd329c8
Add workflow to verify pulling quay.io docker image
Apr 28, 2026
ab578e2
Add push trigger to verify-docker-image workflow
Apr 28, 2026
3abe5a8
Use GitHub free ubuntu-22.04-arm runner for verification
Apr 28, 2026
bde1eb1
Fix grep failure when no torch/pytest packages installed
Apr 28, 2026
4c27b00
Rename workflow and add test dependencies installation
Apr 28, 2026
c5beeb7
Use requirements-ci.txt for test dependencies (follow upstream PyTorch)
Apr 28, 2026
f6b47c5
Use PyTorch requirements-build.txt and add distributed build support
Apr 29, 2026
dd4e72f
Restore original _build.yml and create new _build_torch_npu.yml with …
Apr 29, 2026
3c18e4f
Fix cache key ordering and add proper restore-keys for fallback
Apr 29, 2026
e9e6d98
Fix cache save condition and optimize tar packaging
Apr 29, 2026
a84335f
Add cache verification steps for debugging and monitoring
Apr 29, 2026
50a63c0
Restore _build.yml to upstream Ascend/pytorch master version
Apr 29, 2026
c458973
Add git clone proxy support for faster repository access
Apr 29, 2026
754ae85
Fix npu-sync-test.yml to call _build_torch_npu.yml instead of _build.yml
Apr 29, 2026
3c1480f
Remove wheel caching logic (wheel is rarely reused)
Apr 29, 2026
ab9272a
Simplify cache strategy: only pip cache and ccache
Apr 29, 2026
785bae0
Disable torchair build due to PyTorch API compatibility issue
Apr 29, 2026
9070d50
Remove pip cache, keep only ccache (fix numpy not found error)
Apr 29, 2026
76a0899
Fix ccache configuration: use symbolic links instead of CC environmen…
Apr 29, 2026
64ecb13
Increase build parallelism: MAX_JOBS=80, ccache max_size=20G
Apr 29, 2026
a83b36f
Increase build parallelism: MAX_JOBS=128
Apr 29, 2026
6554007
Add pip cache to accelerate dependency download
Apr 29, 2026
3086861
Add pip cache to _test.yml for test dependencies
Apr 29, 2026
c810366
Always save cache regardless of build result
Apr 29, 2026
7bfbf93
Add PyPI cache URL to accelerate pip downloads
Apr 29, 2026
891621b
Fix Jinja2 compatibility issue and add pip cache to _collect.yml
Apr 29, 2026
7d69c09
Optimize test source package: only package test directory
Apr 29, 2026
931120f
Add verbose logging to collect_all_cases.py
Apr 29, 2026
6f572b8
Add test-collect.yml workflow and remove PR trigger from npu-sync-tes…
Apr 29, 2026
2be0d11
Clone PyTorch test source directly instead of downloading artifact
Apr 29, 2026
bb4c7f4
Print full error logs in collect_all_cases.py verbose mode
Apr 29, 2026
5836fc1
Add run_test.py execution step to test-collect.yml
Apr 29, 2026
f24a4af
Run run_test.py from test directory to avoid torch import conflict
Apr 29, 2026
616484e
Add NPU verification and environment setup to test-collect.yml
Apr 29, 2026
5fc4f67
Fix issues in test-collect.yml workflow
Apr 29, 2026
9e6b1a0
Fix YAML syntax error: replace heredoc with echo statements
Apr 29, 2026
bb7f2a9
Install upstream PyTorch test dependencies
Apr 29, 2026
254d0ab
Show CANN directory contents in Verify NPU step
Apr 29, 2026
3af5820
Move Install test dependencies step after Upgrade pip
Apr 29, 2026
2d2afc1
Fix torch import conflict: cd to /tmp before Verify NPU
Apr 29, 2026
542135f
Add newline at end of test-collect.yml
Apr 29, 2026
7bea533
Make Verify NPU step continue on error
Apr 29, 2026
3e1b983
Refactor Docker build: single CANN version with multi-Python support
May 6, 2026
81474d7
Fix Dockerfile path calculation in build script
May 6, 2026
96cbeff
Fix docker buildx command formatting issue
May 6, 2026
e5a5727
Fix CANN package URLs: use 9.0.0-beta.2 for stable version
May 6, 2026
0f54c7c
Add PR trigger for npu-sync-test workflow
May 6, 2026
9d0893d
Update default Docker image tag to manylinux-cann9.0.0-beta.2-20260428
May 6, 2026
fd440d8
Fix PyTorch 2.11+ API compatibility and optimize build workflow
May 6, 2026
7057c84
Merge upstream/master: sync latest changes and fix pointwise_strategy…
May 6, 2026
ba7ecf1
Fix _collect.yml: add Verify NPU step and improve dependency installa…
May 6, 2026
7a557fa
Split torch_npu build step into two separate steps
May 6, 2026
8ff7b58
Remove pull_request trigger from test-collect.yml
May 6, 2026
9ef8e06
Fix ccache configuration and improve cache hit rate
May 6, 2026
68e5ed9
Fix VariableFallbackKernel.cpp for PyTorch 2.11+ autograd API changes
May 6, 2026
0b1645b
Fix ccache and pip cache paths: use absolute paths instead of ~
May 6, 2026
61695b3
Fix VariableFallbackKernel.cpp: use PyTorch version macros instead of…
May 6, 2026
dfbc2c8
Merge remote-tracking branch 'upstream/master' into dev_master
May 7, 2026
9f582ba
Revert torch_npu/ and torchnpugen/autograd/templates/ to upstream/mas…
May 7, 2026
7eb3e7a
Build PyTorch from specific commit fccc94ae83f61fe26559abc99979729719…
May 7, 2026
3c8db92
Fix test source directory path in _test.yml
May 7, 2026
ab408c0
Package PyTorch source with build artifacts for testing
May 7, 2026
5089254
Move PR trigger from npu-sync-test.yml to test-collect.yml
May 7, 2026
ca08d31
Use fixed workflow run for test-collect.yml artifacts
May 7, 2026
4d32282
Fix issues in test-collect.yml
May 7, 2026
290dd71
Update workflow run ID to 25473829132 for artifact downloads
May 7, 2026
42cf359
Add debug logging for test case collection path issues
May 7, 2026
92c456c
Change PyTorch source download to use GitHub commit with proxy
May 7, 2026
3648c3d
Fix pytest path argument for test case collection
May 7, 2026
7c288d8
Simplify test-collect.yml and add timing information
May 7, 2026
2103872
Increase test execution timeout to 20 hours
May 7, 2026
5785218
Remove debug step for pytorch-src directory structure
May 7, 2026
d9d7db2
Implement matrix sharding strategy for test execution
May 7, 2026
6581d05
Fix test collection: use discover_tests.py instead of dry-run
May 7, 2026
cf2d3f5
Implement manual sharding: split test list into 6 JSON shards
May 7, 2026
ab20955
Fix shard JSON path: read before cd to test directory
May 7, 2026
951f6e8
Add timeout control and crash detection for test execution
May 8, 2026
85379e1
Remove single test timeout, keep crash detection
May 8, 2026
3da7894
Remove PR trigger from test-collect.yml
May 8, 2026
867eb10
Add PR trigger to npu-sync-test.yml
May 8, 2026
3e4e5d6
Add test progress logging in run_npu_test_shard.py
May 8, 2026
736176e
Simplify test results JSON: only keep case_id, duration, status
May 8, 2026
e17c62f
Restore full results JSON with all fields
May 8, 2026
6f8e0a6
Simplify workflow parameters: remove pytorch_version and docker_image…
May 9, 2026
3afc0d2
Remove deprecated workflow files and documentation
May 9, 2026
95ecec6
Remove remaining deprecated files
May 9, 2026
e41da75
Checkout torch_npu from current repository instead of hardcoded kerer…
May 9, 2026
0d3cd4a
Adjust dependency download order
May 9, 2026
346ac08
Checkout from current repository in all test workflows
May 9, 2026
1783029
Checkout torch_npu from Ascend/pytorch upstream repository
May 9, 2026
018fa44
Adjust dependency download order
May 9, 2026
bf63cb4
Add PyPI cache proxy configuration for faster pip downloads
May 9, 2026
1225b7f
Remove blacklist/whitelist restrictions
May 10, 2026
5f11d06
Improve display of test matrix job names
May 10, 2026
fe7eebb
Change timeout to 60s
May 10, 2026
d7ed26d
Fix report bug
May 11, 2026
0c902d6
Fix report bug
May 11, 2026
f82e3bb
Simplify report JSON
May 11, 2026
48bbe18
Remove test XML processing logic
May 11, 2026
6951828
Fix test case execution
May 11, 2026
aed81cc
Change test case XML file naming
May 11, 2026
0f9f664
Change shard count for dis test cases to 3
May 11, 2026
601e622
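Sync changes from the 2.7.1 branch: fix special-character bug in XML file naming; pre-compress XML files into tar.gz; remove crash and add skip status in the report; execution script adds qu…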
May 12, 2026
e7daf2e
Add test dependency caching and optimize blacklist configuration
May 12, 2026
a64c107
Remove case-paths-config parameter and related redundant steps
May 13, 2026
f4f3e37
s
May 13, 2026
84e198a
Refactor workflow: extract common dependency installation into an action and improve logging
May 14, 2026
d884663
Fix pytest nodeid collection logic: filter out non-test-case symbols
May 14, 2026
3606055
Simplify pytest nodeid parsing logic: rely on the standard output format of -q mode
May 14, 2026
936aa38
Refactor run_npu_test_shard.py: remove duplicated test case collection logic
May 14, 2026
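
Several of the commits above (d9d7db2, cf2d3f5, 0f9f664) describe splitting the collected test list into a fixed number of JSON shards that matrix jobs then run independently. The PR's actual sharding script is not shown in this part of the diff, so the following is only a minimal sketch of the idea, assuming a round-robin split. The function names, the `shard_N.json` file name, and the JSON layout are hypothetical.

```python
import json
from pathlib import Path


def split_into_shards(test_files, num_shards):
    """Distribute test files round-robin across num_shards lists."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    # Sort first so the same input always produces the same shards.
    for index, name in enumerate(sorted(test_files)):
        shards[index % num_shards].append(name)
    return shards


def write_shards(test_files, num_shards, out_dir):
    """Write one JSON file per shard and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    shards = split_into_shards(test_files, num_shards)
    for shard_id, shard in enumerate(shards, start=1):
        # Hypothetical file name and layout, chosen for illustration only.
        path = out_dir / f"shard_{shard_id}.json"
        path.write_text(json.dumps({"shard": shard_id, "tests": shard}, indent=2))
        paths.append(path)
    return paths
```

Round-robin keeps shard sizes within one test file of each other but ignores run time; a duration-aware split would need timing data from earlier runs.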
56 changes: 50 additions & 6 deletions .ci/docker/requirements-ci.txt
@@ -1,7 +1,51 @@
# Python dependencies required for unit tests
# Python dependencies required for NPU tests
# Based on upstream PyTorch .ci/docker/requirements-ci.txt

mypy==1.9.0
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 1.9.0
#test that import: test_typing.py, test_type_hints.py
# pytest and plugins
pytest==7.3.2
pytest-xdist==3.3.1
pytest-flakefinder==1.1.0
pytest-rerunfailures>=10.3
pytest-subtests==0.13.1
pytest-timeout>=2.3.1
xdoctest==1.3.0

# test utilities
hypothesis==6.56.4
expecttest==0.3.0
parameterized==0.8.1

# numpy (version per Python version)
numpy==1.26.2; python_version >= "3.11" and python_version < "3.14"

# scientific packages
scipy==1.14.1; python_version > "3.11" and python_version < "3.14"
scikit-image==0.22.0
pillow==12.1.1
pywavelets==1.7.0; python_version >= "3.12"

# core utilities
networkx==2.8.8
optree==0.13.0; python_version < "3.14"
opt-einsum==3.3
filelock==3.20.3
sympy==1.13.3

# build/serialization
pyyaml==6.0.3
packaging==24.0
typing-extensions==4.12.2; python_version < "3.14"
pyzstd
setuptools>=70.1.0,<82
zstandard

# ONNX support
onnx==1.20.0
onnxscript==0.6.2
protobuf==6.33.5

# misc
psutil
jinja2==3.1.6
tqdm>=4.66.0
click
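
The environment markers in this file decide which pins apply to which interpreter, and the builder image in this PR ships Python 3.10 through 3.13. As a rough check of that coverage, the sketch below rewrites four of the markers by hand as predicates on `(major, minor)` tuples. This is a simplified stand-in for real PEP 508 marker evaluation, and `MARKERS`, `IMAGE_PYTHONS`, and both helper functions are illustrative names, not part of the PR.

```python
# Simplified stand-in for PEP 508 marker evaluation: each marker from
# requirements-ci.txt is rewritten by hand as a predicate on (major, minor).
MARKERS = {
    "numpy==1.26.2": lambda v: (3, 11) <= v < (3, 14),
    "scipy==1.14.1": lambda v: (3, 11) < v < (3, 14),
    "pywavelets==1.7.0": lambda v: v >= (3, 12),
    "optree==0.13.0": lambda v: v < (3, 14),
}

# Python versions pre-installed in the builder image (assumed from the Dockerfile).
IMAGE_PYTHONS = [(3, 10), (3, 11), (3, 12), (3, 13)]


def applicable_versions(predicate):
    """Return the image Python versions for which the marker holds."""
    return [
        f"{major}.{minor}"
        for major, minor in IMAGE_PYTHONS
        if predicate((major, minor))
    ]


def coverage_table():
    """Map each requirement to the Python versions it applies to."""
    return {req: applicable_versions(pred) for req, pred in MARKERS.items()}


if __name__ == "__main__":
    for req, versions in coverage_table().items():
        print(f"{req:20s} -> {', '.join(versions) or 'none'}")
```

Under these markers pip skips the numpy pin on Python 3.10 and the scipy pin on Python 3.10 and 3.11, so those interpreters get no version of those packages from this file.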
146 changes: 146 additions & 0 deletions .github/actions/setup-npu-test-env/action.yml
@@ -0,0 +1,146 @@
name: 'Setup NPU Test Environment'
description: 'Common environment setup for NPU upstream tests - checkout, cache, install PyTorch/torch_npu/triton-ascend, test dependencies'

inputs:
  python_version:
    required: true
    type: string
    description: Python version to use
  torch_wheel_artifact:
    required: true
    type: string
    description: Name of the torch wheel artifact
  torch_npu_wheel_artifact:
    required: true
    type: string
    description: Name of the torch_npu wheel artifact
  pytorch_src_artifact:
    required: true
    type: string
    description: Name of the PyTorch source artifact

env:
  # PyPI cache URL (used to speed up pip downloads)
  PYPI_CACHE_URL: 'http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple'

runs:
  using: 'composite'
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
      with:
        repository: ${{ github.repository }}
        ref: ${{ github.ref }}
        fetch-depth: 1
        path: ascend_pytorch

    - name: Setup cache directories
      run: |
        mkdir -p /github/home/.cache/pip
        chmod -R 777 /github/home/.cache

    - name: Cache pip
      uses: actions/cache@v4
      with:
        path: /github/home/.cache/pip
        key: pip-arm-collect-py${{ inputs.python_version }}
        restore-keys: |
          pip-arm-collect-py${{ inputs.python_version }}-
          pip-arm-collect-

    - name: Download built torch wheel
      uses: actions/download-artifact@v4
      with:
        name: ${{ inputs.torch_wheel_artifact }}
        path: torch-wheel-artifact

    - name: Download built torch_npu wheel
      uses: actions/download-artifact@v4
      with:
        name: ${{ inputs.torch_npu_wheel_artifact }}
        path: torch-npu-wheel-artifact

    - name: Download PyTorch source and test code
      uses: actions/download-artifact@v4
      with:
        name: ${{ inputs.pytorch_src_artifact }}
        path: pytorch-src-artifact

    - name: Extract PyTorch source
      run: |
        tar -xzf pytorch-src-artifact/pytorch-src.tar.gz

    - name: Install built PyTorch and torch_npu
      run: |
        source /usr/local/Ascend/cann/set_env.sh 2>/dev/null || true
        source /usr/local/Ascend/nnal/atb/set_env.sh 2>/dev/null || true

        PIP=pip${{ inputs.python_version }}
        PYTHON=python${{ inputs.python_version }}
        export PIP_CACHE_DIR=/github/home/.cache/pip

        # Configure pip to use PyPI cache for faster downloads
        if [ -n "${{ env.PYPI_CACHE_URL }}" ]; then
          $PIP config set global.index-url ${{ env.PYPI_CACHE_URL }}
          $PIP config set global.trusted-host "cache-service.nginx-pypi-cache.svc.cluster.local"
          echo "pip index-url configured: ${{ env.PYPI_CACHE_URL }}"
        fi

        $PIP install --upgrade pip

        # Install built torch wheel
        TORCH_WHL=$(ls torch-wheel-artifact/*.whl | head -1)
        $PIP install "${TORCH_WHL}"

        # Install built torch_npu wheel
        TORCH_NPU_WHL=$(ls torch-npu-wheel-artifact/*.whl | head -1)
        $PIP install "${TORCH_NPU_WHL}"

        echo "Installed PyTorch and torch_npu from built wheels"
        echo "torch: ${TORCH_WHL}"
        echo "torch_npu: ${TORCH_NPU_WHL}"

    - name: Install test dependencies
      run: |
        PIP=pip${{ inputs.python_version }}
        export PIP_CACHE_DIR=/github/home/.cache/pip
        cd pytorch-src

        # Core test dependencies
        $PIP install pytest pytest-timeout pytest-xdist hypothesis zstandard pyyaml
        $PIP install pytest-rerunfailures pytest-flakefinder
        $PIP install 'pytest-subtests==0.13.1' 'xdoctest==1.1.0' 'pulp>=2.9'

        # Optional dependencies for ONNX tests
        # These are not in PyTorch requirements.txt but needed by specific tests
        $PIP install onnxruntime onnxscript onnx-ir ml-dtypes || true

        # torchvision for ONNX model tests (install without deps to bypass torch version check)
        # PyPI torchvision requires exact torch version (torch==2.11.0), but we have dev build
        # Use --no-deps to skip torch dependency, we already have our compiled torch installed
        $PIP install numpy pillow || true
        $PIP install torchvision --no-deps || true

        # Other optional dependencies
        $PIP install parameterized pandas || true
        $PIP install opencv-python || true

        # PyTorch requirements (if exists)
        if [ -f requirements.txt ]; then
          $PIP install -r requirements.txt || true
        fi

    - name: Verify NPU availability
      run: |
        source /usr/local/Ascend/cann/set_env.sh 2>/dev/null || true
        source /usr/local/Ascend/nnal/atb/set_env.sh 2>/dev/null || true

        PYTHON=python${{ inputs.python_version }}
        $PYTHON -c "
        import torch
        print(f'torch: {torch.__version__}')
        import torch_npu
        print(f'torch_npu: {torch_npu.__version__}')
        print(f'NPU available: {torch.npu.is_available()}')
        print(f'NPU count: {torch.npu.device_count()}')
        "
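
The install step in this action selects each wheel with `ls <dir>/*.whl | head -1`, which takes the first match when several wheels are present. As one possible stricter variant, the sketch below fails loudly unless exactly one wheel is found. `find_single_wheel` is a hypothetical helper written for illustration and is not part of this PR.

```python
from pathlib import Path


def find_single_wheel(directory):
    """Return the only .whl file in directory, or raise if there is not exactly one."""
    wheels = sorted(Path(directory).glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no wheel found in {directory}")
    if len(wheels) > 1:
        names = ", ".join(wheel.name for wheel in wheels)
        raise RuntimeError(f"expected one wheel in {directory}, found: {names}")
    return wheels[0]
```

Called as `find_single_wheel("torch-wheel-artifact")`, it would surface a missing or duplicated artifact before pip runs rather than during installation.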
152 changes: 152 additions & 0 deletions .github/docker/pytorch-npu-builder.Dockerfile
@@ -0,0 +1,152 @@
# Based on the PyPA manylinux 2_28 aarch64 image (consistent with PyTorch mainline)
FROM quay.io/pypa/manylinux_2_28_aarch64

ARG GCCTOOLSET_VERSION=13

# CANN package download URLs (passed in via build-arg)
ARG CANN_TOOLKIT_URL
ARG CANN_A3OPS_URL
ARG CANN_NNAL_URL
ARG CANN_VERSION

# Language variables
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8

# Install the required OS packages (same as the official PyTorch Dockerfile)
RUN yum -y install epel-release && \
    yum -y update && \
    yum install -y \
        autoconf \
        automake \
        bison \
        bzip2 \
        curl \
        diffutils \
        file \
        git \
        less \
        libffi-devel \
        libgomp \
        make \
        openssl-devel \
        patch \
        perl \
        unzip \
        util-linux \
        wget \
        which \
        xz \
        yasm \
        zstd \
        sudo \
        gcc-toolset-${GCCTOOLSET_VERSION}-gcc \
        gcc-toolset-${GCCTOOLSET_VERSION}-gcc-c++ \
        gcc-toolset-${GCCTOOLSET_VERSION}-gcc-gfortran \
        gcc-toolset-${GCCTOOLSET_VERSION}-gdb && \
    yum install -y --enablerepo=powertools ninja-build && \
    rm -rf /var/cache/yum

# Ensure the correct devtoolset is used
ENV PATH=/opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH

# git 2.36+ requires safe.directory to be configured
RUN git config --global --add safe.directory "*"

# ============================================================
# Pre-install all Python versions (the image supports multiple Python versions)
# ============================================================
# The manylinux image already includes cp310-cp310, cp311-cp311, cp312-cp312, cp313-cp313
# Python 3.11 is the default (can be switched via environment variable)

ENV DEFAULT_PYTHON_VERSION=3.11
ENV PATH=/opt/python/cp311-cp311/bin:$PATH

# Create the Python version switching script
RUN printf '#!/bin/bash\n\
# Helper script for switching Python versions\n\
# Usage: source /usr/local/bin/switch_python.sh 3.11\n\
\n\
PYTHON_VERSION="${1:-3.11}"\n\
\n\
case "$PYTHON_VERSION" in\n\
3.10) PYTHON_DIR="cp310-cp310" ;;\n\
3.11) PYTHON_DIR="cp311-cp311" ;;\n\
3.12) PYTHON_DIR="cp312-cp312" ;;\n\
3.13) PYTHON_DIR="cp313-cp313" ;;\n\
*) echo "Unsupported Python version: $PYTHON_VERSION"; return 1 ;;\n\
esac\n\
\n\
export PATH=/opt/python/$PYTHON_DIR/bin:$PATH\n\
echo "Switched to Python $PYTHON_VERSION ($(python --version))"\n\
' > /usr/local/bin/switch_python.sh && \
    chmod +x /usr/local/bin/switch_python.sh

# Install common packages for each Python version
RUN for py_dir in cp310-cp310 cp311-cp311 cp312-cp312 cp313-cp313; do \
        /opt/python/$py_dir/bin/pip install --upgrade pip setuptools wheel; \
    done

# ============================================================
# Install CANN (using the URLs passed in)
# ============================================================

WORKDIR /root

RUN mkdir -p cann && cd cann && \
    curl -O "${CANN_TOOLKIT_URL}" && \
    curl -O "${CANN_A3OPS_URL}" && \
    curl -O "${CANN_NNAL_URL}" && \
    chmod +x Ascend-cann*.run && \
    ./Ascend-cann-toolkit*.run --full --quiet --install-path=/usr/local/Ascend && \
    ./Ascend-cann-A3*.run --install --quiet --install-path=/usr/local/Ascend && \
    source /usr/local/Ascend/cann/set_env.sh && \
    ./Ascend-cann-nnal*.run --install --quiet --install-path=/usr/local/Ascend && \
    rm -rf cann

# Set environment variables
ENV CANN_PATH=/usr/local/Ascend/cann
ENV NNAL_PATH=/usr/local/Ascend/nnal
ENV ASCEND_HOME=/usr/local/Ascend
ENV CANN_VERSION=${CANN_VERSION}

# Add the CANN environment initialization script
RUN printf '#!/bin/bash\n\
source /usr/local/Ascend/cann/set_env.sh 2>/dev/null || true\n\
source /usr/local/Ascend/nnal/atb/set_env.sh 2>/dev/null || true\n\
' > /etc/profile.d/cann_env.sh && \
    chmod +x /etc/profile.d/cann_env.sh

# ============================================================
# Pre-install pytest and other test dependencies (for all Python versions)
# ============================================================

RUN for py_dir in cp310-cp310 cp311-cp311 cp312-cp312 cp313-cp313; do \
        /opt/python/$py_dir/bin/pip install pytest pytest-timeout pytest-xdist hypothesis pyyaml zstandard cmake ninja; \
    done

# ============================================================
# Set the working directory and default command
# ============================================================

WORKDIR /workspace

# Create the welcome message
RUN printf '\n\
========================================\n\
PyTorch NPU Builder Image\n\
========================================\n\
CANN Version: %s\n\
Python Versions: 3.10, 3.11, 3.12, 3.13 (default: 3.11)\n\
\n\
To switch Python version:\n\
source /usr/local/bin/switch_python.sh 3.12\n\
\n\
To setup CANN environment:\n\
source /etc/profile.d/cann_env.sh\n\
========================================\n\
\n' "${CANN_VERSION}" > /etc/motd

CMD ["bash"]
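
The `switch_python.sh` helper in this Dockerfile maps a version string such as `3.12` to a manylinux interpreter directory such as `cp312-cp312` through a `case` statement. For tooling that has to compute the same path outside the container shell, the mapping could be sketched as below. `python_bin_dir` and `SUPPORTED` are hypothetical names, and the supported set is copied from the `case` branches above.

```python
# Versions accepted by the case statement in switch_python.sh.
SUPPORTED = ("3.10", "3.11", "3.12", "3.13")


def python_bin_dir(version="3.11"):
    """Return the manylinux bin directory for a supported Python version."""
    if version not in SUPPORTED:
        raise ValueError(f"Unsupported Python version: {version}")
    major, minor = version.split(".")
    tag = f"cp{major}{minor}"
    # manylinux images install CPython under /opt/python/<tag>-<abi tag>/.
    return f"/opt/python/{tag}-{tag}/bin"
```

For example, `python_bin_dir("3.12")` returns `/opt/python/cp312-cp312/bin`, the directory the shell script prepends to `PATH` for that version.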