Explicit cleanup by hubert-marek · Pull Request #2756 · PrimeIntellect-ai/prime-rl

hubert-marek · 2026-06-10T08:26:07Z

No description provided.

The aarch64 host install path was broken: `uv sync` installs flash-attn from the PyPI source distribution, but pyproject sets FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE, so the compiled extension is never built. `scripts/docker-arm64-post-install.sh` fixed this for Docker GB200 builds but hardcoded sm_100 and /app/.venv, leaving Hopper hosts (H100/H200/GH200) without a recipe.

Changes:
- `scripts/docker-arm64-post-install.sh`: auto-detect the compute capability via nvidia-smi when available; parameterize the venv path. The sm_100 default is preserved when no GPU is visible (Docker buildx).
- `scripts/install.sh`: call the post-install script on aarch64 hosts after `uv sync --all-extras`. Previously the script ran uv sync and exited, leaving aarch64 users with a broken venv.
- `README.md`: document the aarch64 post-install step (mirroring the existing section 3.1 Flash Attention 3 pattern).

Validated on GH200 (sm_90, aarch64):
- forward + backward parity vs. torch SDPA (max diff < 0.05 forward / < 0.25 backward)
- 383/384 unit tests pass (the one failure is an unrelated TileLang/MoE test)
- SFT trainer smoke test (5 steps, Qwen3-0.6B) runs with flash_attention_2

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
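The compute-capability detection in the post-install script can be sketched as follows. This is a Python illustration only; the real script is shell.

The sketch assumes the script queries `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which prints one line per GPU, for example `9.0`. The function names `parse_compute_cap` and `detect_sm_arch` are hypothetical and not taken from the PR.

The sketch keeps the behaviour described above. The first parseable GPU capability becomes an `sm_XX` suffix. A missing `nvidia-smi` or unusable output falls back to sm_100.

```python
import shutil
import subprocess

# Hypothetical default mirroring the described behaviour: no visible GPU
# (e.g. a docker buildx stage) means "build for sm_100 / GB200".
DEFAULT_ARCH = "100"


def parse_compute_cap(smi_output: str, default: str = DEFAULT_ARCH) -> str:
    """Turn compute_cap query output (e.g. "9.0\\n") into an sm suffix (e.g. "90").

    Uses the first line that parses as MAJOR.MINOR; otherwise returns `default`.
    """
    for line in smi_output.splitlines():
        major, _, minor = line.strip().partition(".")
        if major.isdigit() and minor.isdigit():
            return f"{major}{minor}"
    return default


def detect_sm_arch(default: str = DEFAULT_ARCH) -> str:
    """Query the local GPU if nvidia-smi exists; otherwise keep the default."""
    if shutil.which("nvidia-smi") is None:
        return default
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        # Driver present but query failed: behave as if no GPU were visible.
        return default
    return parse_compute_cap(out, default)
```

On a GH200 this yields `90`. In a GPU-less build stage it yields the `100` default. How the suffix is then handed to the flash-attn build is up to the actual script.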

* feat(fp8): fuse transpose into the dX block-fp8 weight cast

  The dX backward built `weight.transpose(0, 1).contiguous()` and re-cast it to fp8 every step, materializing a full bf16 transpose buffer plus an extra read/write pass. Add `per_block_cast_to_fp8_tp_triton`, which produces the block-fp8 of `weight.T` directly by reusing the existing per-block kernel with swapped output/scale strides, with no intermediate buffer. 128x128 block quantization is transpose-symmetric, so the result is bit-identical to casting the materialized transpose; DeepGEMM receives an identical B tensor. Verified byte-for-byte across shapes; ~14x faster on a 4096x4096 weight (373 -> 27 us).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* deslopified
* also add fused implementation for per-token
* Fix: skip tests on <Hopper

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
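A small pure-Python model shows why swapping strides is enough. It is a sketch under stated assumptions, not the Triton kernel:

- `block_cast`, `cast`, `cast_transposed`, `transpose` and `fake_fp8` are hypothetical names.
- The fp8 cast is replaced by a scale/round/clamp stand-in.
- A tiny block size replaces 128.

Each square tile contains the same elements in `W` and in `W.T`, so the tile's amax, its scale and every quantized value match. Writing results through swapped output and scale strides therefore reproduces the cast of the materialized transpose exactly.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def fake_fp8(x: float, scale: float) -> float:
    # Stand-in for the real fp8 cast: scale, round to an integer, clamp.
    # Real e4m3 rounding differs, but any elementwise cast works for this argument.
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, float(round(x / scale))))


def block_cast(w, rows, cols, block, out_strides, scale_strides):
    """Quantize a row-major rows x cols matrix (flat list) in block x block tiles.

    Element (i, j) is written to out[i * out_strides[0] + j * out_strides[1]] and the
    scale of tile (bi, bj) to scales[bi * scale_strides[0] + bj * scale_strides[1]],
    so swapping the strides writes the transposed layout without a transpose buffer.
    """
    n_br, n_bc = -(-rows // block), -(-cols // block)  # ceil division
    out = [0.0] * (rows * cols)
    scales = [0.0] * (n_br * n_bc)
    for bi in range(n_br):
        for bj in range(n_bc):
            r_rng = range(bi * block, min((bi + 1) * block, rows))
            c_rng = range(bj * block, min((bj + 1) * block, cols))
            amax = max(abs(w[i * cols + j]) for i in r_rng for j in c_rng)
            scale = max(amax, 1e-12) / FP8_E4M3_MAX
            scales[bi * scale_strides[0] + bj * scale_strides[1]] = scale
            for i in r_rng:
                for j in c_rng:
                    out[i * out_strides[0] + j * out_strides[1]] = fake_fp8(w[i * cols + j], scale)
    return out, scales


def cast(w, rows, cols, block):
    """Row-major quantized W and its row-major (n_br x n_bc) scale grid."""
    n_bc = -(-cols // block)
    return block_cast(w, rows, cols, block, (cols, 1), (n_bc, 1))


def cast_transposed(w, rows, cols, block):
    """Quantized W.T (cols x rows, row-major), read straight from W via swapped strides."""
    n_br = -(-rows // block)
    return block_cast(w, rows, cols, block, (1, rows), (1, n_br))


def transpose(w, rows, cols):
    """Materialized transpose: the extra buffer the fused path avoids."""
    return [w[i * cols + j] for j in range(cols) for i in range(rows)]
```

Compare `cast_transposed(w, rows, cols, block)` with `cast(transpose(w, rows, cols), cols, rows, block)`. For any shape, including edge tiles, both give equal values and scales.

The argument relies on square tiles whose boundaries line up in both orientations. A non-square grouping such as 1x128 would collect different elements after transposition.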

… source (#2733)

The [tool.uv.sources] override for prime-pydantic-config was being ignored because it was only a transitive dependency (via prime-rl-configs). uv only applies source overrides for packages that appear in project.dependencies. Adding it as a direct dependency makes uv resolve it from the local editable path (deps/pydantic-config) instead of PyPI.

* orch improvements
* fixes

Remove the configs/private submodule (research-configs) and all references to it throughout the codebase:
- Remove the submodule from .gitmodules and git tracking
- Simplify install.sh: use a plain `git submodule update --init --recursive`, since there is no longer a private submodule whose checkout can fail for users without access
- Update skills/install/SKILL.md to reflect the simplified submodule init
- Remove the configs/private/ entry from the key files in skills/configs/SKILL.md
- Simplify test_configs.py: it no longer needs to filter out the private/ path

* update deps
* update deps
* update deps

andre-fu and others added 12 commits June 5, 2026 20:21

Feat: fix weight reload to cpu optim (#2729)

90b0744

chore(renderers): bump submodule to renderers-v0.1.8.dev41 (#2732)

c2a5fa4

orch improvements (#2725)

54012df


feat(orchestrator): per-env advantage strategy (#2721)

e0f8a35

fix: allow sft without teacher (#2720)

0695f9c

add router replay to latentMoE models (#2738)

c8759c3

update deps (#2736)

04d0671


explicit del and malloc

77b8567
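The commit title `explicit del and malloc` gives no details. Assume it means two steps:

1. Drop references explicitly.
2. Ask glibc to return freed heap memory to the OS via `malloc_trim`.

Under that assumption, the pattern might look like the sketch below. `explicit_cleanup` and the dict-based cache are hypothetical; this is not code from the PR.

```python
import ctypes
import gc
import sys
from collections.abc import Iterable


def explicit_cleanup(cache: dict, keys: Iterable[str]) -> bool:
    """Drop references to large buffers, collect cycles, then try glibc malloc_trim(0).

    Returns True only if malloc_trim ran and reported that memory was released.
    """
    for key in keys:
        cache.pop(key, None)  # explicit del: drop this container's reference
    gc.collect()  # reclaim objects kept alive only by reference cycles
    if not sys.platform.startswith("linux"):
        return False  # malloc_trim is a glibc extension
    try:
        libc = ctypes.CDLL("libc.so.6")
    except OSError:
        return False  # e.g. musl-based images without glibc
    malloc_trim = getattr(libc, "malloc_trim", None)
    if malloc_trim is None:
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(0) == 1  # 1 means some memory was returned to the OS
```

Removing a key only frees the object once no other references to it remain. `malloc_trim(0)` is glibc-specific, so on other platforms the sketch returns `False` instead of failing.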

hubert-marek wants to merge 12 commits into feat/ephemeral-mm-pixels from explicit-cleanup

hubert-marek commented Jun 10, 2026


10 participants
