Open
Changes from 6 commits
27 commits
7665ae4
test new tf on axlearn
Steboss May 20, 2026
5f61a45
empty commit for signed
Steboss May 20, 2026
e6dab45
empty commit for signed
Steboss May 20, 2026
bac0bf6
Merge branch 'main' into sbosisio/test-axlearn-new-tf
Steboss May 20, 2026
b8dda6c
avoid saving checkpoints
Steboss May 20, 2026
9523b6d
Merge branch 'sbosisio/test-axlearn-new-tf' of github.com:NVIDIA/JAX-…
Steboss May 20, 2026
c2a0937
remove maxtext eks ofi
Steboss May 20, 2026
eaabab0
Merge branch 'main' into sbosisio/test-axlearn-new-tf
aybchan May 21, 2026
1ccd1e0
use nightly tf to accommodate changes
Steboss May 29, 2026
7b0d42b
Merge branch 'sbosisio/test-axlearn-new-tf' of github.com:NVIDIA/JAX-…
Steboss May 29, 2026
c56948a
fix text-nightly
Steboss May 29, 2026
f3f2ae0
test with protobuf
Steboss Jun 9, 2026
8cefe20
Merge branch 'main' into sbosisio/test-axlearn-new-tf
Steboss Jun 9, 2026
473c130
Merge branch 'sbosisio/test-axlearn-new-tf' of github.com:NVIDIA/JAX-…
Steboss Jun 9, 2026
0e07e2f
try to build with nightlies
Steboss Jun 9, 2026
e376b7a
fix version of nightly
Steboss Jun 10, 2026
71cb8bc
test with better options in pip-finalize
Steboss Jun 10, 2026
16ab71d
can we check efa?
Steboss Jun 10, 2026
9579fca
can we change pip finalize
Steboss Jun 10, 2026
b3a0666
try to remove tensorflow completely
Steboss Jun 10, 2026
7e90ea5
fix tf-nightly and install it
Steboss Jun 10, 2026
b3595d1
rebuild base
Steboss Jun 10, 2026
968a62d
correct error in pip-finalize
Steboss Jun 11, 2026
0663fa9
try to use nightlies for tensorflow text metadata and datasets
Steboss Jun 11, 2026
bcc45cd
fix tf libs
Steboss Jun 11, 2026
f9c9dc0
let seqio free
Steboss Jun 12, 2026
389f2fa
update packages
Steboss Jun 15, 2026
4 changes: 2 additions & 2 deletions .github/container/Dockerfile.axlearn
@@ -47,8 +47,8 @@ RUN <<"EOF" bash -ex

# Add packages missing from pyproject.toml
cat <<REQUIREMENTS >> /opt/pip-tools.d/requirements-axlearn.in
-tensorflow==2.20.0
-tensorflow-text==2.20.0
+tensorflow
+tensorflow-text
pyarrow
tensorflow-metadata
tensorstore
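Dropping the `==2.20.0` pins lets the resolver pick up whatever TensorFlow and TF Text releases are current, including the nightlies that later commits in this branch experiment with. TF Text is versioned to track TensorFlow's major.minor release, so two unpinned packages can still drift apart. The following is a minimal sketch of a post-install sanity check, assuming that matching major.minor is the compatibility rule; the helper names (`major_minor`, `tf_text_matches_tf`, `installed_version`) are hypothetical and not part of AXLearn or this repository.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from e.g. '2.20.0' or a nightly like '2.21.0.dev20260610'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def tf_text_matches_tf(tf_version: str, tf_text_version: str) -> bool:
    """Hypothetical rule: TF Text must share TensorFlow's major.minor version."""
    return major_minor(tf_version) == major_minor(tf_text_version)


def installed_version(*dists: str) -> Optional[str]:
    """Return the version of the first installed distribution in `dists`, else None."""
    for dist in dists:
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Look up the stable distribution names first, then the nightly ones.
    tf = installed_version("tensorflow", "tf-nightly")
    tft = installed_version("tensorflow-text", "tensorflow-text-nightly")
    if tf is None or tft is None:
        # Nothing to compare; report what was found instead of failing.
        print(f"skipping check: tensorflow={tf}, tensorflow-text={tft}")
    elif not tf_text_matches_tf(tf, tft):
        raise SystemExit(f"version mismatch: tensorflow {tf} vs tensorflow-text {tft}")
    else:
        print(f"ok: tensorflow {tf} with tensorflow-text {tft}")
```

Run after the requirements are installed (for example as a last step of the image build) so that a TensorFlow/TF Text mismatch fails early rather than at import time inside the training job.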
4 changes: 1 addition & 3 deletions .github/workflows/_ci.yaml
@@ -645,15 +645,13 @@ jobs:
NAME: axlearn-fuji-3b
NUM_GPUS: 8
IMAGE: ${{ needs.build-axlearn.outputs.DOCKER_TAG_FINAL }}
-ENVS: |
-OFI_NCCL_PROTOCOL=SENDRECV
COMMAND: |
CONFIG=fuji-3B-v3-flash;
LOG_DIR=/opt/output;
TRAINER_DIR=\${LOG_DIR}/\${CONFIG}-eks/trainer-dir;
mkdir -p \${TRAINER_DIR};
OUTPUT_LOG_FILE=\${TRAINER_DIR}/output.log;
-python3 /usr/local/bin/fuji-train-perf.py --module=text.gpt.c4_trainer --config=\${CONFIG} --jax_backend=gpu --trainer_dir=\${TRAINER_DIR} --data_dir=gs://axlearn-public/tensorflow_datasets --ici_fsdp=8 --dcn_dp=1 --gbs=16 --ga=1 --seq_len=4096 --max_step=301 --save_checkpoint_steps=100 --write_summary_steps=100 --output_log_file=\${OUTPUT_LOG_FILE} --world_size=8
+python3 /usr/local/bin/fuji-train-perf.py --module=text.gpt.c4_trainer --config=\${CONFIG} --jax_backend=gpu --trainer_dir=\${TRAINER_DIR} --data_dir=gs://axlearn-public/tensorflow_datasets --ici_fsdp=8 --dcn_dp=1 --gbs=16 --ga=1 --seq_len=4096 --max_step=301 --save_checkpoint_steps=1001 --write_summary_steps=100 --output_log_file=\${OUTPUT_LOG_FILE} --world_size=8
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
NVCR_TOKEN: ${{ secrets.NVCR_TOKEN }}

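Raising `--save_checkpoint_steps` from 100 to 1001 while `--max_step` stays at 301 means the save interval is never reached, which appears to correspond to commit b8dda6c ("avoid saving checkpoints"). The arithmetic can be sketched as below, assuming a plain every-N-steps policy that saves when `step % save_checkpoint_steps == 0` for steps 1 through `max_step` inclusive; AXLearn's actual checkpointer policy may differ in details such as whether a final checkpoint is forced. `checkpoint_steps` is a hypothetical helper used only for illustration.

```python
from typing import List


def checkpoint_steps(max_step: int, save_every: int) -> List[int]:
    """Steps at which an assumed every-N-steps policy saves, for steps 1..max_step."""
    return [step for step in range(1, max_step + 1) if step % save_every == 0]


if __name__ == "__main__":
    before = checkpoint_steps(max_step=301, save_every=100)
    after = checkpoint_steps(max_step=301, save_every=1001)
    print("before:", before)  # [100, 200, 300]
    print("after:", after)    # []
```

Under this assumption the old setting wrote three checkpoints per CI run and the new one writes none, while `--write_summary_steps=100` is unchanged.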