From d65b765d7280befc017ba91549246e7d10564a18 Mon Sep 17 00:00:00 2001
From: Daniel van Strien
Date: Thu, 4 Jun 2026 09:51:33 +0100
Subject: [PATCH 1/3] =?UTF-8?q?docs:=20surface=20Transformers=20Trainer=20?=
 =?UTF-8?q?=E2=86=92=20HF=20Buckets?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a link-forward mention of `Trainer`'s native HF Buckets checkpoint
support (transformers#46386) to the buckets docs:

- storage-buckets.md: Tip in the "Training checkpoints and logs" use case
- storage-buckets-integrations.md: new "🤗 Transformers" section

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/hub/storage-buckets-integrations.md | 4 ++++
 docs/hub/storage-buckets.md              | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/docs/hub/storage-buckets-integrations.md b/docs/hub/storage-buckets-integrations.md
index b6c322ae2e..e070e11095 100644
--- a/docs/hub/storage-buckets-integrations.md
+++ b/docs/hub/storage-buckets-integrations.md
@@ -51,6 +51,10 @@ from datasets import load_dataset
 ds = load_dataset("buckets/username/my-bucket", data_files=["data.parquet"])
 ```
 
+## 🤗 Transformers
+
+The [`Trainer`](https://huggingface.co/docs/transformers/trainer_recipes) can push and resume training checkpoints directly to a bucket, so a run can resume on a fresh machine without keeping checkpoints in a Git repo. See the [Trainer checkpointing docs](https://huggingface.co/docs/transformers/trainer_recipes) for setup.
+
 ## Filesystem operations
 
 For direct file operations, `huggingface_hub` exposes a pre-instantiated [filesystem object](/docs/huggingface_hub/guides/hf_file_system), `hffs`:
diff --git a/docs/hub/storage-buckets.md b/docs/hub/storage-buckets.md
index d367d5305e..29bd50fbce 100644
--- a/docs/hub/storage-buckets.md
+++ b/docs/hub/storage-buckets.md
@@ -334,6 +334,9 @@ hf sync ./checkpoints hf://buckets/my-org/training-run-42/checkpoints
 
 Because buckets are built on [Xet](./xet/index), successive checkpoints where large parts of the model are frozen benefit from chunk-level deduplication. Only the changed chunks are uploaded.
 
+> [!TIP]
+> 🤗 Transformers' [`Trainer`](https://huggingface.co/docs/transformers/trainer_recipes) can push and resume training checkpoints directly to a bucket — no manual `sync` step needed. See the Trainer docs for setup.
+
 ### Data processing pipelines
 
 Buckets serve as staging areas for data processing workflows. Process raw data, write intermediate outputs to a bucket, then promote the final artifact to a versioned [Dataset](./datasets) repository when the pipeline completes. This keeps your versioned repo clean while giving your pipeline fast mutable storage.

From 4f8eb721399a3fa6a9aa3d13927eda7eea79ae65 Mon Sep 17 00:00:00 2001
From: Daniel van Strien
Date: Sun, 7 Jun 2026 12:28:22 +0100
Subject: [PATCH 2/3] Update docs/hub/storage-buckets-integrations.md

Co-authored-by: Julien Chaumond
---
 docs/hub/storage-buckets-integrations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/storage-buckets-integrations.md b/docs/hub/storage-buckets-integrations.md
index e070e11095..fd789c3d92 100644
--- a/docs/hub/storage-buckets-integrations.md
+++ b/docs/hub/storage-buckets-integrations.md
@@ -51,7 +51,7 @@ from datasets import load_dataset
 ds = load_dataset("buckets/username/my-bucket", data_files=["data.parquet"])
 ```
 
-## 🤗 Transformers
+## Transformers
 
 The [`Trainer`](https://huggingface.co/docs/transformers/trainer_recipes) can push and resume training checkpoints directly to a bucket, so a run can resume on a fresh machine without keeping checkpoints in a Git repo. See the [Trainer checkpointing docs](https://huggingface.co/docs/transformers/trainer_recipes) for setup.
 

From ee8663168537db66de2621fb23047c3f16c9d25f Mon Sep 17 00:00:00 2001
From: Daniel van Strien
Date: Sun, 7 Jun 2026 12:28:33 +0100
Subject: [PATCH 3/3] Update docs/hub/storage-buckets.md

Co-authored-by: Julien Chaumond
---
 docs/hub/storage-buckets.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/storage-buckets.md b/docs/hub/storage-buckets.md
index 29bd50fbce..b1798e4f34 100644
--- a/docs/hub/storage-buckets.md
+++ b/docs/hub/storage-buckets.md
@@ -335,7 +335,7 @@ hf sync ./checkpoints hf://buckets/my-org/training-run-42/checkpoints
 Because buckets are built on [Xet](./xet/index), successive checkpoints where large parts of the model are frozen benefit from chunk-level deduplication. Only the changed chunks are uploaded.
 
 > [!TIP]
-> 🤗 Transformers' [`Trainer`](https://huggingface.co/docs/transformers/trainer_recipes) can push and resume training checkpoints directly to a bucket — no manual `sync` step needed. See the Trainer docs for setup.
+> Transformers' [`Trainer`](https://huggingface.co/docs/transformers/trainer_recipes) can push and resume training checkpoints directly to a bucket — no manual `sync` step needed. See the Trainer docs for setup.
 
 ### Data processing pipelines
 