47 changes: 25 additions & 22 deletions models/artifacts/artifacts-walkthrough.mdx
@@ -2,46 +2,47 @@
description:
Create, track, and use a dataset artifact with W&B.
title: "Tutorial: Create, track, and use a dataset artifact"
keywords: ["artifact", "dataset versioning", "log_artifact", "use_artifact", "wandb.Artifact"]
---

This walkthrough demonstrates how to create, track, and use a dataset artifact.
This walkthrough demonstrates how to create, track, and use a dataset artifact with W&B. By the end, you've logged a dataset as a versioned artifact to W&B and downloaded it in a subsequent run. This lets you reproducibly share datasets across experiments and track them as inputs and outputs of your runs.

## 1. Log into W&B
## Log in to W&B

Import the W&B library and log in to W&B. You will need to sign up for a free W&B account if you have not done so already.
Import the W&B library and log in to W&B. If you haven't done so already, sign up for a free W&B account.

```python
import wandb

wandb.login()
```
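In non-interactive environments such as CI jobs, `wandb.login()` cannot prompt for an API key. The following is a minimal sketch of one way to handle that, assuming the key is supplied through the `WANDB_API_KEY` environment variable. The helper name `login_from_env` is hypothetical and is not part of the W&B library.

```python
import os


def login_from_env(env_var="WANDB_API_KEY"):
    """Log in to W&B only if an API key is present in the environment.

    Returns False without contacting W&B when the variable is unset,
    so a script can fail fast instead of waiting on an interactive prompt.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        return False

    # Imported lazily so this helper can be defined without wandb installed.
    import wandb

    return bool(wandb.login(key=api_key))
```

A script could call `login_from_env()` at startup and exit with a clear message when it returns `False`.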

## 2. Initialize a run
## Initialize a run

Use [`wandb.init()`](/models/ref/python/functions/init) to initialize a run. This generates a background process to sync and log data. Provide a project name and a job type:

```python
# Create a W&B Run. Here we specify 'dataset' as the job type since this example
# Create a W&B Run. Here you specify 'upload-dataset' as the job type since this example
# shows how to create a dataset artifact.
with wandb.init(project="artifacts-example", job_type="upload-dataset") as run:
    pass  # Your code here
```

## 3. Create an artifact object
## Create an artifact object

Create an artifact object with the [`wandb.Artifact()`](/models/ref/python/experiments/artifact). Provide a name for the artifact and a description of the file type for the `name` and `type` parameters, respectively.
Create an artifact object with [`wandb.Artifact()`](/models/ref/python/experiments/artifact). Provide a name for the artifact and a description of the file type for the `name` and `type` parameters, respectively.

For example, the following code snippet demonstrates how to create an artifact called `bicycle-dataset` with a `dataset` label:
For example, the following code snippet demonstrates how to create an artifact called `'bicycle-dataset'` with a `'dataset'` label:

```python
artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")
```

For more information about how to construct an artifact, see [Construct artifacts](./construct-an-artifact).
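Beyond `name` and `type`, `wandb.Artifact()` also accepts optional `description` and `metadata` arguments. The sketch below shows one way to attach them. The helper names, the description text, and the metadata keys are made up for illustration; only the four `wandb.Artifact()` parameters come from the library.

```python
def dataset_artifact_kwargs(name="bicycle-dataset", num_rows=None):
    """Build the keyword arguments for a dataset artifact (hypothetical helper)."""
    metadata = {"format": "h5"}  # Illustrative key, not required by W&B
    if num_rows is not None:
        metadata["num_rows"] = num_rows
    return {
        "name": name,
        "type": "dataset",
        "description": "Example dataset for the artifacts walkthrough",
        "metadata": metadata,
    }


def build_dataset_artifact(**overrides):
    """Create the artifact object; wandb is imported lazily."""
    import wandb

    return wandb.Artifact(**dataset_artifact_kwargs(**overrides))
```

Keeping the arguments in a plain dictionary makes them easy to inspect or reuse before any W&B object is created.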

## 4. Add the dataset to the artifact
## Add the dataset to the artifact

Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named `dataset.h5` that is saved locally on our machine to the artifact:
Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named `dataset.h5` that is saved locally on your machine to the artifact:

```python
# Add a file to the artifact's contents
@@ -51,20 +52,20 @@ artifact.add_file(local_path="dataset.h5")
Replace the filename `dataset.h5` in the previous code snippet with the path to the file you want to add to the artifact.


## 5. Log the dataset
## Log the dataset

Use the W&B run objects `wandb.Run.log_artifact()` method to both save your artifact version and declare the artifact as an [output of the run](/models/artifacts/explore-and-traverse-an-artifact-graph).
Use the W&B run object's `wandb.Run.log_artifact()` method to both save your artifact version and declare the artifact as an [output of the run](/models/artifacts/explore-and-traverse-an-artifact-graph).

```python
# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)
```

A `'latest'` [alias](/models/artifacts/create-a-custom-alias) is created by default when you log an artifact. For more information about artifact aliases and versions, see [Create a custom alias](./create-a-custom-alias) and [Create new artifact versions](./create-a-new-artifact-version), respectively.
When you log an artifact, W&B creates a `'latest'` [alias](/models/artifacts/create-a-custom-alias) by default. For more information about artifact aliases and versions, see [Create a custom alias](./create-a-custom-alias) and [Create new artifact versions](./create-a-new-artifact-version), respectively.
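An alias is appended to the artifact name after a colon when you refer to a specific version, for example `bicycle-dataset:latest` or `bicycle-dataset:v0` for the first logged version. As a small sketch, the hypothetical helper below (not part of the W&B library) builds such a reference string:

```python
def artifact_reference(name, alias="latest"):
    """Return a 'name:alias' string that identifies one artifact version."""
    if not name or not alias:
        raise ValueError("Both an artifact name and an alias are required")
    return f"{name}:{alias}"


latest_ref = artifact_reference("bicycle-dataset")
first_version_ref = artifact_reference("bicycle-dataset", alias="v0")
```

A string in this form is what you would pass to `wandb.Run.use_artifact()` when consuming the artifact in a later run.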


Putting this together, you script so far should look like this:
Putting this together, your script so far should look like this:

```python
import wandb
@@ -78,17 +79,17 @@ with wandb.init(project="artifacts-example", job_type="upload-dataset") as run:
```


## 6. Download and use the artifact
## Download and use the artifact

The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W&B servers.
Now that the dataset is logged as an artifact, you can pull it into other runs as a tracked input. The following code example demonstrates the steps you can take to use an artifact you've logged and saved to the W&B servers:

1. First, initialize a new run object with **`wandb.init()`.**
2. Second, use the run objects [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) method to tell W&B what artifact to use. This returns an artifact object.
3. Third, use the artifacts [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact#download) method to download the contents of the artifact.
1. Initialize a new run object with `wandb.init()`.
2. Use the run object's [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) method to specify which artifact to use. This returns an artifact object.
3. Use the artifact's [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact#download) method to download the contents of the artifact.

```python
# Create a W&B Run. Here we specify 'training' for 'type'
# because we will use this run to track training.
# Create a W&B Run. Here you specify 'training' for 'job_type'
# because you use this run to track training.
with wandb.init(project="artifacts-example", job_type="training") as run:

# Query W&B for an artifact and mark it as input to this run
@@ -98,4 +99,6 @@ with wandb.init(project="artifacts-example", job_type="training") as run:
artifact_dir = artifact.download()
```

Alternatively, you can use the Public API (`wandb.Api`) to export (or update data) data already saved in a W&B outside of a Run. See [Track external files](./track-external-files) for more information.
Alternatively, you can use the Public API (`wandb.Api`) to export or update data already saved in W&B outside of a run. For more information, see [Track external files](./track-external-files).

You now have a versioned dataset artifact logged to W&B and consumed by a downstream run. The artifact graph tracks both the upload and the download.
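The upload and download steps can also be wrapped as two functions in one script. This is a sketch under stated assumptions, not the tutorial's own code: the function names and the placeholder file contents are hypothetical, and the `"bicycle-dataset:latest"` reference assumes the artifact was logged under that name with the default alias. Running the two W&B functions requires a logged-in W&B account.

```python
from pathlib import Path

PROJECT = "artifacts-example"
ARTIFACT_NAME = "bicycle-dataset"


def write_placeholder_dataset(path):
    """Write a small stand-in file so there is something to upload."""
    path = Path(path)
    path.write_bytes(b"placeholder dataset contents")
    return path


def upload_dataset(local_path):
    """Log the file as a dataset artifact and mark it as a run output."""
    import wandb  # Imported lazily; requires wandb and a logged-in account

    with wandb.init(project=PROJECT, job_type="upload-dataset") as run:
        artifact = wandb.Artifact(name=ARTIFACT_NAME, type="dataset")
        artifact.add_file(local_path=str(local_path))
        run.log_artifact(artifact)


def download_dataset(alias="latest"):
    """Mark the artifact as a run input and download its contents."""
    import wandb

    with wandb.init(project=PROJECT, job_type="training") as run:
        artifact = run.use_artifact(f"{ARTIFACT_NAME}:{alias}")
        return artifact.download()
```

To try it, call `upload_dataset(write_placeholder_dataset("dataset.h5"))` and then `download_dataset()`, which returns the local directory that holds the downloaded files.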