
Add Seqera NIO filesystem for datasets and refactor TowerClient/TowerObserver#6946

Merged: bentsherman merged 1 commit into master from 260310-seqera-dataset-fs on Apr 21, 2026

@jorgee (Contributor) commented Mar 19, 2026

Summary

  • Implements a seqera:// NIO FileSystemProvider in nf-tower, enabling Nextflow pipelines to reference Seqera Platform datasets as standard file paths (e.g. seqera://org/workspace/datasets/name)
  • Path hierarchy: root → org → workspace → resource type → dataset file (with optional @version pinning)
  • Refactors TowerClient into two classes: TowerClient (pure HTTP API client) and TowerObserver (workflow telemetry via TraceObserverV2), so the new filesystem can reuse the API client without creating an observer or pulling in the observer lifecycle
  • Merges TowerCommonApi into TowerClient, which is now the natural home for shared API methods
  • Registers via META-INF/services/java.nio.file.spi.FileSystemProvider
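The path hierarchy above can be illustrated with a small parser. This is a hypothetical Java sketch, not the SeqeraPath code from this PR: the class SeqeraUriSketch and its Parts record are illustrative names, and it assumes the first segment after seqera:// (the org) is read as the URI authority and that a version is pinned with a trailing @version on the dataset segment.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the nf-tower SeqeraPath implementation:
// splits seqera://org/workspace/type/name[@version] into its levels.
final class SeqeraUriSketch {

    // Null fields mean the path stops above that level (depth 0..4).
    record Parts(String org, String workspace, String resourceType,
                 String dataset, String version) {
        int depth() {
            int d = 0;
            for (String s : new String[] {org, workspace, resourceType, dataset}) {
                if (s != null) d++;
            }
            return d;
        }
    }

    static Parts parse(String uri) {
        // The bare root has no authority, which java.net.URI would reject
        if (uri.equals("seqera://")) return new Parts(null, null, null, null, null);
        URI u = URI.create(uri);
        if (!"seqera".equals(u.getScheme()))
            throw new IllegalArgumentException("Not a seqera:// URI: " + uri);
        List<String> segs = new ArrayList<>();
        // In seqera://org/..., the org name is parsed as the URI authority
        if (u.getAuthority() != null) segs.add(u.getAuthority());
        if (u.getPath() != null) {
            for (String s : u.getPath().split("/"))
                if (!s.isEmpty()) segs.add(s);
        }
        if (segs.size() > 4)
            throw new IllegalArgumentException("Too many path segments: " + uri);
        String[] level = new String[4];
        for (int i = 0; i < segs.size(); i++) level[i] = segs.get(i);
        String version = null;
        // Optional version pinning on the dataset segment: name@version
        if (level[3] != null) {
            int at = level[3].lastIndexOf('@');
            if (at > 0) {
                version = level[3].substring(at + 1);
                level[3] = level[3].substring(0, at);
            }
        }
        return new Parts(level[0], level[1], level[2], level[3], version);
    }

    public static void main(String[] args) {
        Parts p = parse("seqera://nf-core/AWSmegatests/datasets/test_rnaseq@2");
        System.out.println(p + " depth=" + p.depth());
    }
}
```

In this sketch, a path without @version leaves version null, which a provider would interpret as "latest version".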

New files

| File | Purpose |
|---|---|
| TowerObserver | Extracted TraceObserverV2 implementation (task events, heartbeats, workflow lifecycle), formerly in TowerClient |
| dataset/SeqeraDatasetClient | API calls: list orgs/workspaces/datasets/versions, download |
| fs/SeqeraPath | Path implementation with 0–4 depth hierarchy |
| fs/SeqeraFileSystem | FileSystem with lazy org/workspace/dataset caches |
| fs/SeqeraFileSystemProvider | FileSystemProvider SPI: read, write, list, attributes, copy |
| fs/SeqeraFileAttributes | BasicFileAttributes backed by dataset metadata |
| fs/SeqeraPathFactory | Nextflow PathFactory integration |
| fs/ResourceTypeHandler, fs/DatasetsResourceHandler | Extensibility interface for future resource types |
| fs/DatasetInputStream, fs/DatasetOutputStream | Stream wrappers for dataset read/write |
| exception/ForbiddenException, exception/NotFoundException | HTTP error types for API responses |
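To illustrate the ResourceTypeHandler extension point, here is a hypothetical Java sketch of a registry that dispatches the resource-type level of a path, so that listing seqera://org/workspace/* returns the registered types (currently only datasets). ResourceRegistrySketch, ResourceHandlerSketch and InMemoryDatasetsHandler are illustrative names backed by an in-memory map; the real fs/ResourceTypeHandler interface and the API-backed DatasetsResourceHandler may look quite different.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical registry: dispatches the third path level (resource type) to a handler.
final class ResourceRegistrySketch {
    private final Map<String, ResourceHandlerSketch> handlers = new TreeMap<>();

    void register(ResourceHandlerSketch h) { handlers.put(h.type(), h); }

    // What listing seqera://org/workspace/* would return: the known types, sorted.
    List<String> types() { return List.copyOf(handlers.keySet()); }

    List<String> list(String org, String workspace, String type) {
        ResourceHandlerSketch h = handlers.get(type);
        if (h == null) throw new IllegalArgumentException("Unknown resource type: " + type);
        return h.list(org, workspace);
    }

    public static void main(String[] args) {
        ResourceRegistrySketch registry = new ResourceRegistrySketch();
        registry.register(new InMemoryDatasetsHandler(
            Map.of("nf-core/AWSmegatests", List.of("test_rnaseq", "proteinfamilies"))));
        System.out.println(registry.types());
        System.out.println(registry.list("nf-core", "AWSmegatests", "datasets"));
    }
}

// Hypothetical per-resource-type handler; the real ResourceTypeHandler may differ.
interface ResourceHandlerSketch {
    String type();                                   // path segment, e.g. "datasets"
    List<String> list(String org, String workspace); // entries under org/workspace/type
}

// Toy handler backed by an in-memory map instead of the Platform API.
final class InMemoryDatasetsHandler implements ResourceHandlerSketch {
    private final Map<String, List<String>> byWorkspace;

    InMemoryDatasetsHandler(Map<String, List<String>> byWorkspace) {
        this.byWorkspace = byWorkspace;
    }

    public String type() { return "datasets"; }

    public List<String> list(String org, String workspace) {
        return byWorkspace.getOrDefault(org + "/" + workspace, List.of());
    }
}
```

In this design, supporting another resource type would mean registering one more handler, leaving path parsing unchanged.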

Changes to existing files

| File | Change |
|---|---|
| TowerClient | Stripped of observer logic; now a pure API client. Added a public sendApiRequest() method and GET support in makeRequest(). Absorbed the TowerCommonApi methods. |
| TowerCommonApi (deleted) | Methods merged into TowerClient |
| TowerFactory | Creates TowerObserver and TowerClient separately. client() now also activates when an accessToken is present, so seqera:// paths work without tower.enabled |
| TowerPlugin | Registers SeqeraPathFactory |
| BaseCommandImpl, AuthCommandImpl, LaunchCommandImpl | Updated to use the refactored TowerClient API |
| Tests | Split accordingly: TowerClientTest for the API client, TowerObserverTest for the observer; new tests for all fs/ and dataset/ classes |
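The TowerFactory change means the API client no longer depends on tower.enabled. Below is a minimal Java sketch of that activation rule, assuming the token comes from the tower.accessToken setting with the TOWER_ACCESS_TOKEN environment variable as a fallback. ClientActivationSketch and its methods are hypothetical names; the actual logic in TowerFactory may differ.

```java
import java.util.Map;

// Hypothetical sketch of the activation rule described for TowerFactory.client():
// the observer is tied to tower.enabled, while the API client (and therefore
// seqera:// paths) also activates when an access token is available.
final class ClientActivationSketch {

    // Config value wins; otherwise fall back to the environment variable.
    static String resolveToken(Map<String, Object> towerConfig, Map<String, String> env) {
        Object configured = towerConfig.get("accessToken");
        if (configured != null && !configured.toString().isBlank())
            return configured.toString();
        String fromEnv = env.get("TOWER_ACCESS_TOKEN");
        return (fromEnv != null && !fromEnv.isBlank()) ? fromEnv : null;
    }

    static boolean observerEnabled(Map<String, Object> towerConfig) {
        return Boolean.TRUE.equals(towerConfig.get("enabled"));
    }

    static boolean clientEnabled(Map<String, Object> towerConfig, Map<String, String> env) {
        return observerEnabled(towerConfig) || resolveToken(towerConfig, env) != null;
    }

    public static void main(String[] args) {
        // tower.enabled is unset, but a token is available: client on, observer off
        Map<String, Object> config = Map.of();
        Map<String, String> env = Map.of("TOWER_ACCESS_TOKEN", "example-token");
        System.out.println("client=" + clientEnabled(config, env)
            + " observer=" + observerEnabled(config));
    }
}
```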

Test plan

  • SeqeraPathTest — path parsing, URI round-trips, relativize/resolve, getFileName, asUri
  • SeqeraFileSystemTest — cache loading, workspace/dataset resolution, thread safety
  • SeqeraFileSystemProviderTest — newInputStream (latest + pinned version), readAttributes, newDirectoryStream, error propagation
  • SeqeraDatasetClientTest — API URL construction, response mapping, error handling
  • TowerObserverTest — extracted observer logic works identically
  • TowerClientTest — API client methods after refactor
./gradlew :plugins:nf-tower:test

@netlify (bot) commented Mar 19, 2026

Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: 2a80cbe
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/69e737e5b300330008da1326

@jorgee jorgee marked this pull request as draft March 19, 2026 14:09
@jorgee (Contributor, Author) commented Mar 19, 2026

Some comments about the current implementation:

  • Some refactoring is needed to decouple the Tower client (API calls) from the observer: one client must be initialized when the filesystem is initialized and another for the observer. Because of token refresh, the HxClient must be shared between them; otherwise separate clients run into authentication problems. I would like to do this after merging Add platform-related metadata to WorkflowRun lineage record #6545
  • The Dataset API does not allow streaming the content, so reads and writes are done through temporary files.
  • Only the csv and tsv extensions are allowed; the format is recognized by the extension.
  • Given the above, I am considering making it read-only.
  • Every change to a dataset creates a new version: seqera://org/workspace/datasets/name accesses the latest version, and seqera://org/workspace/datasets/name@version accesses a specific version.
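Because the Dataset API cannot stream content, reads go through a temporary file, and the format is inferred from the csv/tsv extension. The hypothetical Java sketch below illustrates that approach together with read-only behaviour. DatasetReadSketch and its Downloader callback are stand-ins for the download call in SeqeraDatasetClient, not the DatasetInputStream or SeqeraFileSystemProvider code from this PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.ReadOnlyFileSystemException;
import java.nio.file.StandardOpenOption;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the temp-file read path and read-only behaviour.
final class DatasetReadSketch {

    static final Set<String> ALLOWED = Set.of("csv", "tsv");

    // The format is recognised by extension only.
    static String formatOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED.contains(ext))
            throw new IllegalArgumentException("Unsupported dataset format: " + fileName);
        return ext;
    }

    // Stand-in for the API download call, which cannot stream.
    interface Downloader {
        void downloadTo(Path target) throws IOException;
    }

    // Download the whole dataset into a temp file, then return a stream over it.
    static InputStream open(Downloader api) throws IOException {
        Path tmp = Files.createTempFile("seqera-dataset-", ".tmp");
        try {
            api.downloadTo(tmp);
            // DELETE_ON_CLOSE asks the JDK to delete the temp file (best effort) on close
            return Files.newInputStream(tmp, StandardOpenOption.DELETE_ON_CLOSE);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    // Any write attempt fails, since the filesystem is read-only.
    static OutputStream newOutputStream(Path target) {
        throw new ReadOnlyFileSystemException();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(formatOf("test_rnaseq.csv"));
        try (InputStream in = open(t -> Files.writeString(t, "sample,fastq_1\n"))) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The trade-off of this approach is that the whole dataset is materialised locally before the first byte is read, which is acceptable for samplesheet-sized CSV/TSV files.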

Comment thread tests/seqera-dataset.nf
@bentsherman bentsherman added this to the 26.04 milestone Mar 19, 2026
@jorgee jorgee changed the title Add seqera:// NIO filesystem for Seqera Platform datasets Add seqera:// NIO filesystem and refactor TowerClient/TowerObserver split Apr 9, 2026
@jorgee jorgee changed the title Add seqera:// NIO filesystem and refactor TowerClient/TowerObserver split Add Seqera NIO filesystem for datasets and refactor TowerClient/TowerObserver split Apr 9, 2026
@jorgee jorgee marked this pull request as ready for review April 9, 2026 11:58
@jorgee (Contributor, Author) commented Apr 9, 2026

Updated with the latest changes in master and ready for review. It is implemented as a read-only filesystem.

  • Dataset path format: seqera://<org>/<workspace>/datasets/<name>
  • You can use the nextflow fs command to browse datasets in Seqera Platform:
# List orgs
$ nextflow fs ls seqera://*
seqera-academy
nf-core
seqeralabs
community

# List workspaces
$ nextflow fs ls seqera://nf-core/*
AWSmegatests

# List available items in the workspace (currently just datasets)
$ nextflow fs ls seqera://nf-core/AWSmegatests/*
datasets

# List available datasets
$ nextflow fs ls seqera://nf-core/AWSmegatests/datasets/*
GM_nascent
feb-16-test-6
methylseq_test_full
node-red-tests
proteinfamilies
rnaseq_samplesheet_full
test_rnaseq

# Show the content of a dataset
$ nextflow fs cat seqera://nf-core/AWSmegatests/datasets/test_rnaseq
sample,fastq_1,fastq_2,strandedness
WT_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357070_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357070_2.fastq.gz,reverse
WT_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357071_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357071_2.fastq.gz,reverse
WT_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357072_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357073_1.fastq.gz,,reverse
RAP1_UNINDUCED_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357074_1.fastq.gz,,reverse
RAP1_UNINDUCED_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357075_1.fastq.gz,,reverse
RAP1_IAA_30M_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357076_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357076_2.fastq.gz,reverse

# Download a dataset
$ nextflow fs cp seqera://nf-core/AWSmegatests/datasets/test_rnaseq test_rnaseq.csv
$ cat test_rnaseq.csv 
sample,fastq_1,fastq_2,strandedness
WT_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357070_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357070_2.fastq.gz,reverse
WT_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357071_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357071_2.fastq.gz,reverse
WT_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357072_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357072_2.fastq.gz,reverse
RAP1_UNINDUCED_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357073_1.fastq.gz,,reverse
RAP1_UNINDUCED_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357074_1.fastq.gz,,reverse
RAP1_UNINDUCED_REP2,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357075_1.fastq.gz,,reverse
RAP1_IAA_30M_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357076_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357076_2.fastq.gz,reverse
  • A dataset can be used in a pipeline as follows (read access only at the moment):
params.dataset = 'seqera://seqeralabs/showcase/datasets/sarek_samples'

process TEST {
    input:
    path samplesheet    // the dataset, made available to the task as a file

    output:
    stdout

    script:
    """
    cat ${samplesheet}
    """
}

workflow {
    TEST(file(params.dataset)).view()
}

@bentsherman bentsherman requested a review from pditommaso April 9, 2026 13:55
@pditommaso pditommaso requested a review from jordeu April 9, 2026 14:13
@pditommaso (Member) commented:

Pulling in @jordeu to make sure this is aligned with Fusion.

@bentsherman (Member) commented:

@jorgee, can you write a small ADR describing the Seqera filesystem hierarchy? That way we can more easily make sure Fusion and Nextflow are aligned.

pditommaso

This comment was marked as outdated.

jordeu

This comment was marked as outdated.

@jorgee

This comment was marked as outdated.

@jorgee jorgee requested a review from pditommaso April 10, 2026 09:29
@pditommaso (Member) left a comment

This looks like great progress. I left a bunch of minor notes.

In the following comment I'll also post a few issues reported by Claude that are worth reviewing, or at least documenting, as with the first one.

Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/TowerClient.groovy Outdated
Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/BaseCommandImpl.groovy Outdated
Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy Outdated
@pditommaso (Member) left a comment

Review of the Seqera NIO filesystem and TowerClient/TowerObserver refactor. 8 inline comments: 1 critical, 6 important, 1 suggestion.

Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/DatasetInputStream.groovy Outdated
Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy Outdated
Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/TowerClient.groovy Outdated
Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/TowerClient.groovy Outdated
@jorgee jorgee requested a review from pditommaso April 14, 2026 14:26
@jorgee (Contributor, Author) commented Apr 14, 2026

Addressed all comments except one, which would require refactoring the auth and launch commands; I think that is out of scope for this PR.

Comment thread plugins/nf-tower/build.gradle
Comment thread plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy Outdated
@bentsherman bentsherman requested a review from pditommaso April 20, 2026 13:59
@bentsherman (Member) commented:

@jorgee, can you fix the DCO?

@jorgee (Contributor, Author) commented Apr 20, 2026

The DCO failure is in a commit by @pditommaso.

@bentsherman (Member) commented:

@jorgee can you just squash the PR to a single commit and rebase?

@jorgee jorgee force-pushed the 260310-seqera-dataset-fs branch from 93fd702 to 2937bd0 Compare April 21, 2026 08:38
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee jorgee force-pushed the 260310-seqera-dataset-fs branch from 2937bd0 to 2a80cbe Compare April 21, 2026 08:40
@jorgee (Contributor, Author) commented Apr 21, 2026

@jorgee can you just squash the PR to a single commit and rebase?

Done

@bentsherman bentsherman changed the title Add Seqera NIO filesystem for datasets and refactor TowerClient/TowerObserver split Add Seqera NIO filesystem for datasets and refactor TowerClient/TowerObserver Apr 21, 2026
@bentsherman bentsherman merged commit 433b10a into master Apr 21, 2026
24 checks passed
@bentsherman bentsherman deleted the 260310-seqera-dataset-fs branch April 21, 2026 14:35

4 participants