Add snpclustering subworkflow by dbaku42 · Pull Request #11059 · nf-core/modules

dbaku42 · 2026-03-26T14:39:58Z

Description

This PR adds the snpclustering subworkflow for end-to-end unsupervised clustering of genomic samples directly from multi-sample VCF files.

Features

Variant filtering (MAF + missingness) with bcftools/filter
LD pruning with plink2/indeppairwise
Export pruned VCF with plink2/recodevcf
PCA with flashpca2

The subworkflow was developed in relation to the accepted nf-core proposal for the consepopgen pipeline.

Related to:

New pipeline: nf-core/consepopgen proposals#57 (nf-core/consepopgen)

Checklist

nf-core subworkflows lint snpclustering passed
nf-core subworkflows test snpclustering passed
Follows nf-core subworkflow conventions

Closes # (no specific issue)

famosab · 2026-04-02T12:54:41Z

Please join the nf-core organization on GitHub to enable the CI-tests to run on your PR. You can request to join the organization via #github-invitations in the nf-core slack. You can join the nf-core slack via https://nf-co.re/join.

famosab

Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.

famosab · 2026-04-02T12:55:07Z

subworkflows/nf-core/snpclustering/tests/main.nf.test

+        }
+
+        then {
+            assert workflow.success


We also want a snapshot here (look at other subworkflows)

The test now passes with direct nf-test. The failure with nf-core subworkflows test is due to a temporary missing Wave container for the plink2/vcf module (manifest unknown). The logic and snapshot are correct.

subworkflows/nf-core/snpclustering/tests/tags.yml

famosab · 2026-04-02T12:55:57Z

subworkflows/nf-core/snpclustering/main.nf

+    missing
+
+    main:
+    versions = Channel.empty()


Check for each module whether it still exports versions. I think at least bcftools/filter no longer does.

subworkflows/nf-core/snpclustering/meta.yml

famosab · 2026-04-02T12:56:54Z

subworkflows/nf-core/snpclustering/main.nf

+    FLASHPCA2 ( PLINK2_RECODE_VCF.out.vcf )
+    versions = versions.mix(FLASHPCA2.out.versions.first())
+
+    // TODO: add KMeans/DBSCAN/plot here once we create the local modules


Is there still something to add?

Thank you for your comment @famosab .

You’re absolutely right — the clustering components (KMeans, DBSCAN), internal validation metrics (Silhouette, Calinski–Harabasz, Davies–Bouldin), non-linear embeddings (t-SNE/UMAP), and the final HTML report still need to be integrated.

These features are already implemented in the original pipeline (https://github.com/dbaku42/nf-core-snpclustering). I intentionally left them out of this PR to keep the subworkflow minimal and easier to review.

I’m happy to proceed in either of the following ways:

Include all these components directly in this PR (my preferred option), or

Add them in a dedicated follow-up PR immediately after this one is merged.

Please let me know which approach you’d prefer.

Thanks again!

I would say finalize it in one PR :) and then we can check if everything is done properly.

If you need extra modules that are not part of nf-core yet then please add them in a separate PR.

Also can you do this please:

Please join the nf-core organization on GitHub to enable the CI-tests to run on your PR. You can request to join the organization via #github-invitations in the nf-core slack. You can join the nf-core slack via https://nf-co.re/join. :)

famosab

I added a few comments :) The main thing is that we do not allow local modules. I would also encourage you to submit each module on its own, one PR per module, which makes reviewing easier. Until they are merged, you can set this PR to draft.

famosab · 2026-04-14T07:41:20Z

modules/local/cluster_metrics.nf

We cannot add a "local" module to an nf-core subworkflow :) The module needs to be added following the nf-core guidelines, and then it can be used in the subworkflow!

famosab · 2026-04-14T07:41:33Z

modules/local/cluster_viz.nf

See above comment

famosab · 2026-04-14T07:41:37Z

modules/local/clustering.nf

See above comment

famosab · 2026-04-14T07:41:42Z

modules/local/pca.nf

See above comment

famosab · 2026-04-14T07:42:42Z

modules/local/plink2_pgen2bed.nf

See above comment

You already reuse plink/vcf, it might be worth building (or reusing if it exists) the plink2/pgen2bed module. I would ask you to do that in a separate PR to keep the review load small :)

famosab · 2026-04-14T07:43:25Z

modules/nf-core/plink2/vcf/main.nf

+        --make-pgen \\
+        --set-all-var-ids '@:#:\$r:\$a' \\
+        --new-id-max-allele-len 10 missing \\
+        --rm-dup force-first \\


We should not hardcode flags in nf-core modules. I think we need to find a way to make the flags dependent on the input.
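
One common nf-core pattern for this is to keep only the mandatory flags in the module script and pass everything optional through `task.ext.args`, set in a config file. The fragment below is a sketch under that assumption and is not necessarily what the reviewer has in mind. The selector `PLINK2_VCF` is assumed to match the process name, and the flag values are copied from the diff above.

```groovy
// nextflow.config (for example next to the subworkflow tests)
// A sketch only, not the merged configuration.
process {
    withName: 'PLINK2_VCF' {
        // Optional plink2 flags live here instead of being hardcoded in main.nf.
        // The backslash keeps the literal $r / $a placeholders for plink2.
        ext.args = "--set-all-var-ids '@:#:\$r:\$a' --new-id-max-allele-len 10 missing --rm-dup force-first"
    }
}
```

With this in place, the module script would only need `def args = task.ext.args ?: ''` and a `$args` placeholder in the plink2 command, so users who do not want these flags can leave them out.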

famosab · 2026-04-14T07:44:16Z

subworkflows/nf-core/snpclustering/scripts/cluster_metrics.py

You supplied a lot of scripts. Is there a way to pack them into a small python package and use that package to build the modules?

famosab · 2026-04-14T07:44:42Z

subworkflows/nf-core/snpclustering/tests/main.nf.test

+                { assert workflow.out.plink_bed.size() > 0 },
+                { assert workflow.out.pca.size() > 0 },
+                { assert workflow.out.cluster_labels.size() > 0 },
+                { assert workflow.out.metrics.size() > 0 },
+                { assert workflow.out.plots.size() > 0 }


We want to have a snapshot :)
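
A minimal sketch of what a snapshot-based test could look like with nf-test, replacing the per-channel size checks. The workflow name `SNPCLUSTERING`, the input tuple shape, and the test file path are assumptions for illustration and need to match the actual `main.nf`.

```groovy
nextflow_workflow {

    name "Test Subworkflow SNPCLUSTERING"
    script "../main.nf"
    workflow "SNPCLUSTERING"

    test("multi-sample vcf") {

        when {
            workflow {
                """
                // Hypothetical input: adjust the tuple and path to the real test data
                input[0] = Channel.of([
                    [ id: 'test' ],
                    file('/path/to/test.vcf.gz', checkIfExists: true)
                ])
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                // Compares all emitted channels against the stored .snap file
                { assert snapshot(workflow.out).match() }
            )
        }
    }
}
```

Outputs that are not byte-stable between runs, such as plots, usually need a narrower snapshot, for example of file names only rather than full file contents.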

famosab · 2026-04-14T07:45:02Z

subworkflows/nf-core/snpclustering/main.nf

+include { PLINK2_PGEN2BED }    from '../../../modules/local/plink2_pgen2bed'
+include { PCA_FLASHPCA }       from '../../../modules/local/pca'
+include { CLUSTERING }         from '../../../modules/local/clustering'
+include { CLUSTER_METRICS }    from '../../../modules/local/cluster_metrics'
+include { CLUSTER_VIZ }        from '../../../modules/local/cluster_viz'


All of these need to be added to nf-core separately :)

famosab · 2026-04-14T07:45:41Z

subworkflows/nf-core/snpclustering/main.nf

+    vcf_ch
+
+    main:
+    def versions_ch = Channel.empty()


We encourage using topics instead, which makes this obsolete. More information can be found in the docs.
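
As a rough illustration of the topic approach: a process sends its version tuple to a shared `versions` topic, so the subworkflow no longer needs a `versions_ch` to mix and emit. The process `EXAMPLE_FILTER`, its command, and the input path are hypothetical, and the sketch assumes a Nextflow release with topic channel support.

```groovy
// Hypothetical process, shown only to illustrate the topic output
process EXAMPLE_FILTER {
    input:
    tuple val(meta), path(vcf)

    output:
    tuple val(meta), path("*.filtered.vcf.gz"), emit: vcf
    // The version tuple goes to the shared 'versions' topic
    tuple val("${task.process}"), val('bcftools'), eval("bcftools --version | sed '1!d; s/^.*bcftools //'"), topic: versions, emit: versions_bcftools

    script:
    """
    bcftools view --output-type z --output ${meta.id}.filtered.vcf.gz $vcf
    """
}

workflow {
    // Hypothetical input channel
    vcf_ch = channel.of([ [ id: 'test' ], file('/path/to/test.vcf.gz') ])
    EXAMPLE_FILTER(vcf_ch)

    // Everything sent to the topic can be read here, without any
    // versions channel being passed through the subworkflow
    channel.topic('versions').view()
}
```

The design point is that version collection happens once at the pipeline level, so individual subworkflows carry no version bookkeeping.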

donald and others added 2 commits March 26, 2026 15:35

Add snpclustering subworkflow

6af3ab8

Merge branch 'master' into add/snpclustering

dea109e

famosab reviewed Apr 2, 2026


dbaku42 and others added 9 commits April 3, 2026 15:46

Update subworkflows/nf-core/snpclustering/meta.yml

8aac674

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

Update snpclustering subworkflow

7f27d21

- Updated nf-test snapshots - Small changes to main.nf and meta.yml - Updated bcftools/filter and plink2/vcf related files

Update description and input/output parameters in meta.yml

dabc976

Update tags in main.nf.test for SNP clustering

81b57d1

Remove obsolete tags.yml and fix snpclustering test tags

b796acd

Fix snpclustering metadata and test tags

09473bc

Fix snpclustering metadata and test tags

b35034a

Add local modules and scripts for snpclustering

d6fdd58

Merge branch 'master' into add/snpclustering

c56e54c

famosab requested changes Apr 14, 2026

