Conversation
|
Please join the nf-core organization on GitHub to enable the CI-tests to run on your PR. You can request to join the organization via #github-invitations in the nf-core slack. You can join the nf-core slack via https://nf-co.re/join. |
famosab
left a comment
There was a problem hiding this comment.
Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
| } | ||
|
|
||
| then { | ||
| assert workflow.success |
There was a problem hiding this comment.
We also want a snapshot here (look at other subworkflows)
There was a problem hiding this comment.
The test now passes with direct nf-test. The failure with nf-core subworkflows test is due to a temporary missing Wave container for the plink2/vcf module (manifest unknown). The logic and snapshot are correct.
| missing | ||
|
|
||
| main: | ||
| versions = Channel.empty() |
There was a problem hiding this comment.
Check for each module if they still export the versions I think at least bcftools/filter does not anymore
| FLASHPCA2 ( PLINK2_RECODE_VCF.out.vcf ) | ||
| versions = versions.mix(FLASHPCA2.out.versions.first()) | ||
|
|
||
| // TODO: qui aggiungeremo KMeans/DBSCAN/plot quando creeremo i moduli local |
There was a problem hiding this comment.
Is there still something to add?
There was a problem hiding this comment.
Thank you for your comment @famosab .
You’re absolutely right — the clustering components (KMeans, DBSCAN), internal validation metrics (Silhouette, Calinski–Harabasz, Davies–Bouldin), non-linear embeddings (t-SNE/UMAP), and the final HTML report still need to be integrated.
These features are already implemented in the original pipeline (https://github.com/dbaku42/nf-core-snpclustering). I intentionally left them out of this PR to keep the subworkflow minimal and easier to review.
I’m happy to proceed in either of the following ways:
- Include all these components directly in this PR (my preferred option), or
- Add them in a dedicated follow-up PR immediately after this one is merged.
Please let me know which approach you’d prefer.
Thanks again!
There was a problem hiding this comment.
I would say finalize it in one PR :) and then we can check if everything is done properly.
If you need extra modules that are not part of nf-core yet then please add them in a separate PR.
There was a problem hiding this comment.
Also can you do this please:
Please join the nf-core organization on GitHub to enable the CI-tests to run on your PR. You can request to join the organization via #github-invitations in the nf-core slack. You can join the nf-core slack via https://nf-co.re/join. :)
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
- Updated nf-test snapshots - Small changes to main.nf and meta.yml - Updated bcftools/filter and plink2/vcf related files
famosab
left a comment
There was a problem hiding this comment.
I added a few comments :) The main thing is that we do not allow local modules. And that I would encourage to supply each module on its own (with a singular PR, that makes reviewing easier) until they are merged you can set this PR to draft.
There was a problem hiding this comment.
We cannot have a "local" module to be added to an nf-core subworkflow :) The module needs to be added following the nf-core guidelines and then it can be used in the subworkflow!
There was a problem hiding this comment.
See above comment
You already reuse plink/vcf, it might be worth building (or reusing if it exists) the plink2/pgen2bed module. I would ask you to do that in a separate PR to keep the review load small :)
| --make-pgen \\ | ||
| --set-all-var-ids '@:#:\$r:\$a' \\ | ||
| --new-id-max-allele-len 10 missing \\ | ||
| --rm-dup force-first \\ |
There was a problem hiding this comment.
We should not hardcode flags in nf-core modules. I think we need to find a way to make the flags dependent on the input.
There was a problem hiding this comment.
You supplied a lot of scripts. Is there a way to pack them into a small python package and use that package to build the modules?
| { assert workflow.out.plink_bed.size() > 0 }, | ||
| { assert workflow.out.pca.size() > 0 }, | ||
| { assert workflow.out.cluster_labels.size() > 0 }, | ||
| { assert workflow.out.metrics.size() > 0 }, | ||
| { assert workflow.out.plots.size() > 0 } |
There was a problem hiding this comment.
We want to have a snapshot :)
| include { PLINK2_PGEN2BED } from '../../../modules/local/plink2_pgen2bed' | ||
| include { PCA_FLASHPCA } from '../../../modules/local/pca' | ||
| include { CLUSTERING } from '../../../modules/local/clustering' | ||
| include { CLUSTER_METRICS } from '../../../modules/local/cluster_metrics' | ||
| include { CLUSTER_VIZ } from '../../../modules/local/cluster_viz' |
There was a problem hiding this comment.
All of these need to be added to nf-core sepearately :)
| vcf_ch | ||
|
|
||
| main: | ||
| def versions_ch = Channel.empty() |
There was a problem hiding this comment.
We encourage to use topics instead. That makes this obsolete. More information about that can be found in the docs.
Description
This PR adds the
snpclusteringsubworkflow for end-to-end unsupervised clustering of genomic samples directly from multi-sample VCF files.Features
bcftools/filterplink2/indeppairwiseplink2/recodevcfflashpca2The subworkflow was developed in relation to the accepted nf-core proposal for the
consepopgenpipeline.Related to:
Checklist
nf-core subworkflows lint snpclusteringpassednf-core subworkflows test snpclusteringpassedCloses # (no specific issue)