Pedigree phasing with whatshap (issue 387) #968

Open
tetedange13 wants to merge 41 commits into genomic-medicine-sweden:dev from tetedange13:387-ped-phasing2

@tetedange13 tetedange13 commented Mar 27, 2026

Description

Should address issue #387

Changed

  • When phasing with whatshap on a complete family (proband + both parents), pedigree information is now used for phasing

Main steps

  • In nallo.nf, apply the addChildWithTwoParentsToMeta function to the "by family pedigree" channel and set pedigree_file to "empty" if the family is not complete
  • Pass this new pedigree tuple through the phasing subworkflow, then the whatshap subworkflow
  • In subworkflows/whatshap, use the same approach to sync the PED with BAM+VCF (otherwise PED files could be swapped between 2 different families)
  • The PED goes up to the nf-core/whatshap/phase module, which I patched to add a new ch_pedigree input
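
The swap risk mentioned in the third step comes from pairing items by arrival order instead of by key. Below is a minimal Python sketch of the idea, not the pipeline's Nextflow code: the function name `sync_by_family` and the file names are hypothetical, and each item is assumed to carry its family ID as the join key.

```python
def sync_by_family(alignments, pedigrees):
    """Pair each (family_id, bam, vcf) item with the PED file of the same family.

    Matching is done on the family ID, never on list position, so the order
    in which the PED files arrive does not matter.
    """
    ped_by_family = dict(pedigrees)
    return [
        (family_id, bam, vcf, ped_by_family.get(family_id))
        for family_id, bam, vcf in alignments
    ]


# Hypothetical inputs: the PED files arrive in a different order than the BAM+VCF items.
alignments = [
    ("FAM1", "FAM1.bam", "FAM1.vcf.gz"),
    ("FAM2", "FAM2.bam", "FAM2.vcf.gz"),
]
pedigrees = [("FAM2", "FAM2.ped"), ("FAM1", "FAM1.ped")]

synced = sync_by_family(alignments, pedigrees)
for item in synced:
    print(item)
```

Pairing the two lists by position here would hand FAM2.ped to FAM1, which is the swap the keyed join avoids.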

Warnings / limitations

  • 2 spurious commits = snapshot changes that seem to be caused by other, unrelated code changes: 853517b and a9e1a98 (some parts are related to my changes, others are not)
  • No test in subworkflows/whatshap to capture this new behaviour ? Would require 2 trios, or best 2 trios + 1 solo ? Is there enough test data on genomic-medicine-sweden/test-datasets for that ?
  • A new test sampleSheet with 2 trios + 1 solo could be a good add too ?
  • nf-core modules lint whatshap/phase no longer passes, even though I used nf-core modules patch. It complains that meta.yml is missing the new pedigree input, even though it is present...

Thanks for taking a look !
Best,
Felix.

@tetedange13 tetedange13 requested a review from a team as a code owner March 27, 2026 01:27
@github-actions

PR checklist

  • Please describe the purpose of this change, the problem it solves, why this approach was chosen, its impact, and link any relevant issue if it already contains this information.
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, follow the pipeline conventions in the contribution docs.
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • README.md is updated (including new tools and authors/contributors).

Collaborator

@fellen31 fellen31 left a comment

Thanks a lot for this contribution!

  • No test in subworkflows/whatshap to capture this new behaviour ? Would require 2 trios, or best 2 trios + 1 solo ? Is there enough test data on genomic-medicine-sweden/test-datasets for that ?

New tests would be great! And I think so! Since whatshap phases one family at a time, we should be able to reuse the input files but with different family/sample IDs. So you could probably use the PacBio HG002, HG003 and HG004 files twice for trios, and then pick one of the files for an extra solo.
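
As a rough illustration of this suggestion, the Python sketch below builds PED rows for two trios and one solo that all reuse the same three samples under different IDs. The family IDs (`FAM1` to `FAM3`), the ID-prefixing scheme and the helper names are assumptions made for illustration. Only the six-column PED layout and the HG002/HG003/HG004 son/father/mother roles are standard.

```python
# Six PED columns: family, individual, father, mother, sex (1=male, 2=female), phenotype.
# Phenotype is left as "0" (unknown) in this sketch.
SON, FATHER, MOTHER = "HG002", "HG003", "HG004"


def trio_rows(family_id):
    """PED rows for one trio, with sample IDs made unique by a family prefix."""
    son, father, mother = (f"{family_id}_{s}" for s in (SON, FATHER, MOTHER))
    return [
        (family_id, son, father, mother, "1", "0"),
        (family_id, father, "0", "0", "1", "0"),
        (family_id, mother, "0", "0", "2", "0"),
    ]


def solo_rows(family_id, sample, sex):
    """PED row for a single individual without parents."""
    return [(family_id, f"{family_id}_{sample}", "0", "0", sex, "0")]


rows = trio_rows("FAM1") + trio_rows("FAM2") + solo_rows("FAM3", SON, "1")
ped_text = "\n".join("\t".join(row) for row in rows) + "\n"
print(ped_text, end="")
```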

  • A new test sampleSheet with 2 trios + 1 solo could be a good add too ?

I think if you can add 2 trios + 1 solo in the subworkflow test, we should be fine with the current tests. The only logic changed in nallo.nf is whether or not we pass the PED-file right?

I was also wondering if we should have a parameter controlling whether or not PED phasing is applied for trios...but I guess there's no downside to it, so maybe that's just unnecessary?

Would be interesting to hear if phasing a trio with whatshap ped is better than e.g. longphase. Have you tried?

Collaborator

This update would otherwise be great to have in the nf-core module. If you make a PR to nf-core/modules with these updates, you can ping me for a review in the nf-core Slack. That should also take care of the linting issues.

Collaborator

Still applies.

Collaborator Author

The latest version of the whatshap/phase module changed the input structure: https://github.com/nf-core/modules/pull/11041/changes#diff-04d6148de5293b82d05a67c61d9995e47c020e76e0f16f950dac131382e62e62

But I am on it !

Collaborator Author

@tetedange13 tetedange13 Apr 7, 2026

PR to add pedigree input to nf-core/whatshap/phase : nf-core/modules#11128

EDIT: It will probably take me longer than expected to address this PR, so in the meantime I updated whatshap/phase (with the new input structure) and re-patched it with the PED input

.map { meta, _files -> [ [ id: meta.family_id ], meta ] }
.groupTuple()
)
// If 'childWithTwoParents==false', set family_ped=empty
Collaborator

Nice, but could you explain why we need to do this? Will whatshap fail if you give it a PED with only two individuals in a family?

Suggested change
// If 'childWithTwoParents==false', set family_ped=empty
// If childWithTwoParents is false we don't provide the ped file, because ...

Collaborator Author

@tetedange13 tetedange13 Mar 27, 2026

Based on the WhatsHap documentation, pedigree phasing activates a distinct algorithm.

So I restricted it to full trios: on the one hand, to avoid risking worse results in cases like duos or even solos, and on the other hand, to reserve pedigree phasing for when it is most relevant (= when we have both parents, I guess).

But I did not actually benchmark any of this, to be honest.

Collaborator

@fellen31 fellen31 Mar 27, 2026

Then I would write something like "If we provide a PED file, whatshap phase will automatically activate the pedigree phasing algorithm, which is only available for trios...", to make it clearer.

However, now that you link the documentation, it seems like it also supports quartets (which is fine - childWithTwoParents will be true for both trios and quartets), but also duos (parent/child):

A quartet (note how multiple consecutive spaces are fine):

When phasing multiple samples from individuals that are related (such as parent/child or a trio), then it is possible to provide WhatsHap with a .ped file that describes the pedigree. WhatsHap will use the pedigree and the reads to infer a combined, much better phasing.

Based on this line in the WhatsHap code: https://github.com/whatshap/whatshap/blob/f07423a7ba0c7609e7d4d5c73fb5e4e240057830/whatshap/cli/phase.py#L580, it seems to me like it would just ignore the PED file for singletons. So maybe we could always provide the PED file?

If we want to be able to control the algorithm when we have parent/child, perhaps we could have a pipeline parameter, e.g. whatshap_pedigree_phasing, where the user can determine whether we provide the PED file or not? (edit: That would default to true)
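
To make the trio/quartet/duo/solo distinction in this thread concrete, here is a small Python sketch. It is not the pipeline's addChildWithTwoParentsToMeta implementation: the function names and the simplified four-column rows (family, individual, father, mother) are assumptions for illustration, and `pedigree_phasing` merely stands in for the proposed whatshap_pedigree_phasing parameter.

```python
def has_child_with_two_parents(ped_rows):
    """True if at least one individual has both parents present in the rows."""
    members = {individual for _, individual, _, _ in ped_rows}
    return any(
        father in members and mother in members
        for _, _, father, mother in ped_rows
    )


def ped_for_phasing(ped_path, pedigree_phasing=True):
    """Return the PED path when pedigree phasing is enabled, otherwise None."""
    return ped_path if pedigree_phasing else None


parents = [("FAM", "father", "0", "0"), ("FAM", "mother", "0", "0")]
trio = [("FAM", "child", "father", "mother")] + parents
quartet = [("FAM", "sibling", "father", "mother")] + trio
duo = [("FAM", "child", "0", "mother"), ("FAM", "mother", "0", "0")]
solo = [("FAM", "child", "0", "0")]

for name, rows in [("trio", trio), ("quartet", quartet), ("duo", duo), ("solo", solo)]:
    print(name, has_child_with_two_parents(rows))
```

With a check like this, trios and quartets qualify while duos and solos do not, so always passing the PED changes behaviour only for duos, provided singletons are indeed ignored by whatshap.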

Collaborator Author

Addressed by commit abc28d0 (and 106a2ca)

Collaborator Author

@tetedange13 tetedange13 Apr 7, 2026

Commit 7930be2 fixes an important typo between param_name and val_param_name...

felix.vandermeeren@chu-montpellier.fr added 2 commits March 27, 2026 10:31
Pass PED file in one of them, to test new 'whatshap pedigree phasing

	modified:         subworkflows/local/whatshap/tests/main.nf.test
	modified:         subworkflows/local/whatshap/tests/main.nf.test.snap
felix.vandermeeren@chu-montpellier.fr added 2 commits March 27, 2026 16:21
Pass PED file in one of them, to test new 'whatshap pedigree phasing
@tetedange13 tetedange13 changed the title 387 ped phasing2 Pedigree phasing with whatshap (issue 387) Mar 27, 2026
Collaborator

@fellen31 fellen31 left a comment

Nice!

Comment on lines +91 to +94
input[5] = channel.of([
[ id:'FAM' ], // Empty ch_pedigree
[]
])
Collaborator

Could we try inputting the PED-file here as well?

Collaborator Author

@tetedange13 tetedange13 Apr 7, 2026

Smart idea testing that !

  • It works, but the snapshots do not match anymore
  • However, I think this is a false positive, simply due to the use of a plain snapshot() instead of the dedicated nft-vcf plugin's variantsMD5() (on manual inspection the VCFs seem identical, except for the addition of --ped FAM.ped in the header)

Plot twist is that looking at logs when inputting PED :

So I guess the PED is indeed ignored when given for a solo
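
The snapshot mismatch described above can be reproduced in miniature. The Python sketch below hashes only the variant records of a VCF, so a header that differs by a recorded command line does not change the digest. This is an illustration of the idea only, not a description of how nft-vcf's variantsMD5() is implemented. The header key `##hypothetical_command` and the record are made up.

```python
import hashlib


def records_md5(vcf_text):
    """MD5 over the variant records only, skipping every '#' header line."""
    records = [
        line for line in vcf_text.splitlines()
        if line and not line.startswith("#")
    ]
    return hashlib.md5("\n".join(records).encode()).hexdigest()


def whole_file_md5(vcf_text):
    """MD5 over the full text, header included."""
    return hashlib.md5(vcf_text.encode()).hexdigest()


COLUMNS = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n"
RECORD = "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:PS\t0|1:100\n"

# Two hypothetical outputs that differ only in one header line.
vcf_without_ped = "##fileformat=VCFv4.2\n##hypothetical_command=phase\n" + COLUMNS + RECORD
vcf_with_ped = "##fileformat=VCFv4.2\n##hypothetical_command=phase --ped FAM.ped\n" + COLUMNS + RECORD

print(records_md5(vcf_without_ped) == records_md5(vcf_with_ped))
print(whole_file_md5(vcf_without_ped) == whole_file_md5(vcf_with_ped))
```

A whole-file comparison flags the two outputs as different, while a record-level comparison treats them as identical, which matches the manual check reported here.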

Collaborator Author

@tetedange13 tetedange13 Apr 8, 2026

I also tested on a bigger dataset (100 K variants) and whatshap phase still produces identical results on a solo with or without the --ped option. This confirms we should be fine: the PED is indeed ignored for a solo
(even though I cannot pinpoint exactly where this behaviour comes from in whatshap phase's code)

Do we keep it like that, or do you prefer to play it safe and go back to the initial "phasing for complete trios ONLY" option?

val_phaser,
!val_skip_sv_calling,
cram_output,
ch_ped_family
Collaborator

  • I think we could always pass the ped file here (unless we do not want pedigree phasing), even though it will only be used by whatshap.
  • I prefer always passing the meta along with the file, the empty equivalent would be [[],[]]
  • We recently moved all params from this workflow to main.nf, since this will be required in future versions of Nextflow. Could you make a val_whatshap_pedigree_phasing input to nallo.nf?
Suggested change
ch_ped_family
val_whatshap_pedigree_phasing ? SOMALIER_PED_FAMILY.out.ped : [[],[]]

Collaborator Author

@tetedange13 tetedange13 Apr 1, 2026

Addressed by commits 8bb9d80 and 8a7b339

Collaborator Author

Commit 8a7b339 went too far: you have to use an intermediate variable with a correct meta, otherwise --whatshap_pedigree_phasing false raises a join mismatch error here: https://github.com/tetedange13/nallo/blob/7930be21c4c7a13a61a5232f1586eae4c7153307/subworkflows/local/whatshap/main.nf#L27

Fixed by commit 0a5a6d2
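
Why the empty placeholder breaks a keyed join can be sketched in a few lines of Python. This is an analogy, not the subworkflow's Nextflow code: `strict_join` is a hypothetical helper that errors on unmatched keys, `None` stands in for the empty meta of `[[],[]]`, and the file names are invented.

```python
def strict_join(left, right):
    """Join two lists of (key, value) pairs and fail on any unmatched key."""
    right_by_key = dict(right)
    unmatched = {key for key, _ in left} ^ set(right_by_key)
    if unmatched:
        raise ValueError(f"join mismatch for keys: {sorted(unmatched, key=repr)}")
    return [(key, value, right_by_key[key]) for key, value in left]


bam_vcf = [("FAM", "FAM.bam+vcf")]

# Analogue of [[], []]: no file AND no usable key, so nothing matches "FAM".
empty_placeholder = [(None, None)]

# Analogue of the fix: keep the correct key and leave only the file empty.
keyed_placeholder = [("FAM", None)]

try:
    strict_join(bam_vcf, empty_placeholder)
    mismatch_raised = False
except ValueError as error:
    mismatch_raised = True
    print(error)

joined = strict_join(bam_vcf, keyed_placeholder)
print(joined)
```

The placeholder therefore has to carry the same key as the items it is joined with, even when there is no PED file to pass along.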

Collaborator

Still applies.

felix.vandermeeren@chu-montpellier.fr added 6 commits April 2, 2026 00:20
Collaborator

fellen31 commented Apr 2, 2026

I'll be away for a few weeks but I'll check in again when I get back, unless someone else can continue the review @tetedange13!
