Conversation
There was a problem hiding this comment.
Pull request overview
Updates federated data validity tooling and associated field requirement documentation to account for additional VEP 115 annotations and improve validation ergonomics.
Changes:
- Document new
vep115_globalsglobal annotations and avep115row annotation in field requirements (MD + generated HTML). - Refactor/extend validity checks with helpers for test-partition filtering, row↔global length checks, “extra field” warnings, and populating select
infoannotations. - Switch the default public-release input HT to
exomesand adjust the default output path accordingly.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| gnomad_qc/v5/data_ingestion/field_requirements.md | Adds VEP 115 global/row field specs to the requirements doc. |
| gnomad_qc/v5/data_ingestion/field_requirements.html | Regenerated HTML to reflect the updated requirements (incl. VEP 115 fields). |
| gnomad_qc/v5/data_ingestion/federated_validity_checks.py | Adds/refactors validation helpers and updates defaults used by the CLI run. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
mike-w-wilson
left a comment
There was a problem hiding this comment.
A few more comments and also a general one. I know we want to use some of this for v5 but considering the documentation and end goal, this sshould be moved out of v5 and into its own federated directory. A separate script can then call these functions inside v5.
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 9 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
gnomad_qc/federated/federated_validity_checks.py:386
validate_config_fields_in_htchecks formonoallelic/only_hetinsideht.infowhencheck_mono_and_only_hetis set, butht.infois only populated later viaadd_info_annotations. For inputs where those annotations exist as top-level row fields (and are later moved intoinfo), this will incorrectly flag missing fields (or raise ifinfois empty). Consider either (a) performing this validation afteradd_info_annotations, or (b) accepting the fields in either location (ht.infoorht.row) and validating accordingly.
gnomad_qc/federated/federated_validity_checks.py:693check_missingnessdivides byn_siteswhen computingfrac_missing, butn_sites = ht.count()can be 0 (e.g., if filtering to test partitions/intervals yields an empty Table). This will raise aZeroDivisionError. Add an early return (or skip fraction computation) whenn_sites == 0.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
mike-w-wilson
left a comment
There was a problem hiding this comment.
A couple very minor things and responses, then LGTM
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
No description provided.