
Kl/validty updates #760

Merged
klaricch merged 41 commits into main from kl/validty_updates
Apr 24, 2026

Conversation

@klaricch
Contributor

@klaricch klaricch commented Mar 5, 2026

No description provided.

@klaricch klaricch requested a review from Copilot March 5, 2026 14:46
@klaricch klaricch requested a review from a team as a code owner March 5, 2026 14:46
Contributor

Copilot AI left a comment

Pull request overview

Updates federated data validity tooling and associated field requirement documentation to account for additional VEP 115 annotations and improve validation ergonomics.

Changes:

  • Document new vep115_globals global annotations and a vep115 row annotation in field requirements (MD + generated HTML).
  • Refactor/extend validity checks with helpers for test-partition filtering, row↔global length checks, “extra field” warnings, and populating select info annotations.
  • Switch the default public-release input HT to exomes and adjust the default output path accordingly.
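The row↔global length check and the "extra field" warning listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's implementation: the real helpers operate on Hail Tables, while here a list of dicts stands in for rows and a dict stands in for globals. The function names and the freq/freq_meta pairing are hypothetical examples.

```python
import logging
from typing import Dict, Iterable, List, Sequence

logger = logging.getLogger("validity_sketch")  # hypothetical logger name


def check_row_global_lengths(
    rows: Iterable[Dict[str, Sequence]],
    globals_: Dict[str, Sequence],
    pairs: Dict[str, str],
) -> List[str]:
    """Return a message for every row whose array length differs from its paired global.

    `pairs` maps a row-level array field to the global it must match in length
    (hypothetical example: "freq" -> "freq_meta").
    """
    failures = []
    for i, row in enumerate(rows):
        for row_field, global_field in pairs.items():
            expected = len(globals_[global_field])
            observed = len(row[row_field])
            if observed != expected:
                failures.append(
                    f"row {i}: len({row_field})={observed} != len({global_field})={expected}"
                )
    return failures


def warn_extra_fields(present: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Log a warning (not an error) for each field absent from the requirements."""
    extra = sorted(set(present) - set(expected))
    for name in extra:
        logger.warning("Field %s is not described in the field requirements", name)
    return extra


globals_ = {"freq_meta": [{"group": "adj"}, {"group": "raw"}]}
rows = [{"freq": [0.1, 0.2]}, {"freq": [0.3]}]
print(check_row_global_lengths(rows, globals_, {"freq": "freq_meta"}))
# ['row 1: len(freq)=1 != len(freq_meta)=2']
print(warn_extra_fields(["freq", "vep115", "my_custom"], ["freq", "vep115"]))
# ['my_custom']
```

Logging extra fields as warnings rather than failing lets unexpected but harmless annotations pass through without blocking a run.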

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| gnomad_qc/v5/data_ingestion/field_requirements.md | Adds VEP 115 global/row field specs to the requirements doc. |
| gnomad_qc/v5/data_ingestion/field_requirements.html | Regenerated HTML to reflect the updated requirements (incl. VEP 115 fields). |
| gnomad_qc/v5/data_ingestion/federated_validity_checks.py | Adds/refactors validation helpers and updates defaults used by the CLI run. |

Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
@klaricch klaricch requested a review from mike-w-wilson March 5, 2026 17:24
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/field_requirements.md Outdated
@klaricch klaricch requested a review from mike-w-wilson March 16, 2026 18:32
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
Comment thread gnomad_qc/v5/configs/validity_inputs_config.json Outdated
Comment thread gnomad_qc/v5/configs/validity_inputs_schema.py
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
klaricch and others added 3 commits March 17, 2026 09:29
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.



Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
@klaricch klaricch requested a review from mike-w-wilson April 3, 2026 18:47
Contributor

@mike-w-wilson mike-w-wilson left a comment

A few more comments, plus a general one: I know we want to use some of this for v5, but given the documentation and end goal, this should be moved out of v5 and into its own federated directory. A separate script can then call these functions inside v5.

Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
Comment thread gnomad_qc/v5/data_ingestion/field_requirements.html
Comment thread gnomad_qc/v5/data_ingestion/field_requirements.md
Comment thread gnomad_qc/v5/configs/validity_inputs_config.json Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 9 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

gnomad_qc/federated/federated_validity_checks.py:386
  • validate_config_fields_in_ht checks for monoallelic/only_het inside ht.info when check_mono_and_only_het is set, but ht.info is only populated later via add_info_annotations. For inputs where those annotations exist as top-level row fields (and are later moved into info), this will incorrectly flag missing fields (or raise if info is empty). Consider either (a) performing this validation after add_info_annotations, or (b) accepting the fields in either location (ht.info or ht.row) and validating accordingly.

gnomad_qc/federated/federated_validity_checks.py:693
  • check_missingness divides by n_sites when computing frac_missing, but n_sites = ht.count() can be 0 (e.g., if filtering to test partitions/intervals yields an empty Table). This will raise a ZeroDivisionError. Add an early return (or skip fraction computation) when n_sites == 0.
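Option (b) from the first suppressed comment can be sketched as a schema lookup that accepts monoallelic/only_het either as top-level row fields or inside info. This assumes a plain dict as a stand-in for the Hail row schema; missing_mono_only_het_fields is a hypothetical name, not the function in the PR.

```python
from typing import List, Mapping, Sequence


def missing_mono_only_het_fields(
    row_schema: Mapping[str, object],
    fields: Sequence[str] = ("monoallelic", "only_het"),
) -> List[str]:
    """Return required fields found neither as top-level row fields nor in info.

    `row_schema` is a hypothetical stand-in for a Hail row schema: a dict of
    field name -> type, where "info" (if present) is itself a dict.
    """
    # Treat a missing or empty info struct as "no fields" instead of raising.
    info_schema = row_schema.get("info") or {}
    return [f for f in fields if f not in row_schema and f not in info_schema]


top_level = {"freq": "array", "monoallelic": "bool", "only_het": "bool", "info": {}}
in_info = {"freq": "array", "info": {"monoallelic": "bool", "only_het": "bool"}}
partial = {"freq": "array", "info": {"monoallelic": "bool"}}
print(missing_mono_only_het_fields(top_level))  # []
print(missing_mono_only_het_fields(in_info))    # []
print(missing_mono_only_het_fields(partial))    # ['only_het']
```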

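For the second suppressed comment, a guard against an empty table can be sketched as follows. The hypothetical missingness_fractions takes a list of dicts in place of a Hail Table (where n_sites would come from ht.count()), and the AC/AF field names are illustrative only.

```python
from typing import Dict, List, Mapping, Optional


def missingness_fractions(
    rows: List[Mapping[str, object]],
    fields: List[str],
) -> Dict[str, Optional[float]]:
    """Fraction of rows in which each field is None (hypothetical helper).

    Returns None for every field when there are no rows, instead of
    raising ZeroDivisionError.
    """
    n_sites = len(rows)
    if n_sites == 0:
        # e.g. filtering to test partitions/intervals left an empty table
        return {f: None for f in fields}
    return {f: sum(row.get(f) is None for row in rows) / n_sites for f in fields}


rows = [
    {"AC": 1, "AF": None},
    {"AC": None, "AF": None},
    {"AC": 3, "AF": 0.5},
    {"AC": 4, "AF": 0.1},
]
print(missingness_fractions(rows, ["AC", "AF"]))  # {'AC': 0.25, 'AF': 0.5}
print(missingness_fractions([], ["AC"]))          # {'AC': None}
```

Returning None rather than 0.0 keeps "no sites" distinguishable from "no missing values"; an early return with a logged warning would serve equally well.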

@klaricch klaricch requested a review from mike-w-wilson April 21, 2026 20:34
Contributor

@mike-w-wilson mike-w-wilson left a comment

A couple of very minor things and responses, then LGTM.

Comment thread gnomad_qc/federated/federated_validity_checks.py Outdated
Comment thread gnomad_qc/v5/data_ingestion/federated_validity_checks.py Outdated
klaricch and others added 2 commits April 23, 2026 11:28
Co-authored-by: Mike Wilson <mwilson@broadinstitute.org>
@klaricch klaricch requested a review from mike-w-wilson April 23, 2026 15:43
@klaricch klaricch merged commit b43e33a into main Apr 24, 2026
5 checks passed
@klaricch klaricch deleted the kl/validty_updates branch April 24, 2026 13:59

Labels

None yet

Projects

None yet


3 participants