Apply per-gene multiple testing correction for site selection metrics #460
Merged
Agent-Logs-Url: https://github.com/bbglab/deepCSA/sessions/78cc6f96-7594-4d38-93df-622555a78ba3
Co-authored-by: FerriolCalvet <38539786+FerriolCalvet@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add multiple testing correction for site selection analysis" to "Apply per-gene multiple testing correction for site selection metrics" on May 21, 2026
Member
tested and works well!
FerriolCalvet approved these changes on Jun 1, 2026
- fill 0s with minimal numeric resolution
Contributor
Pull request overview
This PR updates the per-site positive selection (“site comparison”) outputs to include per-gene multiple-testing correction (Benjamini–Hochberg) and surfaces the adjusted p-values to downstream consumers (notably saturation plots). It also wires an optional gene subset parameter through the Nextflow module invocation.
Changes:
- Add `p_value_adj` to `omega_comparison_per_site.py` by applying Benjamini–Hochberg correction within each `GENE` group, after flooring underflowed `p_value == 0`.
- Update saturation plotting logic to use adjusted p-values (`p_value_adj` / `pvalue_adj`) for significance decisions and adjust default significance threshold to `0.05`.
- Pass an optional `--genes` subset from Nextflow config/module into `omega_comparison_per_site.py`, and include compiled global-local omega results in summary inputs when enabled.
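The first change can be illustrated with a minimal, standard-library sketch of per-gene Benjamini–Hochberg adjustment. This is not the code from `omega_comparison_per_site.py`: the helper names `bh_adjust` and `adjust_per_gene` are hypothetical, rows are assumed to be plain dicts with `GENE` and `p_value` keys, and the value used here for `MIN_NONZERO_PVALUE` is an assumption (the PR defines its own constant in `bin/utils.py`).

```python
import sys
from collections import defaultdict

# Assumption: the smallest positive normal float stands in for the
# MIN_NONZERO_PVALUE constant that the PR defines in bin/utils.py.
MIN_NONZERO_PVALUE = sys.float_info.min


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the ones ranked above it (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjust_per_gene(rows):
    """Return copies of rows with 'p_value_adj' computed within each GENE."""
    by_gene = defaultdict(list)
    for i, row in enumerate(rows):
        by_gene[row["GENE"]].append(i)

    out = [dict(row) for row in rows]
    for indices in by_gene.values():
        # Floor underflowed zeros before correcting, as the PR describes.
        floored = [
            rows[i]["p_value"] if rows[i]["p_value"] > 0 else MIN_NONZERO_PVALUE
            for i in indices
        ]
        for i, adj in zip(indices, bh_adjust(floored)):
            out[i]["p_value_adj"] = adj
    return out
```

Because the number of tests `m` is counted per gene, a site in a gene with few tested sites is penalised less than it would be under a panel-wide correction.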
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| workflows/deepcsa.nf | Adds compiled global-local omega results into the positive selection result bundle for downstream summary/plotting. |
| modules/local/bbgtools/sitecomparison/main.nf | Adds optional --genes argument plumbing to the site comparison process (currently introduces a command construction bug). |
| conf/tools/omega.config | Wires params.selected_genes into the site comparison module via ext.genes_subset. |
| bin/utils.py | Introduces shared MIN_NONZERO_PVALUE constant for p-value underflow flooring. |
| bin/plot_gene_saturation.py | Switches significance logic to adjusted p-values and aligns defaults/inputs for global-local omega table usage. |
| bin/omega_comparison_per_site.py | Implements per-gene BH correction and emits p_value_adj alongside raw p_value; adds --genes option. |
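As a rough illustration of the optional `--genes` plumbing listed in the table, the sketch below parses the flag and filters rows only when a subset is given. The comma-separated value format, the helper names `parse_args` and `filter_genes`, and the dict-based rows are all assumptions made for this example; the actual script may accept and apply the option differently.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical command line with an optional --genes subset."""
    parser = argparse.ArgumentParser(description="Site comparison sketch")
    parser.add_argument(
        "--genes",
        default=None,
        help="Comma-separated gene names (assumed format); omit to keep all genes",
    )
    return parser.parse_args(argv)


def filter_genes(rows, genes_arg):
    """Keep only rows whose GENE is in the subset; keep everything if unset."""
    if not genes_arg:
        return list(rows)
    wanted = {name.strip() for name in genes_arg.split(",") if name.strip()}
    return [row for row in rows if row["GENE"] in wanted]
```

Treating an absent or empty value as "no filtering" keeps the flag truly optional, so a caller that never sets a gene subset does not have to pass the flag at all.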
Site selection p-values must be corrected per gene (not across the full panel). The site comparison output should include both raw and adjusted p-values for downstream consumers.
- Apply Benjamini–Hochberg correction within each `GENE` group in `omega_comparison_per_site.py`.
- Emit `p_value_adj` alongside `p_value` for each site/AA grouping.