fix(regen): score short-input oracles on their native window in multi-oracle report #94
Merged
…-oracle report

regenerate_multioracle.py scored every oracle on a ±(max_output_size/2) ≈ 1 Mb locus. AlphaGenome's native window is ~1 Mb so it was unaffected, but the fixed-input oracles were diluted: the locus tiled LegNet (200 bp) into ~21k windows and ChromBPNet (2,114 bp) into ~500, averaging the single-variant effect away. The consensus table read LegNet MPRA +0.000 and ChromBPNet chromatin +0.68 instead of the conversational/Python-path values.

Fix: _build_variant_report() takes an optional `region`; run_legnet and run_chrombpnet pass a variant-centred native-window region. Now the table matches the conversational path and the article/claims: LegNet +0.30 (ref 0.37 -> alt 0.67), ChromBPNet +1.37 (ref 288 -> alt 748). ChromBPNet's wide IGV-display sliding track is still generated separately, so the browser view is unchanged. AlphaGenome values unchanged (CEBPA +2.77, H3K27ac +1.26, CAGE +1.52, DNASE +1.33).

Regenerated the committed multi-oracle example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lucapinello added a commit that referenced this pull request on Jun 17, 2026
…t scored single-window (#95)

MCP list_oracles/getting_started help text now says 7 oracles and lists EPInformer-seq (the data path already returned 7).

run_legnet in regenerate_multioracle.py mirrors the conversational path (a 1 bp region auto-widens to a single 200 bp window): LegNet MPRA +0.347, matching claims.yaml A7.

Follow-up to #94.
Problem
`scripts/regenerate_multioracle.py` scored every oracle on a ±(`max_output_size`/2) ≈ 1 Mb locus. AlphaGenome's native window is ~1 Mb so it was unaffected, but the fixed-input oracles were silently diluted: the 1 Mb locus tiled LegNet (200 bp) into ~21k windows and ChromBPNet (2,114 bp) into ~500, averaging the single-variant effect toward zero.

The Cross-oracle consensus table therefore read LegNet MPRA +0.000 and ChromBPNet chromatin +0.68, contradicting the conversational/Python path and the article/`claims.yaml`. (The code comment at the wide-IGV-track block even claimed the table value came "from the canonical narrow window"; the `max_output_size` default had drifted away from that.)
Fix
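`_build_variant_report()` takes an optional `region`; `run_legnet` and `run_chrombpnet` pass a variant-centred native-window region (LegNet 200 bp, ChromBPNet 2,114 bp). ChromBPNet's wide sliding IGV-display track is still generated separately, so the genome-browser view is unchanged.
Verified (HepG2, rs12740374 / SORT1)
The consensus table now matches the conversational path and the article/claims: LegNet +0.30 (ref 0.37 -> alt 0.67) and ChromBPNet +1.37 (ref 288 -> alt 748). AlphaGenome values are unchanged (CEBPA +2.77, H3K27ac +1.26, CAGE +1.52, DNASE +1.33).

Regenerated the committed multi-oracle example so it matches. This is the report behind Figure 3 of the chorus-article repro package.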
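For reviewers who want the arithmetic spelled out, here is a minimal sketch of the two ideas in this PR: a variant-centred native-window region, and how averaging over many tiles dilutes a single-variant effect. It is illustrative only and is not the code in `regenerate_multioracle.py`. `Region`, `native_window_region`, `n_tiles`, and `diluted_mean` are hypothetical names, the variant position is a placeholder, and the non-overlapping default stride is an assumption (the ~21k LegNet figure above suggests the real tiling overlaps).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    @property
    def width(self) -> int:
        return self.end - self.start


def native_window_region(chrom: str, pos: int, window_bp: int) -> Region:
    """Return a region of exactly `window_bp` bases centred on `pos`."""
    start = max(0, pos - window_bp // 2)
    return Region(chrom, start, start + window_bp)


def n_tiles(locus_bp: int, window_bp: int, stride_bp: Optional[int] = None) -> int:
    """Count windows of `window_bp` that fit in a locus when stepping by `stride_bp`.

    The stride defaults to the window size (non-overlapping tiles); this is an
    assumption, not the tiling scheme used by the real script.
    """
    stride = stride_bp or window_bp
    if locus_bp < window_bp:
        return 0
    return (locus_bp - window_bp) // stride + 1


def diluted_mean(effect: float, n_windows: int, n_hit: int = 1) -> float:
    """Toy mean effect when only `n_hit` of `n_windows` windows contain the variant."""
    return effect * n_hit / n_windows


if __name__ == "__main__":
    pos = 1_000_000  # placeholder coordinate, not the real rs12740374 position
    for name, window in (("LegNet", 200), ("ChromBPNet", 2114)):
        region = native_window_region("chr1", pos, window)
        tiles = n_tiles(1_000_000, window)
        print(name, region, "tiles over 1 Mb:", tiles)
    # A +0.30 effect seen in one window, averaged over every tile of the locus:
    print(diluted_mean(0.30, n_tiles(1_000_000, 200)))
```

Under these toy assumptions a one-window effect averaged over thousands of tiles rounds to zero at three decimals, which is the LegNet failure mode described above. The simple mean is not expected to reproduce the pre-fix ChromBPNet value of +0.68, so the real aggregation is evidently more involved.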
🤖 Generated with Claude Code