
fix(regen): score short-input oracles on their native window in multi-oracle report #94

Merged
lucapinello merged 1 commit into main from fix/2026-06-17-multioracle-native-window on Jun 17, 2026


@lucapinello

Contributor

Problem

scripts/regenerate_multioracle.py scored every oracle on a ±(max_output_size/2) ≈ 1 Mb locus. AlphaGenome's native window is ~1 Mb, so it was unaffected, but the fixed-input oracles were silently diluted: the 1 Mb locus tiled LegNet (200 bp) into ~21k windows and ChromBPNet (2,114 bp) into ~500, averaging the single-variant effect toward zero.

The "Cross-oracle consensus" table therefore read:

  • LegNet MPRA +0.000 (should be +0.30)
  • ChromBPNet chromatin +0.68 (should be +1.37)

These values contradicted the conversational/Python path and the article/claims.yaml. (The code comment at the wide-IGV-track block even claimed the table value came "from the canonical narrow window", although the max_output_size default had since drifted away from that.)
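A toy model of the dilution described above, assuming (hypothetically) that only the one tiled window containing the variant carries its effect, every other window contributes 0, and the reported score is the plain mean across windows. `diluted_mean` is an illustrative helper, not a function in regenerate_multioracle.py.

```python
def diluted_mean(native_effect: float, n_windows: int, n_affected: int = 1) -> float:
    """Toy model: mean per-window effect when only `n_affected` windows see the variant.

    Assumes windows that do not contain the variant contribute an effect of exactly 0.
    """
    if n_windows < 1 or not 0 <= n_affected <= n_windows:
        raise ValueError("need n_windows >= 1 and 0 <= n_affected <= n_windows")
    return native_effect * n_affected / n_windows


# Scored on a single native 200 bp window, that window carries the whole effect.
native = diluted_mean(0.30, n_windows=1)
# Averaged over the ~21k windows the 1 Mb locus was tiled into, it rounds to zero.
tiled = diluted_mean(0.30, n_windows=21_000)
print(f"native {native:+.3f}  tiled {tiled:+.3f}")  # native +0.300  tiled +0.000
```

This is only a qualitative illustration: a plain mean over ~500 windows would predict far less than the +0.68 reported for ChromBPNet, so the script's actual aggregation differs from this model.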

Fix

_build_variant_report() takes an optional region; run_legnet and run_chrombpnet pass a variant-centred native-window region (LegNet 200 bp, ChromBPNet 2,114 bp). ChromBPNet's wide sliding IGV-display track is still generated separately, so the genome-browser view is unchanged.
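A minimal sketch of the shape of this fix, assuming 0-based half-open coordinates: an optional `region` overrides the wide ±(max_output_size/2) default, and the short-input oracles pass a window of exactly their native input length centred on the variant. `Region`, `native_window`, `report_region` and the `chr1` coordinate are hypothetical names and values for illustration, not the actual signatures in regenerate_multioracle.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def width(self) -> int:
        return self.end - self.start


# Native input lengths quoted in the PR.
NATIVE_INPUT_BP = {"legnet": 200, "chrombpnet": 2_114}


def native_window(chrom: str, pos: int, width: int) -> Region:
    """Region of exactly `width` bp centred on the 0-based variant position `pos`."""
    start = max(0, pos - width // 2)  # clamp near the chromosome start
    return Region(chrom, start, start + width)


def report_region(chrom: str, pos: int, max_output_size: int,
                  region: Optional[Region] = None) -> Region:
    """Use the caller's region if given; otherwise fall back to the wide default locus."""
    if region is not None:
        return region
    half = max_output_size // 2
    return Region(chrom, max(0, pos - half), pos + half)


pos = 5_000_000  # hypothetical variant coordinate
wide = report_region("chr1", pos, max_output_size=1_000_000)
legnet = report_region("chr1", pos, 1_000_000,
                       region=native_window("chr1", pos, NATIVE_INPUT_BP["legnet"]))
chrombpnet = report_region("chr1", pos, 1_000_000,
                           region=native_window("chr1", pos, NATIVE_INPUT_BP["chrombpnet"]))
print(wide.width, legnet.width, chrombpnet.width)  # 1000000 200 2114
```

Keeping the wide default when no `region` is passed means callers that rely on the ~1 Mb locus, such as AlphaGenome, see no behaviour change.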

Verified (HepG2, rs12740374 / SORT1)

| Layer | Before | After | claims.yaml |
|---|---|---|---|
| LegNet MPRA | +0.000 | +0.299 (ref 0.37 → alt 0.67) | A7 +0.30 |
| ChromBPNet DNASE | +0.68 | +1.374 (ref 288 → alt 748) | A1 +1.24/+1.37 |
| AlphaGenome CEBPA / H3K27ac / CAGE / DNASE | unchanged | +2.77 / +1.26 / +1.52 / +1.33 | A3 / A5 / A6 / A2 |
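A quick arithmetic check on the After column, assuming (the PR does not say so) that the LegNet score is alt − ref and the ChromBPNet score is a log2 fold change of alt over ref predicted counts. Both helpers, and the `pseudocount` parameter, are hypothetical.

```python
import math


def delta(ref: float, alt: float) -> float:
    """Additive effect, assumed here for the LegNet MPRA score."""
    return alt - ref


def log2_fold_change(ref: float, alt: float, pseudocount: float = 0.0) -> float:
    """Log2 fold change of alt over ref, assumed here for the ChromBPNet DNASE score."""
    return math.log2((alt + pseudocount) / (ref + pseudocount))


# Rounded ref/alt values taken from the table.
print(f"LegNet {delta(0.37, 0.67):+.2f}")               # LegNet +0.30
print(f"ChromBPNet {log2_fold_change(288, 748):+.3f}")  # ChromBPNet +1.377
```

The rounded inputs give +1.377 rather than the reported +1.374. That gap is within what rounding ref and alt to whole counts can produce, and any pseudocount the scorer may use is unknown, so treat this as a sanity check, not a reimplementation.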

Regenerated the committed multi-oracle example so it matches. This is the report behind Figure 3 of the chorus-article repro package.

🤖 Generated with Claude Code

…-oracle report

regenerate_multioracle.py scored every oracle on a ±(max_output_size/2) ≈ 1 Mb
locus. AlphaGenome's native window is ~1 Mb so it was unaffected, but the
fixed-input oracles were diluted: the locus tiled LegNet (200 bp) into ~21k
windows and ChromBPNet (2,114 bp) into ~500, averaging the single-variant
effect away. The consensus table read LegNet MPRA +0.000 and ChromBPNet
chromatin +0.68 instead of the conversational/Python-path values.

Fix: _build_variant_report() takes an optional `region`; run_legnet and
run_chrombpnet pass a variant-centred native-window region. Now the table
matches the conversational path and the article/claims: LegNet +0.30
(ref 0.37 -> alt 0.67), ChromBPNet +1.37 (ref 288 -> alt 748). ChromBPNet's
wide IGV-display sliding track is still generated separately, so the browser
view is unchanged. AlphaGenome values unchanged (CEBPA +2.77, H3K27ac +1.26,
CAGE +1.52, DNASE +1.33). Regenerated the committed multi-oracle example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lucapinello merged commit 336d64a into main on Jun 17, 2026
1 check passed
lucapinello deleted the fix/2026-06-17-multioracle-native-window branch on June 17, 2026 15:51
lucapinello added a commit that referenced this pull request Jun 17, 2026
…t scored single-window (#95)

MCP list_oracles/getting_started help text now says 7 oracles and lists EPInformer-seq (data path already returned 7). run_legnet in regenerate_multioracle.py mirrors the conversational path (1bp region auto-widens to a single 200bp window): LegNet MPRA +0.347, matching claims.yaml A7. Follow-up to #94.
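The follow-up's auto-widen behaviour can be sketched as below, assuming 0-based half-open coordinates: a requested region no longer than the model's input becomes one centred window, while a longer region is tiled. `tile_windows` and its `stride` parameter are hypothetical. The real tiler's stride isn't stated, so this sketch's non-overlapping count for a 1 Mb region (5,000) need not match the ~21k windows quoted in #94.

```python
from typing import List, Optional, Tuple


def tile_windows(start: int, end: int, width: int,
                 stride: Optional[int] = None) -> List[Tuple[int, int]]:
    """Hypothetical tiler over the half-open region [start, end).

    Regions no longer than `width` are widened to a single window centred on the
    region's midpoint; longer regions get fixed-width windows every `stride` bp
    (default: `width`, i.e. no overlap) that fit entirely inside the region.
    """
    if end <= start or width <= 0 or (stride is not None and stride <= 0):
        raise ValueError("need start < end, width > 0 and stride > 0")
    if end - start <= width:
        # e.g. a 1 bp SNV region: one window of the model's native length.
        mid = (start + end) // 2
        s = max(0, mid - width // 2)
        return [(s, s + width)]
    step = width if stride is None else stride
    return [(s, s + width) for s in range(start, end - width + 1, step)]


snv = tile_windows(1_000_000, 1_000_001, width=200)  # 1 bp variant region
locus = tile_windows(500_000, 1_500_000, width=200)  # 1 Mb locus
print(len(snv), snv[0][1] - snv[0][0], len(locus))   # 1 200 5000
```

Scoring the single widened window is what keeps the variant's effect from being averaged against thousands of unaffected windows.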