chore(notebooks): drop vendored _epinformerseq_v2/, rewire to HF mirror #89
Merged
PR #87 added the EPInformer-seq oracle, but the README still advertised six oracles. Update the hero pitch, oracle picker table, disk-usage breakdown, per-oracle setup block, mirror map, weight-size table, and oracle-list code blocks to include EPInformer-seq (per-cell PerCellProfileNet + frozen BiasNet, 11 Roadmap cells, 1024-bp scalar enhancer activity).

Per-oracle footprint added to the disk math: ~2 GB env + ~11 MB weights + ~770 KB CDF; the total default install moves from ~28 GB to ~31 GB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the 11 MB of per-cell EPInformer-seq training artifacts that were vendored under examples/notebooks/_epinformerseq_v2/. The weights are now pulled on demand from the HF mirror lucapinello/chorus-epinformerseq-v2 (PR #87), so the local copy was pure duplication; the local-only summary.json + test_preds.csv training artifacts (~3.5 MB) are reproducible from the cluster training run at /lustre/grp/zyjlab/linjc/epinformer/results/.

Changes

- examples/notebooks/_epinformerseq_v2/: deleted (44 files, ~11 MB).
- examples/notebooks/epinformerseq_v2_percell_performance.ipynb: deleted (pure viewer for the deleted test_preds.csv; per-cell test r values are preserved on the cluster's summary.json).
- examples/notebooks/klf1_validated_enhancer_profiles.ipynb: rewired cell s3-epi-slide to load weights via EPInformerSeqOracle (auto-downloads from HF on first use). s3-md updated to drop the vendored-ckpts mention. Re-executed end-to-end (5 oracles).
- scripts/build_backgrounds_epinformerseq_v2_percell.py: defaults switched to ~/.chorus/downloads/epinformerseq/{per_cell,bias}. Imports PerCellProfileNet/BiasNet from chorus.oracles.epinformerseq_source instead of the vendored model.py. Auto-fetches via the oracle on cache miss.
- examples/notebooks/README.md: added topic-focused notebooks table for klf1_validated_enhancer_profiles.ipynb + epinformerseq_testing.ipynb.

Drive-by fixes in epinformerseq_testing.ipynb

The wild-type + ISM cells still used HALF=128 and asserted len(seq)==256, left over from the v1 256-bp model. The 1024-bp v2 model silently auto-pads, so it didn't crash but produced wrong values (0.62 vs the correct 5.23 in the smoke test). Fixed:

- cell 7: HALF=128 -> HALF=512 (1024-bp window).
- cell 17: assert len(ref_seq) == 1024.
- cell 21: per-position xs now spans the central 256 bp in genome coordinates (REGION_START+384 .. REGION_START+639). Variant-rank computed against that slice.
- cells 0, 6, 16, 29: docstring/markdown rewritten to reflect 1024-bp context + central-256-bp scalar aggregation, and the unsigned enhancer_activity LayerConfig (vs the old signed promoter_activity).
- Re-executed end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
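The window arithmetic behind the cell 7 and cell 21 fixes can be sketched as follows. This is an illustrative sketch, not the notebook's code: the helper names, the half-open interval convention, and the example REGION_START value are assumptions; only HALF=512, the 1024-bp window, and the +384 .. +639 offsets come from the notes above.

```python
# Illustrative sketch only; helper names and REGION_START are hypothetical.
WINDOW = 1024        # v2 model context in bp
HALF = WINDOW // 2   # 512 (the stale v1 value was 128)
CENTRAL = 256        # the scalar is aggregated over the central 256 bp


def window_bounds(center: int) -> tuple[int, int]:
    """Half-open [start, end) interval of the 1024-bp window around center."""
    return center - HALF, center + HALF


def central_positions(region_start: int) -> range:
    """Genome coordinates of the central 256 bp of a window at region_start."""
    offset = (WINDOW - CENTRAL) // 2  # 384
    return range(region_start + offset, region_start + offset + CENTRAL)


REGION_START = 1_000_000  # hypothetical coordinate, for illustration
start, end = window_bounds(REGION_START + HALF)
xs = central_positions(REGION_START)

assert end - start == WINDOW
assert xs[0] == REGION_START + 384
assert xs[-1] == REGION_START + 639
assert len(xs) == CENTRAL
```

Deriving the offsets from WINDOW and CENTRAL, rather than hard-coding 128 or 384, is one way to keep a later change in context length from silently producing a mismatched slice.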
JasonLinjc added a commit that referenced this pull request on Jun 4, 2026
Resolve overlap with main's #88 (EPInformer-seq catalog) and #89 (drop vendored _epinformerseq_v2/). Both landed the same logical changes this branch already carried; took this branch's newer roadmap-retrain versions for the EPInformer-seq descriptions (README), the executed notebooks, and the per_cell_widewin / PerCellProfileNetWide background builder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Removes the 11 MB of vendored EPInformer-seq per-cell training artifacts under `examples/notebooks/_epinformerseq_v2/`. Now that PR #87 lands the weights on `lucapinello/chorus-epinformerseq-v2` and the oracle auto-fetches them on first use, the local copy was pure duplication. Local-only training artifacts (summary.json + test_preds.csv, ~3.5 MB) are reproducible from the cluster run at `/lustre/grp/zyjlab/linjc/epinformer/results/` if ever needed.

Changes
Deletions
Rewired notebooks
Script update
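The "fetch only on cache miss" behavior that this PR gives the background-builder script can be sketched roughly as below. This is a hedged illustration, not the script's implementation: `ensure_weights` and `fake_fetch` are hypothetical names, and the real download goes through the oracle rather than a callback. Only the default cache location `~/.chorus/downloads/epinformerseq/{per_cell,bias}` is taken from this PR.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Default cache root named in this PR; everything else here is hypothetical.
CACHE_ROOT = Path.home() / ".chorus" / "downloads" / "epinformerseq"


def ensure_weights(
    subdir: str,
    fetch: Callable[[Path], None],
    root: Path = CACHE_ROOT,
) -> Path:
    """Return root/subdir, calling fetch(target) only if it is missing or empty."""
    target = root / subdir
    if not target.is_dir() or not any(target.iterdir()):
        target.mkdir(parents=True, exist_ok=True)
        fetch(target)  # stand-in for the oracle's download step
    return target


# Demo with a stand-in fetcher and a temporary cache root (no network access).
calls = []


def fake_fetch(target: Path) -> None:
    calls.append(target)
    (target / "placeholder.pt").write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ensure_weights("per_cell", fake_fetch, root)  # cache miss: fetch runs
    ensure_weights("per_cell", fake_fetch, root)  # cache hit: fetch is skipped
    assert len(calls) == 1
```

Passing the fetcher in as a callable keeps the cache check testable without touching the network or the real `~/.chorus` directory.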
README
Drive-by fixes in `epinformerseq_testing.ipynb`
The wild-type + ISM cells still used `HALF=128` / `assert len(seq)==256`, left over from the v1 256-bp model. The 1024-bp v2 model silently auto-pads, so nothing crashed, but the model was scoring an entirely different input and produced wrong values (smoke test: 0.62 vs the correct 5.23). Fixed:
Test plan
🤖 Generated with Claude Code