Skip to content

Fix five pipeline-correctness bugs#19

Open
evasnow1992 wants to merge 5 commits into
NVIDIA-BioNeMo:mainfrom
evasnow1992:evax/pipeline-correctness-fixes
Open

Fix five pipeline-correctness bugs#19
evasnow1992 wants to merge 5 commits into
NVIDIA-BioNeMo:mainfrom
evasnow1992:evax/pipeline-correctness-fixes

Conversation

@evasnow1992

Copy link
Copy Markdown
Collaborator

Summary

Five correctness fixes found during a focused review of the training, finetuning, and inference pipelines. Every fix was verified in-container (torch 2.9.1) before committing. No refactors, no behavior changes beyond the bug being corrected.

Fixes (one per commit)

  1. fix(predict): filter invalid SMILES before featurization task/predict.py built valid_indices as list(range(len(test_data))), i.e. it never filtered — the None-insertion/realignment code below it was dead. Any unparseable or empty SMILES (or a zero-heavy-atom molecule) in a prediction CSV would crash featurization instead of being skipped. Now validity is checked with RDKit (MolFromSmiles, ≥1 heavy atom) exactly as the training path does, so invalid rows are dropped and predictions align to the surviving molecules.

  2. fix(finetune): zero MTLLoss.log_sigma grad each step (--use_mtl_loss) MTLLoss.log_sigma is a standalone nn.Parameter appended directly to the optimizer (param_groups[1]); it is not a submodule of model, so the per-batch model.zero_grad() never cleared its gradient. The gradient accumulated across every batch, corrupting the learned task-uncertainty weights. Added mtl_loss.zero_grad() alongside model.zero_grad(). Verified with the real MTLLoss: log_sigma.grad went 1→2→3 across steps before the fix, constant per-step after.

  3. fix(eval): use BCE on probabilities for classification eval loss In eval the model applies sigmoid to classification outputs, so the eval predictions are already probabilities. The eval-loss path then passed them to BCEWithLogitsLoss, which sigmoids a second time — a double sigmoid that produced a wrong loss (off by up to ~7x on confident predictions). Now the classification branch uses plain BCE on the probabilities; regression is untouched. Verified the corrected loss matches BCEWithLogits on the raw logits to within 1e-7.

  4. fix(eval): keep per-task alignment when a task has no valid targets In evaluate_predictions, a task with zero valid targets hit continue without appending to results, so results came out shorter than num_tasks and every later task's metric silently shifted down one index. Now appends float('nan') before continue, matching the all-0s/all-1s branch directly above it.

  5. fix(finetune): don't double-wrap DataLoader for ensemble_size > 1 The training loop assigned the wrapped loader back to train_data, the same name holding the underlying dataset. The first ensemble model was fine, but the second iteration then did DataLoader(DataLoader(...)), which crashes with "'DataLoader' object is not subscriptable". Use a separate train_loader local and keep train_data as the dataset. Verified with a 2-iteration repro (old crashes on iter 1, fixed runs both).

Behavior impact (what changes)

make_predictions passed skip_invalid_smiles=False and then set valid_indices
to the full range with no validity check, so any empty/unparseable SMILES
reached MolGraph (which raises ValueError) and aborted the entire prediction
run; the None-insertion in write_prediction was consequently dead code.

Compute valid_indices with the same criterion as utils.filter_invalid_smiles
(non-empty, RDKit-parseable, >=1 heavy atom). Invalid rows are dropped before
featurization and the written predictions realign to the original input rows
via the existing None-insertion.

Signed-off-by: Eva Xue <evax@nvidia.com>
MTLLoss.log_sigma is a standalone nn.Parameter registered directly in the
optimizer (param_groups[1]), not a submodule of `model`, so the per-batch
`model.zero_grad()` never cleared its gradient. It accumulated across every
batch, corrupting the learned task-uncertainty weights. Add
`mtl_loss.zero_grad()` next to `model.zero_grad()`.

Only affects finetuning with --use_mtl_loss (off by default). Verified with
the real MTLLoss: log_sigma.grad went 1->2->3 across steps before, constant
per-step after.

Signed-off-by: Eva Xue <evax@nvidia.com>
In eval the model applies sigmoid to classification outputs, so batch_preds
are already probabilities. The eval-loss path then fed them to
BCEWithLogitsLoss, which sigmoids a second time, producing a wrong loss
(off by up to ~7x on confident predictions). Branch on dataset_type and use
plain BCE on the probabilities for classification; regression is unchanged.

Affects only the reported/aggregated classification eval+val loss value, not
predictions or metric-based (AUC) selection. Loss-based checkpoint selection
and HPO for classification now see the correct loss. Verified: corrected loss
matches BCEWithLogits on the raw logits to within 1e-7.

Signed-off-by: Eva Xue <evax@nvidia.com>
In evaluate_predictions, a task with zero valid targets hit `continue`
without appending to `results`, so `results` came out shorter than
num_tasks and every later task's metric shifted down one index. Append
float('nan') before continue, matching the all-0s/all-1s branch just above.

Only affects multi-task datasets where at least one task has no labeled
rows in an eval split; single-task and fully-labeled multi-task runs are
unaffected. Per-task metrics for such runs were mislabeled/misaligned before.

Signed-off-by: Eva Xue <evax@nvidia.com>
The training loop assigned the wrapped loader back to `train_data`, the same
name holding the underlying dataset. On the first ensemble model this was
fine, but the second iteration then did DataLoader(DataLoader(...)), which
crashes with "'DataLoader' object is not subscriptable". Use a separate
`train_loader` local and keep `train_data` as the dataset.

Fixes --ensemble_size > 1 for finetuning, which was previously broken (any
prior multi-model ensemble finetune could not have completed). Single-model
(default ensemble_size=1) runs are unaffected. Verified with a 2-iteration
DataLoader repro: old crashes on iter 1, fixed version runs both.

Signed-off-by: Eva Xue <evax@nvidia.com>
@evasnow1992 evasnow1992 requested a review from sveccham July 2, 2026 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant