OPTML-Group/Pion

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR


License: MIT PyTorch

Chongyu Fan   Gaowen Liu   Mingyi Hong   Ramana Rao Kompella   Sijia Liu†,§

Michigan State University    Cisco    University of Minnesota    §IBM Research


This is the official code repository for the paper "Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR", which introduces Pion (sPectral hIgh-pass Optimization on momeNtum) -- a drop-in replacement for Muon designed for regimes such as vision-language-action (VLA) training and reinforcement learning with verifiable rewards (RLVR). See the project page for more.

Figure panels: (a) Muon NS, (b) Promotion $f_p$, (c) Suppression $f_s$, (d) High-pass NS.

Visualization of $f(\sigma)$ over $\sigma \in [0, 1]$, with $f(\sigma) = \sigma$ shown as the identity reference. (a) $f_{\mathrm{NS}}^{t}$ denotes Muon's NS iteration applied $t$ times. (b) $f_p^{t}$ denotes the Promotion polynomial $f_p$ applied $t$ times. (c) $f_s^{t}$ denotes the Suppression polynomial $f_s$ applied $t$ times. (d) Pion's high-pass NS iteration: $f_s^{k_s} \circ f_p^{k_p}$ applies $k_p$ Promotion steps followed by $k_s = 5 - k_p$ Suppression steps.
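To make the composition in panel (d) concrete, here is a minimal scalar sketch of how such maps act on a single singular value. Only the quintic coefficients in `MUON_NS_COEFFS` are taken from the widely used public Muon implementation; whether this repository uses exactly those values is an assumption. The `promote` and `suppress` polynomials are hypothetical stand-ins chosen to fix 1 and behave qualitatively like the panels above. They are not the coefficients from the paper.

```python
# Illustrative scalar maps on a singular value sigma in [0, 1].
# MUON_NS_COEFFS follows the public Muon implementation (assumed to match
# this repo); promote() and suppress() are hypothetical stand-ins.

MUON_NS_COEFFS = (3.4445, -4.7750, 2.0315)


def muon_ns(sigma: float, steps: int = 5) -> float:
    """Apply the quintic map a*s + b*s^3 + c*s^5 `steps` times."""
    a, b, c = MUON_NS_COEFFS
    for _ in range(steps):
        sigma = a * sigma + b * sigma**3 + c * sigma**5
    return sigma


def promote(sigma: float) -> float:
    """Hypothetical promotion step: cubic map that lifts sigma toward 1."""
    return 1.5 * sigma - 0.5 * sigma**3


def suppress(sigma: float) -> float:
    """Hypothetical suppression step: fixes 0 and 1, shrinks small sigma."""
    return 2.5 * sigma**3 - 1.5 * sigma**5


def high_pass(sigma: float, k_p: int, total_steps: int = 5) -> float:
    """Apply k_p promotion steps, then k_s = total_steps - k_p suppression steps."""
    if not 0 <= k_p <= total_steps:
        raise ValueError("k_p must lie in [0, total_steps]")
    for _ in range(k_p):
        sigma = promote(sigma)
    for _ in range(total_steps - k_p):
        sigma = suppress(sigma)
    return sigma


if __name__ == "__main__":
    for s in (0.05, 0.3, 0.9):
        print(f"sigma={s:.2f}  muon={muon_ns(s):.3f}  "
              f"high_pass={high_pass(s, k_p=2):.3f}")
```

With these stand-ins, a small singular value is lifted well above its starting point by the Muon-style map but driven toward 0 by the high-pass composition, while a value close to 1 stays close to 1. Changing `k_p` shifts how many steps go to promotion versus suppression, which loosely mirrors the "controllable filter strength" described in the abstract.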

Abstract

Muon (MomentUm Orthogonalized by Newton–Schulz) is a matrix-aware optimizer that leverages Newton–Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show that it can lead to fundamental limitations beyond pretraining in two increasingly important regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization inherited from prior training make whitening unstable. To address these challenges, we propose Pion (sPectral hIgh-pass Optimization on momeNtum), a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion + Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. Extensive experiments demonstrate consistent gains over Muon and AdamW across both VLA and RLVR regimes. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching a 100% success rate on LIBERO Object at 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K, while Muon collapses to zero.

What's in this repo

Pion/
├── VLA/                       # Vision-Language-Action experiments
│   ├── VLAAdapter/            # VLA-Adapter
│   │   └── pion_optim/        # Muon / DefaultPion / LowRankMuon
│   ├── VLANeXt/               # VLANeXt
│   │   └── pion_optim/        # Muon / DefaultPion
│   └── openpi/                # π0.5 on real Franka FR3
│       └── src/openpi/training/muon_optim.py
│                              # MuonAdamW / DefaultPionAdamW
└── RL/                        # RLVR experiments
    └── verl/                  # GRPO + GMPO on Qwen3-1.7B / 4B, GSM8K + MATH
        └── verl/utils/muon.py
                               # MuonAdamW / DefaultPionAdamW / PerHeadMuonAdamW / PerHeadPionAdamW

Across all sub-repos we maintain the same five optimizer families, each available in a base form, an AdamW-fused form, or both (see the table below). Which form a sub-repo ships depends on whether its training framework can hold multiple torch.optim.Optimizer instances at once:

  • VLA-Adapter / VLANeXt drive several optimizers in the same training loop (one Muon / Pion instance per modality bucket plus a torch.optim.AdamW for the 1-D / embedding / output-head bucket), so they ship the base classes and let the trainer call step() on each.
  • openpi and verl are wrapped by frameworks (openpi's Trainer, verl's Hydra + FSDP2 config) that expose only a single optimizer slot per model; on those we ship the AdamW-fused variants, which apply the Muon / Pion polynomial to ndim ≥ 2 parameters and AdamW to ndim < 2 parameters inside one step() call.

Each sub-repo only ships the variants its recipes actually use (see the tree above).
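The routing rule described for the AdamW-fused variants (ndim ≥ 2 parameters go to the Muon / Pion update, ndim < 2 parameters go to AdamW) can be sketched as a simple bucketing step. The helper name `split_params_by_ndim` and the parameter names in the example are hypothetical and not taken from the repository; the sketch only relies on each parameter exposing an `ndim` attribute, as `torch.Tensor` does.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def split_params_by_ndim(
    named_params: Iterable[Tuple[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Return (matrix_like, vector_like) parameter names.

    matrix_like: ndim >= 2, candidates for the Muon / Pion update.
    vector_like: ndim < 2, candidates for the AdamW update.
    """
    matrix_like: List[str] = []
    vector_like: List[str] = []
    for name, param in named_params:
        if param.ndim >= 2:
            matrix_like.append(name)
        else:
            vector_like.append(name)
    return matrix_like, vector_like


if __name__ == "__main__":
    # Stand-in objects; with PyTorch one would pass model.named_parameters().
    fake_params = [
        ("attn.q_proj.weight", SimpleNamespace(ndim=2)),
        ("attn.q_proj.bias", SimpleNamespace(ndim=1)),
        ("norm.weight", SimpleNamespace(ndim=1)),
    ]
    print(split_params_by_ndim(fake_params))
```

This covers only the ndim rule stated for the fused classes. The VLA-Adapter / VLANeXt recipes additionally route embedding and output-head parameters to AdamW, which would need a name-based filter on top of this sketch.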

| Algorithm | Base class | AdamW-fused class |
| --- | --- | --- |
| Muon (NS on the whole matrix) | Muon | MuonAdamW |
| Muon (NS per attention head) | | PerHeadMuonAdamW |
| Pion (high-pass NS on the whole matrix) | DefaultPion | DefaultPionAdamW |
| Pion (high-pass NS per attention head) | | PerHeadPionAdamW |
| LowRankMuon | LowRankMuon | |
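The per-head variants apply the matrix update to each attention head separately rather than to the whole projection matrix. The following is a minimal plain-Python sketch of that idea under two assumptions: heads are stacked along the rows of the weight matrix, and `frobenius_normalize` is only a stand-in for the actual NS / high-pass NS update. The repository itself does this with a tensor reshape; the function names here are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def frobenius_normalize(block: Matrix) -> Matrix:
    """Stand-in update rule: scale a block to unit Frobenius norm."""
    norm = math.sqrt(sum(x * x for row in block for x in row))
    if norm == 0.0:
        return [row[:] for row in block]
    return [[x / norm for x in row] for row in block]


def apply_per_head(
    weight: Matrix, num_heads: int, fn: Callable[[Matrix], Matrix]
) -> Matrix:
    """Split rows into num_heads equal blocks, apply fn to each, re-stack."""
    rows = len(weight)
    if num_heads <= 0 or rows % num_heads != 0:
        raise ValueError("row count must be divisible by num_heads")
    head_dim = rows // num_heads
    out: Matrix = []
    for h in range(num_heads):
        out.extend(fn(weight[h * head_dim:(h + 1) * head_dim]))
    return out


if __name__ == "__main__":
    # Two heads with very different scales (head_dim = 2).
    momentum = [[10.0, 0.0], [0.0, 10.0], [0.1, 0.0], [0.0, 0.1]]
    whole = frobenius_normalize(momentum)
    per_head = apply_per_head(momentum, num_heads=2, fn=frobenius_normalize)
    print("whole-matrix, head 2:", whole[2:])
    print("per-head,     head 2:", per_head[2:])
```

In the whole-matrix case the small-scale head is dominated by the large one, whereas the per-head call treats each block on its own scale. This is only meant to illustrate why independent per-head processing can preserve head-level differences.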

Each sub-repo is a pruned, vendored copy of an upstream training codebase with the Pion optimizer wired in and three drop-in run scripts (run_adamw.sh, run_muon.sh, run_pion.sh). See each sub-repo's README.md for full environment setup, data preparation and run commands.

Getting Started

The optimizers are not packaged as a top-level library; they live next to the training code that uses them. Pick your task and follow the sub-repo README:

| Sub-repo | Backbone / task |
| --- | --- |
| VLA/VLAAdapter | VLA-Adapter |
| VLA/VLANeXt | VLANeXt |
| VLA/openpi | π0.5 on Franka FR3 |
| RL/verl | GRPO / GMPO on Qwen3-1.7B / 4B with GSM8K + MATH |

Inside each sub-repo:

  • pion_optim/ (for VLA-Adapter / VLANeXt) or */utils/muon.py / */training/muon_optim.py (for verl / openpi) contains the optimizer implementations.
  • scripts/run_adamw.sh, scripts/run_muon.sh, scripts/run_pion.sh are the three drop-in launchers.

Citation

If you find this work useful, please consider citing:

@article{fan2026rethinking,
  title={Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR},
  author={Fan, Chongyu and Liu, Gaowen and Hong, Mingyi and Kompella, Ramana Rao and Liu, Sijia},
  journal={arXiv preprint arXiv:2605.19282},
  year={2026}
}

Acknowledgements

This codebase builds on the excellent Muon optimizer, Flash-Muon, VLA-Adapter, VLANeXt, openpi, and verl.
