[bugfix]: Add compatibility handling for Qwen3.5 GatedDeltaNet padding-free training and fix the create_causal_mask patch for when cache_positions is removed in transformers >5.3.0 #202
+145 −51