Auto-recover a hung preamp on persistent I2C write failures #1098
stamateviorel wants to merge 1 commit into
When the preamp microcontroller hangs it stops ACKing and every I2C write fails with OSError 121 (EREMOTEIO). The existing fallback only reopens the SMBus handle, which recovers a transient bus glitch but not a hung preamp - zone control stays dead until someone power-cycles the unit.

Escalate: when the reopened-bus retry also fails, reset the preamps in place, re-assign I2C addresses, reopen the bus and re-flush all cached register values so zone state (mute/source/volume) survives, then retry the write. Rate-limited to once per 20s so a benign one-off glitch never resets audio.

Observed live on our unit 2026-06-04 (zone control dead until manual reboot); with this patch the same wedge self-heals in under a second.

Signed-off-by: Stamate Viorel <stamate.viorel@gmail.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
We probably need to merge this code with recent changes made with a similar intent.
Makes sense, I dug into

Your branch detects the failure and surfaces it to the user (the failure counter + alert pointing at Settings -> Config -> Hardware Reset), which is exactly the right UX when the hardware genuinely needs attention. This PR sits one level below that: on the failure it automatically pulses

The clean combination: auto-recover first; if the auto-recovery itself keeps failing, that's when your alert should fire, i.e. "we tried to self-heal and couldn't, this one really does need hardware attention." That's a stronger signal than alerting on the first I2C error, which is often transient.

Happy to rebase this onto
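A rough sketch of that ordering, to make the proposal concrete: the alert is driven by the outcome of each recovery attempt rather than by the first I2C error. The class name `RecoveryAlertPolicy`, its `record` method, and the threshold of 3 are hypothetical and come from neither branch; the comment above only says the alert should fire once auto-recovery "keeps failing".

```python
from dataclasses import dataclass


@dataclass
class RecoveryAlertPolicy:
    """Hypothetical helper: decide when to raise the user-facing alert."""

    # Assumed threshold; the discussion does not specify a number.
    max_failed_recoveries: int = 3
    failed_recoveries: int = 0

    def record(self, recovered: bool) -> bool:
        """Record one auto-recovery outcome.

        Returns True when the alert should fire, i.e. after
        `max_failed_recoveries` consecutive failed recoveries.
        """
        if recovered:
            # A successful self-heal clears the streak; no alert needed.
            self.failed_recoveries = 0
            return False
        self.failed_recoveries += 1
        return self.failed_recoveries >= self.max_failed_recoveries
```

With this shape, a transient glitch that the reset clears never reaches the user, and the existing failure counter could feed `record` instead of counting raw write errors.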
What does this change intend to accomplish?
When the preamp microcontroller hangs it stops ACKing and every I2C write fails with OSError 121 (EREMOTEIO). The existing fallback in _Preamps.write_byte_data only reopens the SMBus handle, which recovers a transient bus glitch but not a hung preamp, so zone control stays dead until someone power-cycles the unit. This escalates the fallback: when the reopened-bus retry also fails, reset the preamps in place, re-assign I2C addresses, reopen the bus, re-flush all cached register values (so zone mute/source/volume state survives the reset), then retry the write. Rate-limited to once per 20 s so a benign one-off glitch never resets audio.

We hit this live on 2026-06-04 (all zone control dead, reads OK / writes failing with Errno 121, only a reboot helped). With this patch deployed the same wedge self-heals in under a second.
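The escalation described above can be sketched roughly as follows. This is not the code in the PR: `EscalatingWriter` and its injected callbacks (`write`, `reopen_bus`, `reset_preamps`, `assign_addresses`, `reflush_registers`) are hypothetical stand-ins for whatever `_Preamps` actually calls, and catching every `OSError` rather than only errno 121 is an assumption.

```python
import time
from typing import Callable, Optional

RECOVERY_COOLDOWN_S = 20.0  # from the description: at most one reset per 20 s


class EscalatingWriter:
    """Hypothetical wrapper illustrating the two-tier fallback."""

    def __init__(self, write: Callable[[int, int, int], None],
                 reopen_bus: Callable[[], None],
                 reset_preamps: Callable[[], None],
                 assign_addresses: Callable[[], None],
                 reflush_registers: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._write = write
        self._reopen_bus = reopen_bus
        self._reset_preamps = reset_preamps
        self._assign_addresses = assign_addresses
        self._reflush_registers = reflush_registers
        self._clock = clock
        self._last_recovery: Optional[float] = None

    def _may_recover(self) -> bool:
        # Allow a reset only if none happened within the cooldown window.
        if self._last_recovery is None:
            return True
        return self._clock() - self._last_recovery >= RECOVERY_COOLDOWN_S

    def write_byte_data(self, addr: int, reg: int, value: int) -> None:
        try:
            self._write(addr, reg, value)
            return
        except OSError:
            pass  # fall through to tier 1

        # Tier 1 (existing behaviour): reopen the bus handle and retry.
        self._reopen_bus()
        try:
            self._write(addr, reg, value)
            return
        except OSError:
            if not self._may_recover():
                raise  # rate-limited: surface the error instead of resetting

        # Tier 2 (the escalation): reset, re-address, reopen, re-flush, retry.
        self._last_recovery = self._clock()
        self._reset_preamps()
        self._assign_addresses()
        self._reopen_bus()
        self._reflush_registers()
        self._write(addr, reg, value)  # a failure here propagates to the caller
```

Injecting the clock and the hardware callbacks keeps the sketch testable without a preamp attached; the cooldown uses a monotonic clock so wall-clock adjustments cannot shorten or extend the window.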
Checklist
python -m py_compile clean; happy to fix anything CI flags)