usbhid-ups: improve handling of transient LIBUSB_ERROR_IO failures #3259

com6056 · 2026-01-13T03:37:15Z

Some devices (e.g., CyberPower CP1500PFCLCD) have firmware bugs that cause random I/O errors on specific HID reports during normal polling. Rather than triggering expensive reconnection attempts that can fail in daemon mode, skip the failing report and continue with remaining polls.

Add a safety check to detect true device disconnection: if all polls fail during an update cycle (items_succeeded == 0), trigger a reconnect as before.

This improves stability, especially in daemon mode, while still detecting real disconnections via other error codes or complete poll failure.

Fixes #3116

More in-depth details from my investigation in case it is helpful:

Problem Description

CyberPower UPS devices (tested with CP1500PFCLCD, ProductID 0x0601) experience "Data stale" errors when running in daemon mode, but work perfectly when debug mode is enabled (-D flag or NUT_DEBUG_LEVEL=1).

Root Cause

The device has firmware bugs that cause transient LIBUSB_ERROR_IO failures on certain HID reports during normal polling operations. Reports like:

- 0x1a (input.sensitivity)
- 0x19 (ups.realpower)
- Others during "Full update" cycles (every 12th poll with default pollfreq)

These reports exist and work most of the time, but randomly fail with I/O errors.

Current Behavior (Broken)

When LIBUSB_ERROR_IO occurs during polling:

1. Driver immediately enters the reconnect.trying state
2. Driver calls reconnect_ups() to close and reopen the USB device
3. In daemon/background mode, the USB reconnection fails (due to the process being daemonized with setsid(), closed file descriptors, etc.)
4. Driver calls dstate_datastale(), so upsd reports a "Data stale" error
5. The loop repeats on the next poll, producing continuous reconnect failures

Why Debug Mode "Worked"

The -D flag sets foreground = 1, preventing the driver from daemonizing:

- Process doesn't fork
- No setsid() call
- Original file descriptors remain open
- USB reconnection succeeds in this environment

So debug mode didn't fix the firmware bug; it just let the reconnection workaround succeed. But reconnecting on every transient error is expensive and unnecessary.

Solution

Skip transient errors instead of reconnecting:

- When LIBUSB_ERROR_IO occurs, log it and continue to the next poll item instead of triggering a reconnect
- Track successful polls with an items_succeeded counter
- Safety mechanism: if zero items succeed during an UPDATE cycle, then trigger a reconnect (true device failure)
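The approach above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual usbhid-ups code: `poll_cycle()`, `cycle_result_t`, and the `SKETCH_*` constants are hypothetical names, the constants only mirror the numeric values of libusb-1.0's `enum libusb_error` so the sketch builds without libusb, and the function is fed pre-recorded return codes rather than real USB transfers. Treating every non-I/O error as an immediate reconnect is also an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical constants; the values mirror libusb-1.0's enum libusb_error. */
enum {
    SKETCH_ERROR_IO        = -1,  /* LIBUSB_ERROR_IO */
    SKETCH_ERROR_ACCESS    = -3,  /* LIBUSB_ERROR_ACCESS */
    SKETCH_ERROR_NO_DEVICE = -4   /* LIBUSB_ERROR_NO_DEVICE */
};

typedef enum { CYCLE_OK, CYCLE_RECONNECT } cycle_result_t;

/*
 * Walk the return codes of one update cycle (one code per polled item).
 * A code >= 0 counts as success, an I/O error is skipped as transient,
 * and any other error asks for a reconnect right away (an assumption).
 * If no item succeeded at all, a reconnect is requested as well.
 */
cycle_result_t poll_cycle(const int *results, size_t n_items,
                          size_t *items_succeeded)
{
    size_t ok = 0;
    size_t i;

    for (i = 0; i < n_items; i++) {
        int rc = results[i];

        if (rc >= 0) {
            ok++;               /* report was read fine */
            continue;
        }
        if (rc == SKETCH_ERROR_IO) {
            continue;           /* transient failure: skip this report */
        }
        /* e.g. NO_DEVICE or ACCESS: treat as a real failure */
        *items_succeeded = ok;
        return CYCLE_RECONNECT;
    }

    *items_succeeded = ok;
    /* Safety check: every item failed, so assume the device is gone */
    return (n_items > 0 && ok == 0) ? CYCLE_RECONNECT : CYCLE_OK;
}
```

With return codes such as `{8, SKETCH_ERROR_IO, 8, 8}` the cycle finishes normally with three successful items, while a cycle consisting only of I/O errors requests a reconnect.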

Why This Works

- Transient errors on SOME reports = firmware bug → skip and continue with remaining polls
- Errors on ALL reports = true disconnection → reconnect
- Most polls succeed despite occasional failures → driver stays stable
- True disconnections are still detected via:
  - Other error codes (LIBUSB_ERROR_NO_DEVICE, LIBUSB_ERROR_ACCESS, etc.)
  - All polls failing (safety check)
  - upsd's MAXAGE timeout
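The last backstop in the list, the MAXAGE timeout, boils down to an age comparison on the server side. The sketch below only illustrates that idea and is not upsd's actual code: `data_is_stale()` and `SKETCH_MAXAGE_DEFAULT` are hypothetical names, and the 15-second value is assumed from the documented upsd.conf default.

```c
#include <time.h>

/* Assumed default for MAXAGE in upsd.conf, in seconds. */
#define SKETCH_MAXAGE_DEFAULT 15

/*
 * Return 1 when the most recent driver update is older than maxage
 * seconds, i.e. the point at which the data would be declared stale.
 */
int data_is_stale(time_t last_update, time_t now, int maxage)
{
    return difftime(now, last_update) > (double)maxage;
}
```

So even if a driver kept skipping errors without ever refreshing its data, a check of this kind would still flag the data as stale once the last update is older than the configured limit.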

Testing

With this fix and pollonly=true:

- Driver runs indefinitely in daemon mode
- Occasional LIBUSB_ERROR_IO logged at debug level 3, but skipped
- Data collection continues normally
- No more reconnect loops or "Data stale" errors

General points

- Described the changes in the PR submission or a separate issue, e.g.
  known published or discovered protocols, applicable hardware (expected
  compatible and actually tested/developed against), limitations, etc.
- There may be multiple commits in the PR, aligned and commented with
  a functional change. Notably, coding style changes better belong in a
  separate PR, but certainly in a dedicated commit to simplify reviews
  of "real" changes in the other commits. Similarly for typo fixes in
  comments or text documents.
- Please star NUT on GitHub, this helps with sponsorships! ;)

Frequent "underwater rocks" for driver addition/update PRs

- Revised existing driver families and added a sub-driver if applicable
  (nutdrv_qx, usbhid-ups...) or added a brand new driver in the other
  case.
- Did not extend obsoleted drivers with new hardware support features
  (notably blazer and other single-device family drivers for Qx protocols,
  except the new nutdrv_qx which should cover them all).
- For updated existing device drivers, bumped the DRIVER_VERSION macro
  or its equivalent.
- For USB devices (HID or not), revised that the driver uses unique
  VID/PID combinations, or raised discussions when this is not the case
  (several vendors do use same interface chips for unrelated protocols).
- For new USB devices, built and committed the changes for the
  scripts/upower/95-upower-hid.hwdb file
- Proposed NUT data mapping is aligned with existing docs/nut-names.txt
  file. If the device exposes useful data points not listed in the file, the
  experimental.* namespace can be used as documented there, and discussion
  should be raised on the NUT Developers mailing list to standardize the new
  concept.
- Updated data/driver.list.in if applicable (new tested device info)

Frequent "underwater rocks" for general C code PRs

- Did not "blindly assume" default integer type sizes and value ranges,
  structure layout and alignment in memory, endianness (layout of bytes and
  bits in memory for multi-byte numeric types), or use of generic int where
  language or libraries dictate the use of size_t (or ssize_t sometimes).
- Progress and errors are handled with upsdebugx(), upslogx(),
  fatalx() and related methods, not with direct printf() or exit().
- Similarly, NUT helpers are used for error-checked memory allocation and
  string operations (except where customized error handling is needed,
  such as unlocking device ports, etc.)
- Coding style (including whitespace for indentations) follows precedent
  in the code of the file, and examples/guide in docs/developers.txt file.
- For newly added files, the Makefile.am recipes were updated and the
  make distcheck target passes.

General documentation updates

- Updated docs/acknowledgements.txt (for vendor-backed device support)
- Added or updated manual page information in docs/man/*.txt files
  and corresponding recipe lists in docs/man/Makefile.am for new pages
- Passed make spellcheck, updated spell-checking dictionary in the
  docs/nut.dict file if needed (did not remove any words -- the make
  rule printout in case of changes suggests how to maintain it).

Additional work may be needed after posting this PR

- Propose a PR for NUT DDL with detailed device data dumps from tests
  against real hardware (the more models, the better).
- Address NUT CI farm build failures for the PR: testing on numerous
  platforms and toolkits can expose issues not seen on just one system.
- Revise suggestions from LGTM.COM analysis about "new issues" with
  the changed codebase.


jimklimov · 2026-01-13T08:24:35Z

Great analysis, thanks. That must have been a fun debugging session...

As for setsid() implications - I've re-checked, this method gets called from common::background() and indeed relatively late in general driver main.c startup (after device init and perhaps data dump, optionally before the update loop - unless deciding to remain foregrounded due to one or another reason).

This is probably a separate issue from this avoidance of reconnections in the first place, but did you have a chance to experiment whether reconnection in this driver actually works/fails? I wonder if the problem is coincidental, e.g. nothing wrong with the driver code, but it is the UPS firmware that gets stuck, reboots and is not responding just at the moment we try to connect back to it... I suppose pulling the USB cable and plugging it back while the driver is running can help in the investigation (unless this is one of those UPSes that power on the USB chip when it is connected, and power it off/cycle when not in use).

com6056 · 2026-01-14T15:15:15Z

#3116 (comment)

So far so good, been running without pollonly = true for 35+ hours now with no disconnects 🎉

Regarding this:

> This is probably a separate issue from this avoidance of reconnections in the first place, but did you have a chance to experiment whether reconnection in this driver actually works/fails? I wonder if the problem is coincidental, e.g. nothing wrong with the driver code, but it is the UPS firmware that gets stuck, reboots and is not responding just at the moment we try to connect back to it... I suppose pulling the USB cable and plugging it back while the driver is running can help in the investigation (unless this is one of those UPSes that power on the USB chip when it is connected, and power it off/cycle when not in use).

Out of town right now but I can definitely dig in more when I'm back to see if I can somehow get the debug logging while in daemon mode to answer these questions!

Signed-off-by: Jordan Rodgers <com6056@gmail.com>

jimklimov · 2026-01-14T19:34:43Z

Disregard the netbsd runner fault, ran out of disk.

jimklimov · 2026-01-16T12:42:02Z

Great, thanks for your contribution!

com6056 mentioned this pull request Jan 13, 2026

CyberPower CP1500AVRLCD3 doesn't reliably work unless NUT_DEBUG_LEVEL set #3116

Closed

com6056 force-pushed the fix-cyberpower-eio-tolerance branch from cdee828 to badbed7 on January 13, 2026 03:41

jimklimov added the labels "enhancement", "CyberPower (CPS)", "USB", and "Connection stability issues" (Issues about driver<->device and/or networked connections (upsd<->upsmon...) going AWOL over time) on Jan 13, 2026

jimklimov added this to the 2.8.5 milestone Jan 13, 2026

jimklimov mentioned this pull request Jan 13, 2026

usbhid-ups: fix cyberpower issues resolved when debug logging is enabled #3247

Closed

20 tasks

fix format specifier warning

f5c95d5

Signed-off-by: Jordan Rodgers <com6056@gmail.com>

jimklimov merged commit dcd9e6e into networkupstools:master Jan 16, 2026
27 of 36 checks passed
