Skip to content

Prevent invalid NVS#410

Draft
joker-mik wants to merge 1 commit into
biologist79:devfrom
joker-mik:Feature/prevent_invalid_NVS
Draft

Prevent invalid NVS#410
joker-mik wants to merge 1 commit into
biologist79:devfrom
joker-mik:Feature/prevent_invalid_NVS

Conversation

@joker-mik

@joker-mik joker-mik commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

NVS robustness hardening and bootloop prevention

Result of the analysis

The most likely bootloop trigger is not the pure loading of the management page itself, but rather the saving or interrupted saving of settings and the subsequent boot process.

The analysis showed that several values loaded from NVS were trusted too much. If NVS entries are incomplete, malformed, outdated, or internally inconsistent, the device can enter unsafe startup paths or use invalid runtime values.

After comparing the current findings with an older bootloop analysis, four areas were identified as especially relevant:

  1. Wi-Fi NVS blobs
  2. LED configuration
  3. PN5180-LPCD wakeup path
  4. Unchecked Preferences.begin() calls

The implemented solution focuses on making NVS handling defensive and self-healing instead of relying on broad destructive recovery such as erasing the whole NVS partition.


Main findings

1. Wi-Fi NVS blobs

In src/Wlan.cpp, stored Wi-Fi entries were deserialized from binary NVS blobs without sufficient validation.

A shortened, malformed, or formally invalid wifi-* entry could cause reads outside of the valid buffer during boot. This matches well with sporadic bootloops caused by faulty NVS entries.

The previous implementation trusted the binary blob structure too much and directly interpreted its content as valid serialized Wi-Fi settings.

2. LED configuration

Several interdependent NVS keys were not handled atomically.

Especially dangerous was the combination of numControl and controlColors.

If numControl was written successfully but controlColors was missing or had the wrong length, the LED task could later read outside of the color array.

In addition, values such as:

  • numIndicator = 0
  • numIdleDots = 0
  • dimStates = 0
  • invalid ledOffset

could lead to division-by-zero, modulo/index problems, or invalid LED access.

3. PN5180-LPCD wakeup path

The older bootloop analysis pointed out a concrete issue in the early PN5180-LPCD startup path.

If pn5180Lpcd=true is stored in NVS, RFID initialization can run very early during setup.

In the PN5180 wakeup check, a 10-byte UID was formatted into a buffer that was sized for the normal 4-byte RFID ID format. This is a strong stack/memory corruption candidate and fits the observed symptom that the device starts again after a full flash erase, because the NVS flag enabling this path is then gone.

4. Unchecked Preferences.begin() calls

Several NVS namespaces were opened without checking whether Preferences.begin() succeeded.

If opening a namespace fails, later getX() calls may silently return default values. This makes it impossible to distinguish between a real default value and a failed NVS access.

It also means that critical boot paths may continue as if NVS was available, although it is not.


Implemented professional solution

I deliberately did not choose the approach of simply deleting the entire NVS storage when an error occurs.

While that would be easy, it would be unprofessional because it causes unnecessary data loss and only hides the actual problem.

The more robust solution is:

  • validate before writing
  • never deserialize unchecked data during reading
  • treat NVS contents as untrusted input
  • ignore or remove faulty entries during boot in a self-healing way
  • write dependent NVS values in a safer order
  • avoid early boot crashes by guarding NVS-dependent startup paths
  • recover from inconsistent configurations instead of crashing
  • only reset the smallest possible invalid part instead of erasing all settings

Changed files

src/Wlan.h

Changes

  • Added maximum lengths for SSID and password.
  • Added stricter validation rules for Wi-Fi settings.
  • Explicitly represented deserialization errors in the object state.
  • Tightened isValid() so malformed or incomplete objects are not treated as usable network configurations.
  • Changed the deserialization interfaces so that they can perform bounds checks.
  • Prepared the Wi-Fi settings model so that invalid NVS blobs can be rejected safely instead of being partially interpreted.

Reasoning

Wi-Fi settings are loaded during startup and therefore must not rely on valid NVS content.

Any serialized data coming from NVS may be truncated, outdated, or malformed. The model now explicitly supports the concept of an invalid deserialized object instead of silently accepting partial data.


src/Wlan.cpp

Changes

  • Wi-Fi blob length is checked before reading.

  • Truncated, oversized, empty, or formally invalid Wi-Fi blobs are no longer used.

  • Invalid wifi-* entries are removed during startup.

  • String deserialization now checks:

    • available buffer length
    • maximum allowed string length
    • required null terminator
    • embedded null bytes
    • complete buffer consumption
  • Static IP configuration is no longer read via an unsafe pointer cast, but via memcpy.

  • The legacy key SAVED_WIFIS is checked for reasonable length and maximum entry count before migration.

  • Wlan_AddNetworkSettings() now rejects invalid data early and can reuse defective slots.

  • prefsWifiSettings.begin(...) is now checked.

  • If the Wi-Fi NVS namespace cannot be opened, Wi-Fi NVS migration and Wi-Fi entry iteration are skipped instead of continuing with an unavailable namespace.

  • Wi-Fi settings are not written if the Wi-Fi NVS namespace is unavailable.

  • Hostname and Wi-Fi enabled state now fall back to safe defaults if the settings namespace is unavailable.

Reasoning

The previous implementation could deserialize Wi-Fi blobs without verifying that the data was complete and structurally valid.

This could lead to out-of-bounds reads during boot if a Wi-Fi entry was incomplete or corrupted.

The new approach treats each Wi-Fi NVS entry as untrusted data. Invalid entries are ignored or removed, while valid entries continue to work as before.

This prevents a single faulty Wi-Fi entry from blocking the whole boot process.


src/Web.cpp

Changes

  • POST handling for stored SSIDs now validates:

    • JSON object exists
    • SSID exists
    • SSID is a string
    • SSID length is valid
    • password length is valid
    • static IP fields are valid
  • Wi-Fi configuration now validates the hostname before writing.

  • LED settings now validate dangerous values before writing:

    • numIndicator != 0
    • numIdleDots != 0
    • dimStates != 0
    • ledOffset < numIndicator
    • numIndicator + numControl <= UINT8_MAX
    • controlColors.size() == numControl
  • controlColors is now written before numControl.

  • Invalid LED payloads are rejected at the Web/API boundary instead of being stored in NVS.

Reasoning

The Web/API layer is the first place where invalid user input can be prevented from entering NVS.

The write order of dependent LED values was also improved. Writing controlColors before numControl makes the configuration safer if saving is interrupted after only part of the LED configuration has been written.

This does not make NVS writes fully atomic, but it reduces the risk of storing a state where numControl points to more control LEDs than controlColors can safely provide.


src/Led.cpp

Changes

  • Invalid LED NVS values are corrected during boot in a self-healing way.
  • numIndicator = 0 is reset to a safe value.
  • numIdleDots = 0 is corrected to avoid modulo or division problems.
  • dimStates = 0 is corrected to avoid division by zero.
  • An invalid ledOffset is reset to 0.
  • Overly large combinations of indicator and control LEDs are limited.
  • Invalid or incomplete controlColors entries are removed.
  • If the control LED configuration is inconsistent, control LEDs are disabled instead of reading outside the color array.
  • LED defaults are now also initialized safely when the settings namespace is unavailable.
  • The volatile fallback path is normalized as well, so the LED task still receives safe runtime values even without readable NVS settings.
  • Invalid NVS data is prevented from propagating into the LED runtime task.

Reasoning

LED settings are used by the runtime LED task and therefore must always be internally consistent.

The critical part is that several NVS values depend on each other. For example, numControl is only safe if controlColors contains exactly the expected number of entries.

Instead of trusting those values blindly, the boot process now normalizes them. If a safe configuration cannot be reconstructed, the unsafe part is disabled.

This avoids out-of-bounds access while preserving as much valid configuration as possible.


src/Rfid.h

Changes

  • Added required standard includes for size and integer types.
  • Declared a new central RFID ID formatting helper:
bool Rfid_FormatCardId(const uint8_t *cardId, size_t cardIdLen, char *target, size_t targetLen);

Reasoning

RFID UID formatting was implemented in more than one place. This increases the risk that one path handles buffer sizes differently from another.

A central formatter ensures that all RFID readers use the same bounds-safe formatting logic.


src/RfidCommon.cpp

Changes

  • Added the central Rfid_FormatCardId(...) implementation.

  • The formatter checks:

    • input pointer validity
    • output pointer validity
    • output buffer size
    • remaining buffer capacity during formatting
    • snprintf() return values
  • The function prevents writing past the end of the target buffer.

  • RFID NVS lookup now only happens if the RFID NVS namespace is available.

  • RFID-related NVS access is skipped if the namespace failed to open.

Reasoning

This central helper removes duplicated UID formatting logic and provides one safe implementation for all RFID reader backends.

The additional NVS availability check prevents code from using RFID NVS data when the namespace was not successfully opened.


src/RfidMfrc522.cpp

Changes

  • Replaced local/manual RFID ID formatting with the new central Rfid_FormatCardId(...) helper.
  • The MFRC522 path now benefits from the same buffer-size checks as the PN5180 path.
  • mfrc522Gain is only read from NVS if the RFID namespace is available.
  • If RFID NVS is unavailable, the code falls back to safe/default behavior instead of relying on a failed NVS access.

Reasoning

Although the strongest concrete issue was found in the PN5180 path, the MFRC522 path also used its own formatting logic.

Using one shared formatter reduces the risk of future inconsistencies and makes the behavior easier to maintain.


src/RfidPn5180.cpp

Changes

  • Replaced local/manual RFID ID formatting with the new central Rfid_FormatCardId(...) helper.

  • Fixed the critical PN5180-LPCD wakeup path:

    • the previous implementation formatted up to 10 UID bytes into a buffer sized for the normal 4-byte RFID ID
    • this could overflow the stack buffer
    • the wakeup check now uses the central bounds-safe formatter
    • the formatted ID remains consistent with the existing 4-byte NVS key scheme
  • pn5180Lpcd is only read from NVS if the RFID namespace is available.

  • The early PN5180-LPCD path is guarded against failed NVS namespace initialization.

Reasoning

This directly addresses the strongest concrete bootloop candidate from the older analysis.

The previous code could enter this path very early during boot if pn5180Lpcd=true was stored in NVS. Because this happens before the normal setup is complete, memory corruption at this point can easily appear as a bootloop or failed startup.

The new implementation avoids the buffer overflow risk and keeps UID formatting compatible with the existing 4-byte NVS key scheme.


src/RfidConfig.cpp

Changes

  • rfidReaderType is now only read from NVS if the RFID namespace is available.
  • Auto-detected RFID reader type is only written back to NVS if the RFID namespace is available.
  • RFID configuration now degrades safely if NVS cannot be used.

Reasoning

RFID reader detection should not depend on unchecked NVS availability.

If NVS cannot be opened, the reader configuration should fall back to safe behavior instead of silently relying on default values returned by failed NVS reads.


src/System.h

Changes

  • Added functions to expose whether critical NVS namespaces are available:

    • System_RfidPrefsAvailable()
    • System_SettingsPrefsAvailable()

Reasoning

Other modules need a safe way to know whether NVS is actually available.

This creates a clearer contract between system initialization and feature modules and avoids direct access to unavailable Preferences namespaces.


src/System.cpp

Changes

  • Added a central helper for opening Preferences namespaces safely.
  • Preferences.begin() return values are now checked.
  • Failures while opening NVS namespaces are logged.
  • The availability state of critical namespaces is tracked.
  • System_Init_Rfid_Prefs() now records whether the rfidTags namespace was opened successfully.
  • System_Init() now records whether the settings namespace was opened successfully.
  • Settings-dependent operations now use safe defaults if the settings namespace is unavailable.
  • Operation mode, powerdown recovery, and other NVS-dependent startup behavior are guarded against unavailable NVS.

Reasoning

Unchecked Preferences.begin() calls make the boot process harder to diagnose and less robust.

If a namespace cannot be opened, the firmware must not continue as if all values were readable and valid.

The new availability tracking allows the firmware to continue booting in a safer degraded mode while avoiding invalid NVS access.


src/main.cpp

Changes

  • The early PN5180-LPCD startup path is now guarded by the RFID namespace availability check.
  • pn5180Lpcd is only read if the RFID Preferences namespace was opened successfully.
  • Rfid_Init() is no longer called early based on an unchecked NVS value.
  • Boot count and last-RFID recovery logic only access settings NVS if the settings namespace is available.

Reasoning

The early PN5180-LPCD path is especially sensitive because it runs before the normal setup sequence has completed.

Guarding this path prevents an unavailable or invalid NVS state from activating risky early initialization behavior.


html/management.html / management_nvs_test.html

(only for testing - don't commit this file)

Changes

  • Added a test-only NVS robustness test helper section for LED settings.

  • The helper allows sending intentionally invalid LED payloads to /settings.

  • Example cases include:

    • numIndicator = 0
    • numIdleDots = 0
    • dimStates = 0
    • invalid offsetStart
    • numControl without matching controlColors
    • too many LEDs

Reasoning

This test page makes it easier to verify that the new server-side validation rejects unsafe values.

It is mainly intended for robustness testing. It should not necessarily be part of a production PR unless maintainers want to keep a built-in diagnostic helper.


Overall assessment

This solution fixes the issue at the correct architectural level: NVS must not be trusted blindly.

Everything that comes from NVS must be treated as potentially damaged, truncated, outdated, incomplete, or inconsistent.

The updated solution covers:

  • Wi-Fi blob validation
  • LED configuration validation and recovery
  • PN5180-LPCD buffer overflow prevention
  • checked Preferences.begin() calls
  • safer handling of unavailable NVS namespaces
  • guarded early boot behavior
  • centralized RFID ID formatting
  • test support for intentionally invalid LED settings

The key design decision is to avoid broad destructive recovery such as erasing all NVS data.

Instead, the firmware validates the smallest relevant unit, removes or ignores only invalid entries, and falls back to safe defaults where possible.

This makes the boot process more resilient without unnecessarily deleting user settings.

The code in generated by ChatGPT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant