Prevent invalid NVS#410
Draft
joker-mik wants to merge 1 commit into
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
NVS robustness hardening and bootloop prevention
Result of the analysis
The most likely bootloop trigger is not the pure loading of the management page itself, but rather the saving or interrupted saving of settings and the subsequent boot process.
The analysis showed that several values loaded from NVS were trusted too much. If NVS entries are incomplete, malformed, outdated, or internally inconsistent, the device can enter unsafe startup paths or use invalid runtime values.
After comparing the current findings with an older bootloop analysis, four areas were identified as especially relevant:
Preferences.begin()callsThe implemented solution focuses on making NVS handling defensive and self-healing instead of relying on broad destructive recovery such as erasing the whole NVS partition.
Main findings
1. Wi-Fi NVS blobs
In
src/Wlan.cpp, stored Wi-Fi entries were deserialized from binary NVS blobs without sufficient validation.A shortened, malformed, or formally invalid
wifi-*entry could cause reads outside of the valid buffer during boot. This matches well with sporadic bootloops caused by faulty NVS entries.The previous implementation trusted the binary blob structure too much and directly interpreted its content as valid serialized Wi-Fi settings.
2. LED configuration
Several interdependent NVS keys were not handled atomically.
Especially dangerous was the combination of
numControlandcontrolColors.If
numControlwas written successfully butcontrolColorswas missing or had the wrong length, the LED task could later read outside of the color array.In addition, values such as:
numIndicator = 0numIdleDots = 0dimStates = 0ledOffsetcould lead to division-by-zero, modulo/index problems, or invalid LED access.
3. PN5180-LPCD wakeup path
The older bootloop analysis pointed out a concrete issue in the early PN5180-LPCD startup path.
If
pn5180Lpcd=trueis stored in NVS, RFID initialization can run very early during setup.In the PN5180 wakeup check, a 10-byte UID was formatted into a buffer that was sized for the normal 4-byte RFID ID format. This is a strong stack/memory corruption candidate and fits the observed symptom that the device starts again after a full flash erase, because the NVS flag enabling this path is then gone.
4. Unchecked
Preferences.begin()callsSeveral NVS namespaces were opened without checking whether
Preferences.begin()succeeded.If opening a namespace fails, later
getX()calls may silently return default values. This makes it impossible to distinguish between a real default value and a failed NVS access.It also means that critical boot paths may continue as if NVS was available, although it is not.
Implemented professional solution
I deliberately did not choose the approach of simply deleting the entire NVS storage when an error occurs.
While that would be easy, it would be unprofessional because it causes unnecessary data loss and only hides the actual problem.
The more robust solution is:
Changed files
src/Wlan.hChanges
isValid()so malformed or incomplete objects are not treated as usable network configurations.Reasoning
Wi-Fi settings are loaded during startup and therefore must not rely on valid NVS content.
Any serialized data coming from NVS may be truncated, outdated, or malformed. The model now explicitly supports the concept of an invalid deserialized object instead of silently accepting partial data.
src/Wlan.cppChanges
Wi-Fi blob length is checked before reading.
Truncated, oversized, empty, or formally invalid Wi-Fi blobs are no longer used.
Invalid
wifi-*entries are removed during startup.String deserialization now checks:
Static IP configuration is no longer read via an unsafe pointer cast, but via
memcpy.The legacy key
SAVED_WIFISis checked for reasonable length and maximum entry count before migration.Wlan_AddNetworkSettings()now rejects invalid data early and can reuse defective slots.prefsWifiSettings.begin(...)is now checked.If the Wi-Fi NVS namespace cannot be opened, Wi-Fi NVS migration and Wi-Fi entry iteration are skipped instead of continuing with an unavailable namespace.
Wi-Fi settings are not written if the Wi-Fi NVS namespace is unavailable.
Hostname and Wi-Fi enabled state now fall back to safe defaults if the settings namespace is unavailable.
Reasoning
The previous implementation could deserialize Wi-Fi blobs without verifying that the data was complete and structurally valid.
This could lead to out-of-bounds reads during boot if a Wi-Fi entry was incomplete or corrupted.
The new approach treats each Wi-Fi NVS entry as untrusted data. Invalid entries are ignored or removed, while valid entries continue to work as before.
This prevents a single faulty Wi-Fi entry from blocking the whole boot process.
src/Web.cppChanges
POST handling for stored SSIDs now validates:
Wi-Fi configuration now validates the hostname before writing.
LED settings now validate dangerous values before writing:
numIndicator != 0numIdleDots != 0dimStates != 0ledOffset < numIndicatornumIndicator + numControl <= UINT8_MAXcontrolColors.size() == numControlcontrolColorsis now written beforenumControl.Invalid LED payloads are rejected at the Web/API boundary instead of being stored in NVS.
Reasoning
The Web/API layer is the first place where invalid user input can be prevented from entering NVS.
The write order of dependent LED values was also improved. Writing
controlColorsbeforenumControlmakes the configuration safer if saving is interrupted after only part of the LED configuration has been written.This does not make NVS writes fully atomic, but it reduces the risk of storing a state where
numControlpoints to more control LEDs thancontrolColorscan safely provide.src/Led.cppChanges
numIndicator = 0is reset to a safe value.numIdleDots = 0is corrected to avoid modulo or division problems.dimStates = 0is corrected to avoid division by zero.ledOffsetis reset to0.controlColorsentries are removed.Reasoning
LED settings are used by the runtime LED task and therefore must always be internally consistent.
The critical part is that several NVS values depend on each other. For example,
numControlis only safe ifcontrolColorscontains exactly the expected number of entries.Instead of trusting those values blindly, the boot process now normalizes them. If a safe configuration cannot be reconstructed, the unsafe part is disabled.
This avoids out-of-bounds access while preserving as much valid configuration as possible.
src/Rfid.hChanges
Reasoning
RFID UID formatting was implemented in more than one place. This increases the risk that one path handles buffer sizes differently from another.
A central formatter ensures that all RFID readers use the same bounds-safe formatting logic.
src/RfidCommon.cppChanges
Added the central
Rfid_FormatCardId(...)implementation.The formatter checks:
snprintf()return valuesThe function prevents writing past the end of the target buffer.
RFID NVS lookup now only happens if the RFID NVS namespace is available.
RFID-related NVS access is skipped if the namespace failed to open.
Reasoning
This central helper removes duplicated UID formatting logic and provides one safe implementation for all RFID reader backends.
The additional NVS availability check prevents code from using RFID NVS data when the namespace was not successfully opened.
src/RfidMfrc522.cppChanges
Rfid_FormatCardId(...)helper.mfrc522Gainis only read from NVS if the RFID namespace is available.Reasoning
Although the strongest concrete issue was found in the PN5180 path, the MFRC522 path also used its own formatting logic.
Using one shared formatter reduces the risk of future inconsistencies and makes the behavior easier to maintain.
src/RfidPn5180.cppChanges
Replaced local/manual RFID ID formatting with the new central
Rfid_FormatCardId(...)helper.Fixed the critical PN5180-LPCD wakeup path:
pn5180Lpcdis only read from NVS if the RFID namespace is available.The early PN5180-LPCD path is guarded against failed NVS namespace initialization.
Reasoning
This directly addresses the strongest concrete bootloop candidate from the older analysis.
The previous code could enter this path very early during boot if
pn5180Lpcd=truewas stored in NVS. Because this happens before the normal setup is complete, memory corruption at this point can easily appear as a bootloop or failed startup.The new implementation avoids the buffer overflow risk and keeps UID formatting compatible with the existing 4-byte NVS key scheme.
src/RfidConfig.cppChanges
rfidReaderTypeis now only read from NVS if the RFID namespace is available.Reasoning
RFID reader detection should not depend on unchecked NVS availability.
If NVS cannot be opened, the reader configuration should fall back to safe behavior instead of silently relying on default values returned by failed NVS reads.
src/System.hChanges
Added functions to expose whether critical NVS namespaces are available:
System_RfidPrefsAvailable()System_SettingsPrefsAvailable()Reasoning
Other modules need a safe way to know whether NVS is actually available.
This creates a clearer contract between system initialization and feature modules and avoids direct access to unavailable
Preferencesnamespaces.src/System.cppChanges
Preferences.begin()return values are now checked.System_Init_Rfid_Prefs()now records whether therfidTagsnamespace was opened successfully.System_Init()now records whether thesettingsnamespace was opened successfully.Reasoning
Unchecked
Preferences.begin()calls make the boot process harder to diagnose and less robust.If a namespace cannot be opened, the firmware must not continue as if all values were readable and valid.
The new availability tracking allows the firmware to continue booting in a safer degraded mode while avoiding invalid NVS access.
src/main.cppChanges
pn5180Lpcdis only read if the RFID Preferences namespace was opened successfully.Rfid_Init()is no longer called early based on an unchecked NVS value.Reasoning
The early PN5180-LPCD path is especially sensitive because it runs before the normal setup sequence has completed.
Guarding this path prevents an unavailable or invalid NVS state from activating risky early initialization behavior.
html/management.html/management_nvs_test.html(only for testing - don't commit this file)
Changes
Added a test-only NVS robustness test helper section for LED settings.
The helper allows sending intentionally invalid LED payloads to
/settings.Example cases include:
numIndicator = 0numIdleDots = 0dimStates = 0offsetStartnumControlwithout matchingcontrolColorsReasoning
This test page makes it easier to verify that the new server-side validation rejects unsafe values.
It is mainly intended for robustness testing. It should not necessarily be part of a production PR unless maintainers want to keep a built-in diagnostic helper.
Overall assessment
This solution fixes the issue at the correct architectural level: NVS must not be trusted blindly.
Everything that comes from NVS must be treated as potentially damaged, truncated, outdated, incomplete, or inconsistent.
The updated solution covers:
Preferences.begin()callsThe key design decision is to avoid broad destructive recovery such as erasing all NVS data.
Instead, the firmware validates the smallest relevant unit, removes or ignores only invalid entries, and falls back to safe defaults where possible.
This makes the boot process more resilient without unnecessarily deleting user settings.
The code in generated by ChatGPT