
Guard realtime accounting against corrupt NaN/Inf values (#1216) #1218

Open: gskjold wants to merge 1 commit into main from fix/realtime-nan-crc-guard

@gskjold (Member) commented Jun 25, 2026

Problem

Issue #1216: after a firmware upgrade, data.json returned invalid JSON like "i": -nan and absurd cost values (e.g. 9.5e25) under ea.h, breaking the dashboard.

Root cause

EnergyAccountingRealtimeData lives in non-initialized RAM on ESP32 (__NOINIT_ATTR in AmsToMqttBridge.cpp) so realtime accounting survives a reboot. It was validated only by a single magic byte (0x6A). That is not enough to distinguish valid data from garbage left by a previous firmware: a struct-layout change across an upgrade can shift fields while byte 0 still happens to equal 0x6A, so stale bytes get reinterpreted as floats (yielding NaN / huge values). Those floats were then served verbatim via %.2f, and the toolchain prints NaN as -nan — invalid JSON.
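To make the failure mode concrete, here is a minimal sketch of how stale bytes turn into bad JSON. The helper names `floatFromBits` and `formatJsonNumber` are hypothetical and not part of the firmware; the only things taken from the description are the `%.2f` format and the idea that leftover RAM is read back as a float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Reinterpret four raw bytes as a float, which is effectively what happens
// when stale RAM is read through a struct field after a layout change.
// (Hypothetical helper, for illustration only.)
float floatFromBits(uint32_t bits) {
    float f;
    static_assert(sizeof(f) == sizeof(bits), "float must be 32 bits");
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Format a value with "%.2f", the format the description says was used.
// printf-style formatting emits text such as "nan" or "-nan" for NaN,
// which is not a valid JSON number.
std::string formatJsonNumber(float value) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.2f", value);
    return std::string(buf);
}
```

For example, the bit pattern `0xFFC00000` is a NaN with the sign bit set, and a word filled with `0x6A` bytes decodes to a finite but very large float. Neither is rejected by a check that only looks at byte 0.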

This matches the report: a C3 board reflashed across versions during development, while another device upgraded cleanly.

Fix

Two complementary layers:

  1. CRC guard — add a crc16 over the struct contents (reusing src/decoder/.../crc.h). The struct is reinitialized when either the magic byte or the CRC mismatches, so stale RAM from an incompatible build is discarded. The CRC is refreshed at the end of every update().
  2. Sanitize — the realtime getters now clamp NaN/Inf to 0, so corrupt floats can never leak into JSON/MQTT output. Mirrors the existing std::isnan guard in getUseLastMonth().
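The two layers could look roughly like the sketch below. This is not the code in the PR: the struct `RealtimeData`, its fields, and the helper names are hypothetical, and the CRC routine is a plain CRC-16/CCITT-FALSE used as a stand-in for whatever the project's `crc.h` provides.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the real struct; field names are illustrative.
struct RealtimeData {
    uint8_t  magic;
    uint8_t  reserved[3]; // explicit padding so the CRC covers defined bytes
    float    use;
    float    cost;
    uint16_t crc;         // CRC-16 over every byte before this field
};

constexpr uint8_t kMagic = 0x6A;

// CRC-16/CCITT-FALSE (assumption: the project's own crc16 may differ).
uint16_t crc16(const uint8_t* data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; ++i) {
        crc ^= static_cast<uint16_t>(data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                                 : static_cast<uint16_t>(crc << 1);
        }
    }
    return crc;
}

uint16_t computeCrc(const RealtimeData& d) {
    return crc16(reinterpret_cast<const uint8_t*>(&d),
                 offsetof(RealtimeData, crc));
}

// Discard whatever is in RAM and start from a known-good state.
void reset(RealtimeData& d) {
    std::memset(&d, 0, sizeof(d));
    d.magic = kMagic;
    d.crc = computeCrc(d);
}

// Layer 1: keep the data only if both the magic byte and the CRC match.
// Returns true if the existing data was kept, false if it was reinitialized.
bool validateOrReset(RealtimeData& d) {
    if (d.magic == kMagic && d.crc == computeCrc(d)) return true;
    reset(d);
    return false;
}

// Refresh the CRC at the end of every update so the next boot can verify it.
void update(RealtimeData& d, float use, float cost) {
    d.use = use;
    d.cost = cost;
    d.crc = computeCrc(d);
}

// Layer 2: getters never hand NaN/Inf to the JSON/MQTT formatting code.
float sanitize(float v) { return std::isfinite(v) ? v : 0.0f; }
float getUse(const RealtimeData& d)  { return sanitize(d.use); }
float getCost(const RealtimeData& d) { return sanitize(d.cost); }
```

In this sketch `validateOrReset` would run once at boot before the data is used, and `update` is the only writer. The sanitizing getters are deliberately independent of the CRC check, so a corrupt float is still clamped to 0 even if it somehow passes validation.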

Verification

  • pio run -e esp32c3dev — SUCCESS (reported device)
  • pio run -e esp8266dev — SUCCESS (non-NOINIT path)

🤖 Generated with Claude Code

The EnergyAccountingRealtimeData struct lives in non-initialized RAM on
ESP32 (__NOINIT_ATTR) so it survives a reboot. It was validated only by a
single magic byte, which is not enough to distinguish valid data from
garbage left by a previous firmware: a struct-layout change across an
upgrade can shift fields while byte 0 still happens to equal 0x6A. The
stale float fields were then served verbatim, producing invalid JSON in
data.json such as "i": -nan and absurd cost values (issue #1216), which
breaks the dashboard.

Two layers of defense:
- Add a CRC16 over the struct contents; reinitialize when either the magic
  byte or the CRC mismatches. The CRC is refreshed after every update().
- Sanitize the realtime getters so NaN/Inf can never leak into JSON/MQTT
  output, mirroring the existing isnan guard in getUseLastMonth().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@github-actions

🔧 PR Build Artifacts

Version: 502d6ca

All environments built successfully. Download the zip files:

Artifacts expire after 7 days.
