diff --git a/.claude/commands/README.md b/.claude/commands/README.md new file mode 100644 index 00000000000..3767dac987a --- /dev/null +++ b/.claude/commands/README.md @@ -0,0 +1,49 @@ +# Claude Code slash commands for the mcp-server test suite + +Three AI-assisted workflows wrapping `mcp-server/run-tests.sh` and the meshtastic MCP tools. Each one has a twin in `.github/prompts/` for Copilot users. + +| Slash command | What it does | Copilot equivalent | | --------------------- | ------------------------------------------------------------------------- | ---------------------------------------- | | `/test [args]` | Runs the test suite (auto-detects hardware) and interprets failures | `.github/prompts/mcp-test.prompt.md` | | `/diagnose [role]` | Read-only device health report via the meshtastic MCP tools | `.github/prompts/mcp-diagnose.prompt.md` | | `/repro <test> [n=5]` | Re-runs one test N times, diffs firmware logs between passes and failures | `.github/prompts/mcp-repro.prompt.md` | + +## Why two surfaces + +The Claude Code commands and Copilot prompts cover the same three workflows but each speaks its host's idiom: + +- **Claude Code** (`/test`) uses `$ARGUMENTS` for pass-through, has direct access to Bash + all MCP tools registered in the user's settings, and runs in the terminal context. +- **Copilot** (`/mcp-test`) runs in VS Code's agent mode; it has terminal + MCP access too but typically asks the operator to confirm inputs interactively. + +A contributor using either IDE gets equivalent assistance. Keep the two in sync when behavior changes — the diff of intent should be minimal. + +## House rules + +- **No destructive writes without explicit operator approval.** Skills that could reflash, factory-reset, or reboot a device must describe the action and stop — the operator authorizes. 
+- **Interpret failures, don't just echo them.** The skill body should pull firmware log lines from `mcp-server/tests/report.html` (the `Meshtastic debug` section, attached by `tests/conftest.py::pytest_runtest_makereport`) and classify the failure. +- **Keep MCP tool calls sequential per port.** SerialInterface holds an exclusive port lock; two parallel tool calls on the same port deadlock. +- **Never speculate about root cause.** If the evidence doesn't support a classification, say "unknown" and list what you'd need to disambiguate. + +## Adding a new command + +1. Write the Claude Code version at `.claude/commands/<name>.md` with YAML frontmatter: + + ```yaml + --- + description: one-line purpose (used for auto-invocation by the model) + argument-hint: [optional-hint] + --- + ``` + +2. Write the Copilot equivalent at `.github/prompts/mcp-<name>.prompt.md` with: + + ```yaml + --- + mode: agent + description: ... + --- + ``` + +3. Add the row to the table above. Cross-link in both bodies. + +4. Smoke-test on Claude Code first (`/<name>` should appear in autocomplete), then in VS Code Copilot (`/mcp-<name>` in Chat). diff --git a/.claude/commands/diagnose.md b/.claude/commands/diagnose.md new file mode 100644 index 00000000000..45aa937a5b7 --- /dev/null +++ b/.claude/commands/diagnose.md @@ -0,0 +1,55 @@ +--- +description: Produce a device health report using the meshtastic MCP tools (device_info, list_nodes, get_config, short serial log capture) +argument-hint: [role=all|nrf52|esp32s3|<port>] +--- + +# `/diagnose` — device health report + +Call the meshtastic MCP tool bundle and format a structured health report for one or all detected devices. Zero guesswork for the operator. + +## What to do + +1. **Enumerate hardware.** Call `mcp__meshtastic__list_devices(include_unknown=True)`. For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`. + +2. **Filter by `$ARGUMENTS`**: + - No args, `all` → every likely-meshtastic device. 
+ - `nrf52` → only devices with `vid == 0x239a`. + - `esp32s3` → only devices with `vid == 0x303a` or `vid == 0x10c4`. + - A `/dev/cu.*` path → only that one port. + - Anything else → treat as a substring match against the `port` string. + +3. **For each selected device, in sequence (NOT parallel — SerialInterface holds an exclusive port lock):** + - `mcp__meshtastic__device_info(port=<port>)` — captures `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel`. + - `mcp__meshtastic__list_nodes(port=<port>)` — count of peers, which ones have `publicKey` set, SNR/RSSI distribution. + - `mcp__meshtastic__get_config(section="lora", port=<port>)` — region, preset, channel_num, tx_power, hop_limit. + - Optionally, if the device seems unhappy (fails to connect, `num_nodes==1` when ≥2 are plugged in, missing `firmware_version`), open a short firmware log window: `mcp__meshtastic__serial_open(port=<port>, env=<env>)`, wait 3s, `serial_read(session_id=<session_id>, max_lines=100)`, `serial_close(session_id=<session_id>)`. The env should be inferred from the VID map in `mcp-server/run-tests.sh` (nrf52 → rak4631, esp32s3 → heltec-v3) unless `MESHTASTIC_MCP_ENV_<ROLE>` is set. + +4. **Render per-device report** as: + + ```text + [nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631 + owner : Meshtastic 40eb / 40eb + region/band : US, channel 88, LONG_FAST + tx_power : 30 dBm, hop_limit=3 + peers : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm) + primary ch : McpTest + firmware : no panics in last 3s; NodeInfoModule emitted 2 broadcasts + ``` + + Keep it scannable. If a field is missing or abnormal (no pubkey for a known peer, region=UNSET, num_nodes inconsistent with the hub), flag it inline with a short `⚠︎` note. + +5. **Cross-device correlation** (only when >1 device is inspected): + - Do both sides see each other in `nodesByNum`? If one does and the other doesn't, that's asymmetric NodeInfo — flag it. + - Do the LoRa configs match? (region, channel_num, modem_preset should all agree; mismatch = no mesh) + - Do the primary channel NAMES match? Mismatch = different PSK = no decode. + +6. **Suggest next actions only for specific, recognisable failure modes**: + - Stale PKI pubkey one-way → "run `/test tests/mesh/test_direct_with_ack.py` — the retry + nodeinfo-ping heals this in the test path." + - Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`." + - Device unreachable → point at touch_1200bps + the CP2102-wedged-driver note in run-tests.sh. + +## What NOT to do + +- No writes. No `set_config`, no `reboot`, no `factory_reset`. This is a read-only diagnostic skill — if the operator wants to change state, they'll ask explicitly. +- No `flash` / `erase_and_flash`. Those are separate escalations. +- No holding SerialInterface across tool calls — open, query, close; next device. The port lock is exclusive. 
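The `$ARGUMENTS` filter in step 2 reduces to a few predicates. A minimal Python sketch — `select_devices` is an illustrative helper, not part of the MCP surface; the dict fields mirror the `list_devices` output named in step 1:

```python
# Illustrative sketch of the /diagnose step-2 filter. select_devices is a
# hypothetical helper; the dict fields mirror list_devices output.
NRF52_VIDS = {0x239A}            # nRF52 boards (e.g. RAK4631)
ESP32S3_VIDS = {0x303A, 0x10C4}  # ESP32-S3 native USB / CP2102 bridge

def select_devices(devices, arg=None):
    """Filter likely-meshtastic devices by role, exact port path, or substring."""
    likely = [d for d in devices if d.get("likely_meshtastic")]
    if arg in (None, "", "all"):
        return likely                                          # no args / all
    if arg == "nrf52":
        return [d for d in likely if d["vid"] in NRF52_VIDS]   # role filter
    if arg == "esp32s3":
        return [d for d in likely if d["vid"] in ESP32S3_VIDS]
    if arg.startswith("/dev/cu."):
        return [d for d in likely if d["port"] == arg]         # exact port
    return [d for d in likely if arg in d["port"]]             # substring
```

The same rules drive the Copilot twin's filter, so a shared helper like this keeps the two surfaces in sync.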
diff --git a/.claude/commands/repro.md b/.claude/commands/repro.md new file mode 100644 index 00000000000..52dcf222b93 --- /dev/null +++ b/.claude/commands/repro.md @@ -0,0 +1,65 @@ +--- +description: Re-run a specific test N times in isolation to triage flakes, diff firmware logs between passes and failures +argument-hint: <nodeid> [count=5] +--- + +# `/repro` — flakiness triage for one test + +Re-run a single pytest node ID N times in isolation, track pass rate, and surface what's _different_ in the firmware logs between the passing attempts and the failing ones. Turns "it's flaky, I guess" into "it fails when X, passes when Y." + +## What to do + +1. **Parse `$ARGUMENTS`**: first token is the pytest node id (e.g. `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[nrf52->esp32s3]`); second token is an integer count (default `5`, cap at `20`). If the first token doesn't look like a test path (no `::` and no `tests/` prefix), treat the whole `$ARGUMENTS` as a `-k` filter instead. + +2. **Sanity-check the hub first** (so we're not measuring "nothing plugged in" N times): call `mcp__meshtastic__list_devices`. If the test name contains `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help. + +3. **Loop N times**. For each iteration: + + ```bash + ./mcp-server/run-tests.sh <nodeid> --tb=short -p no:cacheprovider + ``` + + Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware log section from `mcp-server/tests/report.html`. `-p no:cacheprovider` suppresses pytest's `.pytest_cache` writes so iterations don't influence each other. + +4. **Track a small structured tally**: + + ```text + attempt 1: PASS (42s) + attempt 2: FAIL (128s) ← firmware log 200-line tail captured + attempt 3: PASS (39s) + attempt 4: FAIL (121s) + attempt 5: PASS (41s) + -------------------------------------- + pass rate: 3/5 (60%) | mean duration: 74s + ``` + +5. 
**On mixed outcomes**: diff the firmware log tails between a representative passing attempt and a representative failing attempt. Focus on: + - Error-level lines only present in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`) + - Timing around the assertion event — did a broadcast go out, was there an ACK, did NAK fire? + - Device state fields that changed (nodesByNum entries, region/preset, channel_num) + + Surface the top 3 differences as a "passes when / fails when" table. Don't dump full logs — pull specific lines with uptime timestamps. + +6. **Classify the flake** into one of: + - **LoRa airtime collision** → pass rate improves with fewer concurrent transmitters; propose a `time.sleep` gap or retry bump in the test body. + - **PKI key staleness** → fails on first attempt, passes after self-heal; existing retry loop in `test_direct_with_ack.py` handles this. + - **NodeInfo cooldown** → `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs `broadcast_nodeinfo_ping()` warmup. + - **Hardware-specific** (one direction fails, other passes; one device's firmware is older; driver wedged) → specific recovery pointer. + - **Genuinely unknown** → say so; don't invent a root cause. + +7. **Report back** with: + - Pass rate and mean duration. + - Classification + evidence (the specific log lines that support it). + - A suggested next step (re-run with specific args, open `/diagnose`, edit a specific test file, nothing). + +## Examples + +- `/repro tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — runs 10 times, diffs firmware logs. +- `/repro broadcast_delivers` — no `::`, no `tests/`, so interpreted as `-k broadcast_delivers`; runs every matching test the default 5 times. +- `/repro tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter run for a slow test. + +## Constraints + +- Don't exceed `count=20` per invocation — airtime and USB wear add up. 
If the user asks for 50, negotiate down. +- Don't rebuild firmware as part of triage; flakes that only reproduce under different firmware belong in a separate session. +- If the FIRST attempt fails AND the rest all pass, that's a classic "state leak from a prior test" → say so and suggest running with `--force-bake` or starting from a clean state rather than chasing the first failure. diff --git a/.claude/commands/test.md b/.claude/commands/test.md new file mode 100644 index 00000000000..986ee1f31f6 --- /dev/null +++ b/.claude/commands/test.md @@ -0,0 +1,42 @@ +--- +description: Run the mcp-server test suite (auto-detects devices) and interpret the results +argument-hint: [pytest-args] +--- + +# `/test` — mcp-server test runner with interpretation + +Run `mcp-server/run-tests.sh` and make sense of the output so the operator doesn't have to. + +## What to do + +1. **Invoke the wrapper.** From the firmware repo root, run: + + ```bash + ./mcp-server/run-tests.sh $ARGUMENTS + ``` + + The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required `MESHTASTIC_MCP_ENV_*` env vars, and invokes pytest. If the user passed no arguments, the wrapper supplies a sensible default set (`tests/ --html=tests/report.html --self-contained-html --junitxml=tests/junit.xml -v --tb=short`). A `--report-log=tests/reportlog.jsonl` arg is always appended (unless the operator passed their own). `--assume-baked` is deliberately NOT in the defaults — `test_00_bake.py` has its own skip-if-already-baked check and runs the ~8 s verification by default. Operators can opt into the fast path with `--assume-baked`, or force a reflash with `--force-bake`. + +2. **Read the pre-flight header.** First ~6 lines print the detected hub (role → port → env). If that line reads `detected hub : (none)`, the wrapper will narrow to `tests/unit` only — say so explicitly in your summary so the operator knows hardware tiers were skipped. + +3. 
**On pass**: one-line summary of the form `N passed, M skipped in <duration>`. Don't enumerate the 52 test names — the user can read those. Do mention if any test was SKIPPED for a NON-placeholder reason (e.g. "role not present on hub" is worth flagging). + +4. **On failure**: for every FAILED test, open `mcp-server/tests/report.html` and extract the `Meshtastic debug` section for that test. pytest-html embeds the firmware log stream + device state dump there; the 200-line firmware log tail is usually enough to explain the failure. Summarise: which test, one-line assertion message, the firmware log lines that matter (things like `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`). + +5. **Classify the failure** as one of: + - **Transient/flake**: LoRa collision, timing-sensitive assertion, first-attempt NAK + successful retry pattern. Propose `/repro <nodeid>` to confirm. + - **Environmental**: device unreachable, port busy, CP2102 driver wedged. Suggest the specific recovery (replug USB, `touch_1200bps`, check `git status userPrefs.jsonc`). + - **Regression**: same assertion fails repeatedly, firmware log shows a new/unusual error. Surface the diff between expected and observed, identify the module likely responsible. + +6. **Never run destructive recovery automatically.** If a failure looks like it needs a reflash, factory_reset, or USB replug, _describe what to do_ — don't execute. The operator decides. + +## Arguments handling + +- No args → wrapper's defaults (full suite). +- `$ARGUMENTS` passed verbatim to the wrapper, which passes them to pytest. +- Common operator invocations: `/test tests/mesh`, `/test tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip`, `/test --force-bake`, `/test -k telemetry`. + +## Side-effects to mention in summary + +- The session fixture snapshots `userPrefs.jsonc` at session start and restores at teardown (plus on `atexit`). After a clean run, `git status userPrefs.jsonc` should be empty. 
If the wrapper's pre-flight printed a warning about a stale sidecar, call that out — means a prior session crashed. +- `mcp-server/tests/report.html` and `junit.xml` are regenerated on every run; the HTML is self-contained (shareable). diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 24e11bd4ddb..d12244229e6 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -429,6 +429,8 @@ Most workflows can be triggered manually via `workflow_dispatch` for testing. ## Testing +### Native unit tests (C++) + Unit tests in `test/` directory with 12 test suites: - `test_crypto/` - Cryptography @@ -446,6 +448,164 @@ Run with: `pio test -e native` Simulation testing: `bin/test-simulator.sh` +### Hardware-in-the-loop tests (`mcp-server/tests/`) + +Separate pytest suite that exercises real USB-connected Meshtastic devices. See the **MCP Server & Hardware Test Harness** section below for invocation, tier layout, and agent usage rules. + +## MCP Server & Hardware Test Harness + +The `mcp-server/` directory houses a firmware-aware [MCP](https://modelcontextprotocol.io/) server plus a pytest-based integration suite. AI agents that speak MCP get a well-defined tool surface for flashing, configuring, and inspecting physical Meshtastic devices — use it instead of hand-rolling `pio` or `meshtastic --port` calls where possible. `mcp-server/README.md` is the operator-facing setup doc; this section is the agent-facing usage contract. + +The repo registers the server via `.mcp.json` at the repo root — Claude Code picks it up automatically once `mcp-server/.venv/` is built (`cd mcp-server && python3 -m venv .venv && .venv/bin/pip install -e '.[test]'`). 
+ +### When to use which surface + +| Goal | Tool | +| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | +| Find a connected device | `mcp__meshtastic__list_devices` | +| Read a live node's config/state | `mcp__meshtastic__device_info`, `list_nodes`, `get_config` | +| Mutate a device (owner, region, channels, reboot) | `set_owner`, `set_config`, `set_channel_url`, `reboot`, `shutdown`, `factory_reset` — all require `confirm=True` | +| Flash firmware to a variant | `pio_flash` (any arch) or `erase_and_flash` (ESP32 factory install) | +| Stream serial logs while debugging | `serial_open` → `serial_read` loop → `serial_close` | +| Administer `userPrefs.jsonc` build-time constants | `userprefs_get`, `userprefs_set`, `userprefs_reset`, `userprefs_manifest` | +| Run the regression suite | `./mcp-server/run-tests.sh` (or `/test` slash command) | +| Diagnose a specific device | `/diagnose [role]` slash command (read-only) | +| Triage a flaky test | `/repro [count]` slash command | + +**One MCP call per port at a time.** `SerialInterface` holds an exclusive OS-level lock on the serial port for its lifetime. If a `serial_*` session is open on `/dev/cu.usbmodem101`, calling `device_info` on the same port will fail fast pointing at the active session. Sequence calls: open → read/mutate → close, then next device. Never parallelize tool calls on the same port. + +### MCP tool surface (~32 tools) + +Grouped by purpose. Full argument shapes in `mcp-server/README.md`; a few high-value signatures are called out here. 
+ +- **Discovery & metadata**: `list_devices`, `list_boards`, `get_board` +- **Build & flash**: `build`, `clean`, `pio_flash`, `erase_and_flash` (ESP32 only), `update_flash` (ESP32 OTA), `touch_1200bps` +- **Serial sessions** (long-running, 10k-line ring buffer): `serial_open`, `serial_read`, `serial_list`, `serial_close` +- **Device reads**: `device_info`, `list_nodes` +- **Device writes** (all require `confirm=True`): `set_owner`, `get_config`, `set_config`, `get_channel_url`, `set_channel_url`, `send_text`, `reboot`, `shutdown`, `factory_reset`, `set_debug_log_api` +- **userPrefs admin** (build-time constants, not runtime config): `userprefs_get`, `userprefs_set`, `userprefs_reset`, `userprefs_manifest`, `userprefs_testing_profile` +- **Vendor escape hatches**: `esptool_chip_info`, `esptool_erase_flash`, `esptool_raw`, `nrfutil_dfu`, `nrfutil_raw`, `picotool_info`, `picotool_load`, `picotool_raw` + +`confirm=True` is a tool-level gate on top of whatever permission prompt your MCP host shows. **Don't bypass it** by asking the host to auto-approve — it exists specifically because MCP hosts sometimes remember "always allow this tool" and that's dangerous for `factory_reset` and `erase_and_flash`. + +### Hardware test suite (`mcp-server/run-tests.sh`) + +The wrapper auto-detects connected devices (VID → role map: `0x239A` → `nrf52`, `0x303A`/`0x10C4` → `esp32s3`), maps each role to a PlatformIO env (`nrf52` → `rak4631`, `esp32s3` → `heltec-v3`, overridable via `MESHTASTIC_MCP_ENV_<ROLE>`), then invokes pytest. Zero pre-flight config needed from the operator. + +Suite tiers (collected + run in this order via `pytest_collection_modifyitems`): + +1. `tests/unit/` — pure Python (boards parse, pio wrapper, userPrefs parse, testing profile). No hardware. +2. `tests/test_00_bake.py` — flashes each detected device with current `userPrefs.jsonc` merged with the session's test profile. 
Has its own skip-if-already-baked check comparing region + primary channel to the session profile; skips cheaply on warm devices. +3. `tests/mesh/` — multi-device mesh: bidirectional send, broadcast delivery, direct-with-ACK, mesh formation within 60s. Parametrized `[nrf52->esp32s3]` and `[esp32s3->nrf52]`. +4. `tests/telemetry/` — `DEVICE_METRICS_APP` broadcast timing. +5. `tests/monitor/` — boot-log panic check. +6. `tests/fleet/` — PSK seed session isolation. +7. `tests/admin/` — channel URL roundtrip, owner persistence across reboot. +8. `tests/provisioning/` — region + modem + slot bake, admin key presence, `UNSET` region blocks TX, userPrefs survive factory reset. + +Invocation patterns: + +```bash +./mcp-server/run-tests.sh # full suite (auto-bake-if-needed) +./mcp-server/run-tests.sh --force-bake # reflash before testing +./mcp-server/run-tests.sh --assume-baked # skip bake (caller vouches for device state) +./mcp-server/run-tests.sh tests/mesh # one tier +./mcp-server/run-tests.sh tests/mesh/test_direct_with_ack.py # one file +./mcp-server/run-tests.sh -k telemetry # name filter +``` + +**No hardware detected?** The wrapper auto-narrows to `tests/unit/` only and prints `detected hub : (none)` in the pre-flight header. Agents interpreting the output should call this out explicitly — a 52-test green run without hardware is qualitatively different from a 12-unit-test green run. + +**Artifacts every run produces:** + +- `mcp-server/tests/report.html` — self-contained pytest-html. Each test gets a `Meshtastic debug` section with the tail of firmware log + device state dump. **Open this first** on failures; it's the canonical evidence source. +- `mcp-server/tests/junit.xml` — CI-parseable. +- `mcp-server/tests/reportlog.jsonl` — pytest-reportlog stream (`$report_type` keyed JSONL). Consumed by the live TUI. +- `mcp-server/tests/fwlog.jsonl` — firmware log mirror from the `meshtastic.log.line` pubsub topic. 
Populated by the `_firmware_log_stream` autouse session fixture. + +### Live TUI (`meshtastic-mcp-test-tui`) + +A Textual-based live view that wraps `run-tests.sh`. Tails reportlog for per-test state, streams firmware logs, polls device state at startup + post-run (gated out of the active run because `hub_devices` holds exclusive port locks). Key bindings: + +| Key | Action | +| --- | ------------------------------------------------------------------------------------------------------------ | +| `r` | re-run focused test (leaf → that node id; internal node → directory or `-k`) | +| `f` | filter tree by substring | +| `d` | failure detail modal (pulls `longrepr` + captured stdout from the reportlog) | +| `g` | export reproducer bundle (tar.gz with README, test_report.json, time-filtered fwlog, devices.json, env.json) | +| `l` | toggle firmware log pane | +| `x` | tool coverage modal | +| `c` | cross-run history sparkline | +| `q` | quit (SIGINT → SIGTERM → SIGKILL escalation, 5-s windows each) | + +Launch: + +```bash +cd mcp-server +.venv/bin/meshtastic-mcp-test-tui # full suite +.venv/bin/meshtastic-mcp-test-tui tests/mesh # args pass through to pytest +``` + +The plain CLI stays primary; the TUI is for operators who want a live dashboard. Both consume the same `run-tests.sh`. + +### Slash commands (Claude Code + Copilot) + +Three AI-assisted workflows wrap the test harness. Claude Code operators get `/test`, `/diagnose`, `/repro`; Copilot operators get `/mcp-test`, `/mcp-diagnose`, `/mcp-repro`. Bodies: + +- `.claude/commands/{test,diagnose,repro}.md` +- `.github/prompts/mcp-{test,diagnose,repro}.prompt.md` + +`.claude/commands/README.md` is the index. + +House rules for agents running these prompts: + +- **Interpret failures, don't just echo them.** Pull firmware log tails from `report.html` and classify each failure as transient / environmental / regression. Use the exact format in `.claude/commands/test.md`. 
+- **No destructive writes without operator approval.** Any skill that could reflash, factory-reset, or reboot a device must describe the action and stop. The operator authorizes. +- **Sequential MCP calls per port.** See above. +- **"Unknown" is a valid classification.** If evidence doesn't support a root cause, say so and list what would disambiguate. Do not invent. + +### Key fixtures (test authors + agents debugging) + +`mcp-server/tests/conftest.py` provides: + +- **`_session_userprefs`** (autouse session) — snapshots `userPrefs.jsonc` at session start, merges the session test profile via `userprefs.merge_active(test_profile)`, restores at teardown. Four layers of safety: pytest teardown + `atexit` + sidecar file (`userPrefs.jsonc.mcp-session-bak`) + startup self-heal in `run-tests.sh`. **Do not edit `userPrefs.jsonc` from inside a test.** +- **`_firmware_log_stream`** (autouse session) — subscribes to `meshtastic.log.line` pubsub on every connected `SerialInterface` and mirrors lines to `tests/fwlog.jsonl`. Drives the TUI firmware-log pane. +- **`_debug_log_buffer`** (autouse per-test) — captures last 200 firmware log lines + device state for attachment to the pytest-html `Meshtastic debug` section on failure. +- **`hub_devices`** (session) — `dict[role, SerialInterface]` with session-long exclusive port locks. Reason the TUI's device poller is gated to startup + post-run only. +- **`baked_mesh`** — parametrized mesh-pair fixture; depends on `test_00_bake`. `pytest_generate_tests` in `conftest.py` auto-generates `[nrf52->esp32s3]` and `[esp32s3->nrf52]` variants. +- **`test_profile`** — session-scoped dict: region, primary channel, admin key, PSK seed. Derived from `MESHTASTIC_MCP_SEED` (defaults to `mcp--`). + +### Firmware integration points tied to the test harness + +Two firmware changes exist specifically so the test harness works reliably. 
**Keep these in mind when touching related code.** + +- **`src/mesh/StreamAPI.cpp` + `StreamAPI.h`** — `emitLogRecord` uses a dedicated `fromRadioScratchLog` + `txBufLog` pair and a `concurrency::Lock streamLock`. Before this fix, `debug_log_api_enabled=true` would tear `FromRadio` protobufs on the serial transport because `emitTxBuffer` and `emitLogRecord` shared a single scratch buffer. The conftest enables the log stream session-wide; without this fix the device would corrupt its own FromRadio replies mid-session. +- **`src/mesh/PhoneAPI.cpp`** — `ToRadio` `Heartbeat(nonce=1)` triggers `nodeInfoModule->sendOurNodeInfo(NODENUM_BROADCAST, true, 0, true)` for serial clients, mirroring the pre-existing behavior for TCP/UDP clients in `PacketAPI.cpp`. The mesh tests rely on this to force a NodeInfo broadcast right after connect so the peer discovers them before the test's first assertion. + +If you're modifying `StreamAPI`, `PhoneAPI`, `NodeInfoModule`, or `userPrefs` flow, run `./mcp-server/run-tests.sh` at minimum before asking for review. + +### Recovery playbooks + +| Symptom | First check | Fix | +| ---------------------------------------------------------- | ------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `userPrefs.jsonc` dirty after test run | `git status --porcelain userPrefs.jsonc` | If non-empty, re-run `./mcp-server/run-tests.sh` once — the pre-flight self-heal restores from sidecar. If still dirty, `git checkout userPrefs.jsonc`. | +| Port busy / wedged CP2102 on macOS | `lsof /dev/cu.usbserial-0001` | Kill the holder. USB replug if the kernel still reports busy. Often a stale `pio device monitor` or zombie `meshtastic_mcp` process. 
| +| nRF52 appears unresponsive | `list_devices` shows VID `0x239A` but `device_info` times out | `touch_1200bps(port=...)` drops it into the DFU bootloader → `pio_flash` re-installs. | +| Multiple MCP server processes | `ps aux \| grep meshtastic_mcp` shows >1 | Kill all but the one your MCP host spawned. Zombies hold ports and break tests. | +| Mesh formation fails, one side sees peer but other doesn't | `/diagnose` (or `list_nodes` on both sides) | Asymmetric NodeInfo. `test_direct_with_ack` has a heal path; `/repro` it a few times. If persistent, both devices' clocks may be out of sync with their NodeInfo cooldown. | +| "role not present on hub" in skip reasons | `list_devices` | Expected if a device is unplugged. Reconnect before re-running the tier. | +| Tests fail only on first attempt then pass on rerun | — | State leak from a prior session. Run with `--force-bake` to reset to a known state. | + +### Never do these without asking + +- `factory_reset` — wipes node identity; regenerates PKI keypair. Mesh peers will reject old DMs until re-exchange. Legitimate only when the operator explicitly wants it. +- `erase_and_flash` — full chip erase; destroys all on-device state. +- `esptool_erase_flash` / `esptool_raw` write/erase — bypasses pio's safety chain. +- `set_config` on `lora.region` — changes regulatory domain; requires physical-location context the operator has and the agent doesn't. +- `reboot` / `shutdown` mid-test — breaks fixture invariants. +- `push -f`, `rebase -i`, `reset --hard`, or any history-rewriting git operation. +- Clicking computer-use tools on web links in Mail/Messages/PDFs — open URLs via the claude-in-chrome MCP so the extension's link-safety checks apply. 
+ ## Resources - [Documentation](https://meshtastic.org/docs/) diff --git a/.github/prompts/mcp-diagnose.prompt.md b/.github/prompts/mcp-diagnose.prompt.md new file mode 100644 index 00000000000..c86826030d9 --- /dev/null +++ b/.github/prompts/mcp-diagnose.prompt.md @@ -0,0 +1,57 @@ +--- +mode: agent +description: Device health report via the meshtastic MCP tools (Copilot equivalent of the Claude Code /diagnose slash command) +--- + +# `/mcp-diagnose` — device health report + +Equivalent of `.claude/commands/diagnose.md`. Use when the operator asks to "check the devices", "what's the mesh looking like", "is nrf52 alive", etc. + +This prompt assumes the meshtastic MCP server is registered with your VS Code Copilot agent. If it isn't, fall back to running `./mcp-server/run-tests.sh tests/unit` plus a short `device_info` script via the terminal. + +## What to do + +1. **Enumerate hardware** via the `list_devices` MCP tool (with `include_unknown=True`). For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`. + +2. **Apply the operator's filter** (if any): + - No filter → every likely-meshtastic device. + - `nrf52` → `vid == 0x239a` + - `esp32s3` → `vid == 0x303a` or `vid == 0x10c4` + - A `/dev/cu.*` path → only that port. + - Anything else → substring match on port. + +3. **For each selected device, in sequence (don't parallelize — SerialInterface holds an exclusive port lock):** + - `device_info(port=

<port>)` → `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel` + - `list_nodes(port=<port>)` → peer count, which peers have `publicKey`, SNR/RSSI distribution + - `get_config(section="lora", port=<port>)` → region, preset, channel_num, tx_power, hop_limit + - If anything looks off (can't connect, `num_nodes` wrong, missing `firmware_version`), open a short firmware-log window: `serial_open(port=<port>, env=<env>)`, wait 3 seconds, `serial_read(session_id, max_lines=100)`, `serial_close(session_id)`. Infer env from VID (0x239a → `rak4631`, 0x303a/0x10c4 → `heltec-v3`) unless an `MESHTASTIC_MCP_ENV_<ROLE>` env var overrides it. + +4. **Render per-device report** as a compact block: + + ```text + [nrf52 @ /dev/cu.usbmodem1101] fw=2.7.23.bce2825, hw=RAK4631 + owner : Meshtastic 40eb / 40eb + region/band : US, channel 88, LONG_FAST + tx_power : 30 dBm, hop_limit=3 + peers : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm) + primary ch : McpTest + firmware : no panics in last 3s + ``` + + Flag abnormalities inline with `⚠︎` — missing pubkey on a known peer, region UNSET, mismatched channel name, etc. + +5. **Cross-device correlation** (when >1 device selected): + - Do both see each other in `nodesByNum`? + - Do `region`, `channel_num`, `modem_preset` match across devices? + - Do the primary channel names match? (Different name → different PSK → no decode.) + +6. **Suggest next steps only for recognizable failure modes**, never speculatively: + - Stale PKI one-way → "`/mcp-test tests/mesh/test_direct_with_ack.py` — the test's retry+nodeinfo-ping heals this." + - Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`." + - Device unreachable → refer operator to the touch_1200bps + CP2102-wedged-driver notes in `run-tests.sh`. + +## Hard constraints + +- **Read-only.** No `set_config`, no `reboot`, no `factory_reset`, no `flash`. If the operator wants mutation, they'll escalate explicitly. +- **Open/query/close per device.** Never hold multiple SerialInterfaces to the same port. The port lock is exclusive. +- **Don't infer env beyond the VID map** — if the operator has an unusual board, ask them which env to use rather than guessing. 
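The step-5 correlation above is a per-field set check across the inspected devices. A hedged Python sketch — `lora_mismatches` is an illustrative name, and the compared fields are the ones this prompt says must agree:

```python
# Illustrative helper for the step-5 cross-device check: any LoRa field with
# more than one distinct value across devices means the mesh cannot form.
def lora_mismatches(configs):
    """Return the LoRa config fields that differ across the inspected devices."""
    fields = ("region", "channel_num", "modem_preset")
    return [f for f in fields if len({c[f] for c in configs}) > 1]
```

An empty result means the radios at least agree on air parameters; channel-name/PSK agreement still has to be checked separately.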
diff --git a/.github/prompts/mcp-repro.prompt.md b/.github/prompts/mcp-repro.prompt.md new file mode 100644 index 00000000000..be2963c3318 --- /dev/null +++ b/.github/prompts/mcp-repro.prompt.md @@ -0,0 +1,67 @@ +--- +mode: agent +description: Re-run a specific test N times to triage flakes; diff firmware logs between passes and failures (Copilot equivalent of the Claude Code /repro slash command) +--- + +# `/mcp-repro` — flakiness triage for one test + +Equivalent of `.claude/commands/repro.md`. Use when the operator says "that one test is flaky — dig in", "repro the direct_with_ack failure", "why does X sometimes fail?". + +## What to do + +1. **Parse the operator's input** into two pieces: + - **Test identifier** — either a pytest node id (has `::` or starts with `tests/`) or a `-k`-style filter (plain substring like `direct_with_ack`). + - **Count** — integer, default `5`, cap at `20`. If the operator asks for 50, negotiate down and explain (airtime + USB wear). + +2. **Sanity-check the hub** via the `list_devices` MCP tool. If the test name references `nrf52` or `esp32s3` and the matching VID isn't present, stop and report — re-running won't help. + +3. **Loop** N times. Each iteration: + + ```bash + ./mcp-server/run-tests.sh <test-id> --tb=short -p no:cacheprovider + ``` + + `-p no:cacheprovider` keeps pytest from caching anything between iterations. Capture: exit code, duration, and (on failure) the `Meshtastic debug` firmware-log section from `mcp-server/tests/report.html`. + +4. **Tally** results as you go: + + ```text + attempt 1: PASS (42s) + attempt 2: FAIL (128s) ← fw log captured + attempt 3: PASS (39s) + attempt 4: FAIL (121s) + attempt 5: PASS (41s) + -------------------------------------------------- + pass rate: 3/5 (60%) | mean duration: 74s + ``` + +5. **On mixed outcomes, diff the firmware logs** between one representative pass and one representative fail.
Focus on: + - Error-level lines present only in failures (`PKI_UNKNOWN_PUBKEY`, `Alloc an err=`, `Skip send`, `No suitable channel`, `NAK`) + - Timing around the assertion point (broadcast sent? ACK received? retry fired?) + - Device-state fields that changed between attempts + + Surface the top 3 differences as a compact "passes when / fails when" table with uptime timestamps. Don't dump full logs. + +6. **Classify** the flake into one of: + - **LoRa airtime collision** — pass rate improves with fewer concurrent transmitters. Suggest a `time.sleep` gap or retry bump in the test body. + - **PKI key staleness** — first attempt fails, subsequent ones pass; existing retry-loop pattern in `test_direct_with_ack.py` is the fix. + - **NodeInfo cooldown** — `Skip send NodeInfo since we sent it <600s ago` in fail-only logs; needs a `broadcast_nodeinfo_ping()` warmup. + - **Hardware-specific** — one direction consistently fails, firmware versions differ, CP2102 driver wedged, etc. + - **Unknown** — say so. Don't invent a root cause. + +7. **Report back** with: + - Pass rate + mean duration. + - Classification + the specific log evidence for it. + - A concrete next step (tighter assertion, more retries, open `/mcp-diagnose`, file a bug, nothing). + +## Examples + +- `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip[esp32s3->nrf52] 10` — 10 runs of that parametrized case. +- `broadcast_delivers` — no `::`, no `tests/`; treat as `-k broadcast_delivers`; runs every match 5 times. +- `tests/telemetry/test_device_telemetry_broadcast.py 3` — shorter count for a slow test. + +## Notes + +- If the FIRST attempt fails and the rest pass, that's a state-leak signature — suggest starting from `--force-bake` or a clean device state rather than chasing the first-failure firmware logs. +- If ALL N fail, this isn't a flake — it's a regression. Say so, stop iterating, escalate to `/mcp-test` for full-suite context. +- Don't rebuild firmware during triage. 
Flakes that only reproduce under different firmware belong in a separate session with a plan. diff --git a/.github/prompts/mcp-test.prompt.md b/.github/prompts/mcp-test.prompt.md new file mode 100644 index 00000000000..092ad3d856c --- /dev/null +++ b/.github/prompts/mcp-test.prompt.md @@ -0,0 +1,51 @@ +--- +mode: agent +description: Run the mcp-server test suite and interpret results (Copilot equivalent of the Claude Code /test slash command) +--- + +# `/mcp-test` — mcp-server test runner with interpretation + +Equivalent of the Claude Code `/test` slash command in `.claude/commands/test.md`. Use this when the operator asks you to "run the tests", "check the mcp test suite", "run the mesh tests", etc. + +## What to do + +1. **Invoke the wrapper** from the firmware repo root: + + ```bash + ./mcp-server/run-tests.sh [pytest-args] + ``` + + If the operator specified a subset (e.g. "just the mesh tests"), pass it through as `tests/mesh` or a pytest `-k` filter. If they said nothing, use the wrapper's defaults (full suite with pytest-html report). + + The wrapper auto-detects connected Meshtastic devices, maps each to its PlatformIO env, exports the required env vars, and invokes pytest. Zero pre-flight config needed from the operator. + +2. **Read the pre-flight header** (first few lines of wrapper output). The `detected hub :` line lists role → port → env mappings. If it reads `(none)`, the wrapper narrowed to `tests/unit` only — call that out explicitly so the operator knows hardware tiers were skipped. + +3. **On pass**: one-line summary like `N passed, M skipped in <duration>`. Don't enumerate test names. DO mention any non-placeholder SKIPs (things like "role not present on hub") because they indicate missing hardware or setup issues. + +4. **On failure**: open `mcp-server/tests/report.html` (pytest-html output, self-contained) and extract the `Meshtastic debug` section for each failed test. That section includes a firmware log stream (last 200 lines) and device state dump.
For each failure, summarise: + - test name + - one-line assertion message + - the specific firmware log lines that explain why (look for `PKI_UNKNOWN_PUBKEY`, `Skip send NodeInfo`, `Error=`, `Guru Meditation`, `assertion failed`, `No suitable channel`) + +5. **Classify each failure** as one of: + - **Transient flake** — LoRa collision, first-attempt NAK with self-heal pattern, timing-sensitive assertion. Suggest `/mcp-repro <test>` to confirm. + - **Environmental** — device unreachable, port busy, CP2102 driver wedged on macOS. Suggest specific recovery (USB replug, `touch_1200bps`, `git status userPrefs.jsonc`). + - **Regression** — same assertion fails repeatedly on re-runs, firmware log shows novel errors. Identify the firmware module likely responsible. + +6. **Do NOT run destructive recovery automatically**. If a failure looks like it needs a reflash, factory-reset, or replug — *describe the steps* and let the operator decide. Never burn airtime or flash cycles without approval. + +## Arguments convention + +Operators generally invoke this prompt either with no arguments (full suite) or with a specific subset. Examples: + +- `tests/mesh` — one tier +- `tests/mesh/test_direct_with_ack.py::test_direct_with_ack_roundtrip` — one test +- `--force-bake` — reflash devices first +- `-k telemetry` — name-filter + +## Side-effects to confirm in your summary + +- `userPrefs.jsonc` should be clean after a successful run. The session fixture in `mcp-server/tests/conftest.py` (`_session_userprefs`) snapshots and restores. Check `git status --porcelain userPrefs.jsonc` and report if it's non-empty. +- `mcp-server/tests/report.html` and `junit.xml` regenerate on every run. +- The wrapper prints a warning if a `.mcp-session-bak` sidecar was left over from a crashed prior session and auto-restores from it — mention that if it happened.
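The `userPrefs.jsonc` check in the side-effects list reduces to "any porcelain output means the session fixture failed to restore the file". A minimal sketch — helper names hypothetical:

```python
import subprocess


def userprefs_dirty(porcelain: str) -> bool:
    """Interpret `git status --porcelain userPrefs.jsonc` output: git prints
    nothing for a clean file, so any non-blank line (" M ...", "?? ...")
    means the snapshot/restore fixture did not run to completion."""
    return any(line.strip() for line in porcelain.splitlines())


def check_userprefs(repo_root: str = ".") -> bool:
    """Run the actual git query and report True when a warning is warranted."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "userPrefs.jsonc"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    ).stdout
    return userprefs_dirty(out)
```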
diff --git a/.gitignore b/.gitignore index 43cee78db73..f1eb9d852d7 100644 --- a/.gitignore +++ b/.gitignore @@ -54,3 +54,5 @@ CMakeLists.txt # PYTHONPATH used by the Nix shell .python3 +.claude/scheduled_tasks.lock +userPrefs.jsonc.mcp-session-bak diff --git a/.mcp.json b/.mcp.json new file mode 100644 index 00000000000..c5cf2e55e5a --- /dev/null +++ b/.mcp.json @@ -0,0 +1,11 @@ +{ + "mcpServers": { + "meshtastic": { + "command": "./mcp-server/.venv/bin/python", + "args": ["-m", "meshtastic_mcp"], + "env": { + "MESHTASTIC_FIRMWARE_ROOT": "." + } + } + } +} diff --git a/.trunk/configs/.bandit b/.trunk/configs/.bandit index d286ded8974..c70e7743b67 100644 --- a/.trunk/configs/.bandit +++ b/.trunk/configs/.bandit @@ -1,2 +1,28 @@ [bandit] -skips = B101 \ No newline at end of file +# Rule IDs: https://bandit.readthedocs.io/en/latest/plugins/index.html +# +# B101 assert_used +# pytest assertions + internal invariants; required for pytest. +# B110 try_except_pass +# best-effort cleanup paths (atexit handlers, pubsub unsubscribe, +# session-end file close, socket shutdown). Logging inside the +# except block would be worse than the silent pass — teardown is +# already at end-of-session and the surrounding caller has context. +# B112 try_except_continue +# defensive loops over flaky sources (pubsub handlers, device +# re-enumeration polls). One failed iteration shouldn't abort the loop. +# B404 import_subprocess +# mcp-server wraps PlatformIO, esptool, nrfutil, picotool, and the +# pytest test-runner — subprocess is a load-bearing import here, not +# a smell. The "consider possible security implications" advisory is +# redundant given the file-level review already applied. +# B603 subprocess_without_shell_equals_true +# all subprocess calls use a static argv list; `shell=False` is the +# default and we never string-interpolate user input into the command. 
+# B606 start_process_with_no_shell +# same invariant as B603 — running a binary via argv list (not +# `shell=True`) is the safe pattern bandit is asking for. +# +# Higher-severity checks (B102 exec_used, B301 pickle, B307 eval, +# B602 shell=True, etc.) remain enabled. +skips = B101,B110,B112,B404,B603,B606 \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000000..cd043c08787 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,113 @@ +# Agent instructions + +This repository is the [Meshtastic](https://meshtastic.org) firmware — a C++17 embedded codebase targeting ESP32 / nRF52 / RP2040 / STM32WL / Linux-Portduino LoRa mesh radios — plus a Python MCP server in `mcp-server/` that AI agents use to flash, configure, and test connected devices. + +## Primary instruction file + +**Read `.github/copilot-instructions.md` first.** That file is the canonical agent-facing document for this repo. It covers project layout, coding conventions (naming, module framework, Observer pattern, thread safety), the build system, CI/CD, the native C++ test suite, and — most importantly for automation work — the **MCP Server & Hardware Test Harness** section. Read it top-to-bottom before starting any non-trivial change. + +This file (`AGENTS.md`) is a short pointer + quick reference for agents that don't read `.github/copilot-instructions.md` by default. + +## Quick command reference + +| Action | Command | +| -------------------------------- | ----------------------------------------------------------------------------------- | +| Build a firmware variant | `pio run -e <env>` (e.g.
`pio run -e rak4631`, `pio run -e heltec-v3`) | + | Clean + rebuild | `pio run -e <env> -t clean && pio run -e <env>` | + | Flash a device | `pio run -e <env> -t upload --upload-port <port>` (or use the `pio_flash` MCP tool) | + | Run firmware unit tests (native) | `pio test -e native` | + | Run MCP hardware tests | `./mcp-server/run-tests.sh` | + | Live TUI test runner | `mcp-server/.venv/bin/meshtastic-mcp-test-tui` | + | Format before commit | `trunk fmt` | + | Regenerate protobuf bindings | `bin/regen-protos.sh` | + | Generate CI matrix | `./bin/generate_ci_matrix.py all [--level pr]` | + +## MCP server (device + test automation) + +The `mcp-server/` package exposes 38 MCP tools for device discovery, building, flashing, serial monitoring, and live-node administration. Tools are grouped as: + +- **Discovery**: `list_devices`, `list_boards`, `get_board` +- **Build & flash**: `build`, `clean`, `pio_flash`, `erase_and_flash` (ESP32 factory), `update_flash` (ESP32 OTA), `touch_1200bps` +- **Serial sessions**: `serial_open`, `serial_read`, `serial_list`, `serial_close` +- **Device reads**: `device_info`, `list_nodes` +- **Device writes** (destructive ones require `confirm=True`): `set_owner`, `get_config`, `set_config`, `get_channel_url`, `set_channel_url`, `send_text`, `reboot`, `shutdown`, `factory_reset`, `set_debug_log_api` +- **userPrefs admin**: `userprefs_get`, `userprefs_set`, `userprefs_reset`, `userprefs_manifest`, `userprefs_testing_profile` +- **Vendor escape hatches**: `esptool_*`, `nrfutil_*`, `picotool_*` + +Setup: `cd mcp-server && python3 -m venv .venv && .venv/bin/pip install -e '.[test]'`. The repo registers the server via `.mcp.json` — Claude Code picks it up automatically. + +See `mcp-server/README.md` for argument shapes and the **MCP Server & Hardware Test Harness** section of `.github/copilot-instructions.md` for agent usage rules (tool surface, fixture contract, firmware integration points, recovery playbooks).
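The `confirm=True` gate on destructive tools is a refuse-by-default check inside each tool, independent of any host-side permission prompt. An illustrative sketch — the real registrations live in `mcp-server/src/meshtastic_mcp/` and may differ in detail:

```python
def factory_reset(port: str, confirm: bool = False) -> str:
    """Sketch of the tool-level gate: without confirm=True the tool refuses
    and names the exact flag to pass, so an agent cannot trigger a
    destructive operation by accident or via auto-approve settings."""
    if not confirm:
        return f"refused: factory_reset on {port} requires confirm=True"
    # ... the real tool would call the meshtastic API's factory reset here ...
    return f"factory reset issued on {port}"
```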
+ +## Slash commands (AI-assisted workflows) + +Three test-and-diagnose workflows exist as slash commands: + +- **`/test` (Claude Code) / `/mcp-test` (Copilot)** — run the hardware test suite and interpret failures +- **`/diagnose` / `/mcp-diagnose`** — read-only device health report +- **`/repro` / `/mcp-repro`** — flakiness triage: re-run one test N times, diff firmware logs between passes and failures + +Bodies live in `.claude/commands/` and `.github/prompts/` respectively. `.claude/commands/README.md` is the index. + +## House rules + +- **No destructive device operations without operator approval.** `factory_reset`, `erase_and_flash`, `reboot`, `shutdown`, history-rewriting git ops — describe the action and stop. Operator authorizes. +- **One MCP call per serial port at a time.** The port lock is exclusive; concurrent calls deadlock. Sequence: open → read/mutate → close, then next device. +- **`userPrefs.jsonc` is session state during tests.** The `_session_userprefs` fixture snapshots + restores it; never edit it from inside a test. +- **Don't speculate about firmware root causes.** When evidence doesn't support a classification, say "unknown" and list what would disambiguate. +- **Run `trunk fmt` before proposing a commit.** The `trunk_check` CI gate will reject unformatted code. +- **`confirm=True` on destructive MCP tools is a real gate, not a formality.** Don't bypass it via auto-approve settings. + +## Typical agent workflows + +### Flashing a device + +1. `list_devices` → find the port + likely VID +2. `list_boards` → confirm the env, or use the known default for the hardware +3. `pio_flash(env=..., port=..., confirm=True)` for any arch, or `erase_and_flash(env=..., port=..., confirm=True)` for an ESP32 factory install + +### Inspecting live node state + +1. `device_info(port=...)` — short summary (node num, firmware version, region, peer count) +2. `list_nodes(port=...)` — full peer table (SNR, RSSI, pubkey presence, last_heard) +3. 
`get_config(section="lora", port=...)` — LoRa settings for cross-device comparison + +Sequence these; don't parallelize on the same port. + +### Testing a firmware change + +1. Build locally: `pio run -e <env>` +2. Flash the test device: `pio_flash(env=..., port=..., confirm=True)` +3. Run the suite: `./mcp-server/run-tests.sh tests/<tier>` or `/test tests/<tier>` +4. On failure, open `mcp-server/tests/report.html` → `Meshtastic debug` section for the firmware log tail + device state dump +5. Iterate + +### Debugging a flaky test + +1. `/repro <test> [count]` — re-runs the test N times, diffs firmware logs between passes and failures +2. If the first attempt always fails and the rest pass, that's a state-leak pattern → suggest `--force-bake` or a clean device state, don't chase the first failure +3. If all N fail, this isn't a flake — it's a regression. Stop iterating and escalate to `/test` for full-suite context. + +## Where to look + +| Path | What's there | +| --------------------------------- | ---------------------------------------------------------------------------------------------------- | +| `src/` | Firmware C++ source (`mesh/`, `modules/`, `platform/`, `graphics/`, `gps/`, `motion/`, `mqtt/`, …) | +| `src/mesh/` | Core: NodeDB, Router, Channels, CryptoEngine, radio interfaces, StreamAPI, PhoneAPI | +| `src/modules/` | Feature modules; `Telemetry/Sensor/` has 50+ I2C sensor drivers | +| `variants/` | 200+ hardware variant definitions (`variant.h` + `platformio.ini` per board) | +| `protobufs/` | `.proto` definitions; regenerate with `bin/regen-protos.sh` | +| `test/` | Firmware unit tests (12 suites; `pio test -e native`) | +| `mcp-server/` | Python MCP server + pytest hardware integration tests | +| `mcp-server/tests/` | Tiered pytest suite: `unit/`, `mesh/`, `telemetry/`, `monitor/`, `fleet/`, `admin/`, `provisioning/` | +| `.claude/commands/` | Claude Code slash command bodies | +| `.github/prompts/` | Copilot prompt bodies (mirrors of the Claude Code ones) | +|
`.github/copilot-instructions.md` | **Primary agent instructions — read this** | +| `.github/workflows/` | CI pipelines | +| `.mcp.json` | MCP server registration for Claude Code | + +## Recovery one-liners + +- **`userPrefs.jsonc` dirty after a test run?** Re-run `./mcp-server/run-tests.sh` once (pre-flight self-heals from the sidecar). If still dirty: `git checkout userPrefs.jsonc`. +- **nRF52 not responding?** `mcp__meshtastic__touch_1200bps(port=...)` drops it into the DFU bootloader, then `pio_flash` re-installs. +- **Port busy?** `lsof <port>` to find the holder. Usually a stale `pio device monitor` or zombie `meshtastic_mcp` process. Kill it. +- **Multiple MCP servers running?** `ps aux | grep meshtastic_mcp` — zombies hold ports. Kill all but the one your host spawned. diff --git a/mcp-server/.gitignore b/mcp-server/.gitignore new file mode 100644 index 00000000000..f5180bc71a1 --- /dev/null +++ b/mcp-server/.gitignore @@ -0,0 +1,26 @@ +.venv/ +__pycache__/ +*.py[cod] +*.egg-info/ +.pytest_cache/ +.mypy_cache/ +dist/ +build/ + +# Test harness artifacts +tests/report.html +tests/junit.xml +tests/reportlog.jsonl +tests/fwlog.jsonl +# Subprocess-output tee from pio/esptool/nrfutil/picotool (live flash +# progress for the TUI; also a post-run diagnostic for plain CLI runs). +tests/flash.log +tests/tool_coverage.json +tests/.coverage +htmlcov/ +# Persistent run counter for meshtastic-mcp-test-tui header. +tests/.tui-runs +# Cross-run history (TUI duration sparkline). +tests/.history/ +# Reproducer bundles (TUI `x` export on failed tests). +tests/reproducers/ diff --git a/mcp-server/README.md b/mcp-server/README.md new file mode 100644 index 00000000000..7d5fc551a7b --- /dev/null +++ b/mcp-server/README.md @@ -0,0 +1,270 @@ +# Meshtastic MCP Server + +An [MCP](https://modelcontextprotocol.io) server for working with the Meshtastic firmware repo and connected devices.
Lets Claude Code / Claude Desktop: + +- Discover USB-connected Meshtastic devices +- Enumerate PlatformIO board variants (166+) with Meshtastic metadata +- Build, clean, flash, erase-and-flash (factory), and OTA-update firmware +- Read serial logs via `pio device monitor` (with board-specific exception decoders) +- Trigger 1200bps touch-reset for bootloader entry (nRF52, ESP32-S3, RP2040) +- Query and administer a running node via the [`meshtastic` Python API](https://github.com/meshtastic/python): owner name, config (LocalConfig + ModuleConfig), channels, messaging, reboot/shutdown/factory-reset +- Call `esptool`, `nrfutil`, `picotool` directly when PlatformIO doesn't cover the operation + +## Design principle + +**PlatformIO first.** Its `pio run -t upload` knows the correct protocol, offsets, and post-build chain for every variant in `variants/`. Direct vendor-tool wrappers (`esptool_*`, `nrfutil_*`, `picotool_*`) exist as escape hatches for operations pio doesn't cover (blank-chip erase, DFU `.zip` packages, BOOTSEL-mode inspection). + +## Prerequisites + +- Python ≥ 3.11 +- [PlatformIO Core](https://platformio.org/install/cli) — `pio` on `$PATH` or at `~/.platformio/penv/bin/pio` +- The Meshtastic firmware repo checked out somewhere (set via `MESHTASTIC_FIRMWARE_ROOT`) +- Optional: `esptool`, `nrfutil`, `picotool` on `$PATH` (or under the firmware venv at `.venv/bin/`) if you want to use the direct-tool wrappers + +## Install + +```bash +cd <firmware-root>/mcp-server +python3 -m venv .venv +.venv/bin/pip install -e . +``` + +Verify: + +```bash +MESHTASTIC_FIRMWARE_ROOT=<firmware-root> .venv/bin/python -m meshtastic_mcp +``` + +The server blocks on stdin (that's correct — it speaks MCP over stdio). Ctrl-C to exit.
+ +## Register with Claude Code + +Edit `~/.claude/settings.json` (global) or `<firmware-root>/.claude/settings.local.json` (project-only): + +```json +{ + "mcpServers": { + "meshtastic": { + "command": "<firmware-root>/mcp-server/.venv/bin/python", + "args": ["-m", "meshtastic_mcp"], + "env": { + "MESHTASTIC_FIRMWARE_ROOT": "<firmware-root>" + } + } + } +} +``` + +Replace `<firmware-root>` with the absolute path, e.g. `/Users/you/GitHub/firmware`. Restart Claude Code after editing. + +## Register with Claude Desktop + +Same `mcpServers` block, but in `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows). + +## Tools (38) + +### Discovery & metadata + +| Tool | What it does | +| -------------- | ------------------------------------------------------------------------------------------ | +| `list_devices` | USB/serial port listing, flags likely-Meshtastic candidates | +| `list_boards` | PlatformIO envs with `custom_meshtastic_*` metadata; filters by arch/supported/query/level | +| `get_board` | Full env dict incl. raw pio config | + +### Build & flash + +| Tool | What it does | +| ----------------- | -------------------------------------------------------------------- | +| `build` | `pio run -e <env>` (+ mtjson target) | +| `clean` | `pio run -e <env> -t clean` | +| `pio_flash` | `pio run -e <env> -t upload --upload-port <port>` — any architecture | +| `erase_and_flash` | ESP32 full factory flash via `bin/device-install.sh` | +| `update_flash` | ESP32 OTA app-partition update via `bin/device-update.sh` | +| `touch_1200bps` | 1200-baud open/close to trigger USB CDC bootloader entry | + +### Serial log sessions + +Backed by long-running `pio device monitor` subprocesses with a 10k-line ring buffer per session and board-specific filters (`esp32_exception_decoder` auto-selected when you pass an ESP32 `env`).
+ +| Tool | What it does | +| -------------- | ------------------------------------------------------------------ | +| `serial_open` | Start a monitor session; returns `session_id` | +| `serial_read` | Cursor-based pull; reports `dropped` if lines aged out of the ring | +| `serial_list` | All active sessions | +| `serial_close` | Terminate a session | + +### Device reads + +| Tool | What it does | +| ------------- | --------------------------------------------------------------------------- | +| `device_info` | my_node_num, long/short name, firmware version, region, channel, node count | +| `list_nodes` | Full node database with position, SNR, RSSI, last_heard, battery | + +_The tool tables below document 38 currently registered MCP server tools._ + +### Device writes + +| Tool | What it does | +| ------------------- | -------------------------------------------------------------------------- | +| `set_owner` | Long name + optional short name (≤4 chars) | +| `get_config` | One section or all (LocalConfig + ModuleConfig) | +| `set_config` | Dot-path field write: `lora.region`=`"US"`, `device.role`=`"ROUTER"`, etc. 
| +| `get_channel_url` | Primary-only or include_all=admin URL | +| `set_channel_url` | Import channels from a Meshtastic URL | +| `set_debug_log_api` | Enable or disable debug logging for the Meshtastic Python API client | +| `send_text` | Broadcast or direct text message | +| `reboot` | `localNode.reboot(secs)` — requires `confirm=True` | +| `shutdown` | `localNode.shutdown(secs)` — requires `confirm=True` | +| `factory_reset` | `localNode.factoryReset(full?)` — requires `confirm=True` | + +### Direct hardware tools (escape hatches) + +| Tool | What it does | +| --------------------- | --------------------------------------------------------- | +| `esptool_chip_info` | Read chip, MAC, crystal, flash size | +| `esptool_erase_flash` | Full-chip erase (destructive) | +| `esptool_raw` | Pass-through; confirm=True required for write/erase/merge | +| `nrfutil_dfu` | DFU-flash a `.zip` package | +| `nrfutil_raw` | Pass-through | +| `picotool_info` | Read Pico BOOTSEL-mode info | +| `picotool_load` | Load a UF2 | +| `picotool_raw` | Pass-through | + +## Safety + +- **All destructive flash/admin tools require `confirm=True`** as a tool-level gate, on top of any permission prompt from Claude. +- **Serial port is exclusive.** If a `serial_*` session is active on a port, `device_info`/admin tools on the same port will fail fast with a pointer at the active `session_id`. Close the session first. +- **Flash confirmation by architecture**: `erase_and_flash` / `update_flash` error if the env's architecture isn't ESP32 — use `pio_flash` for nRF52/RP2040/STM32. 
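The ring-buffer contract behind `serial_open`/`serial_read` (10k-line ring per session, cursor-based pulls that report `dropped` lines) can be sketched as follows. Class and field names are illustrative; the real implementation lives in `serial_session.py`:

```python
from collections import deque


class RingLog:
    """Per-session line buffer: absolute sequence numbers let a reader's
    cursor detect lines that aged out of the ring between reads."""

    def __init__(self, maxlen: int = 10_000):
        self.buf: deque[str] = deque(maxlen=maxlen)
        self.total = 0  # sequence number of the next line to be appended

    def append(self, line: str) -> None:
        self.buf.append(line)  # deque silently evicts the oldest line
        self.total += 1

    def read(self, cursor: int, max_lines: int = 100) -> dict:
        oldest = self.total - len(self.buf)  # seq of the oldest retained line
        dropped = max(0, oldest - cursor)    # lines the reader missed
        start = max(cursor, oldest) - oldest
        lines = list(self.buf)[start:start + max_lines]
        return {
            "lines": lines,
            "cursor": max(cursor, oldest) + len(lines),  # pass back next call
            "dropped": dropped,
        }
```

Tracking an absolute sequence number (rather than a buffer index) is what lets `serial_read` report `dropped` honestly after the ring wraps.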
+ +## Environment variables + +| Var | Default | Purpose | +| -------------------------- | ----------------------------------------------------------- | ------------------------------------------------------------------- | +| `MESHTASTIC_FIRMWARE_ROOT` | walks up from cwd for `platformio.ini` | Pin the firmware repo | +| `MESHTASTIC_PIO_BIN` | `~/.platformio/penv/bin/pio` → `$PATH` `pio` → `platformio` | Override `pio` location | +| `MESHTASTIC_ESPTOOL_BIN` | `<firmware-root>/.venv/bin/esptool` → `$PATH` | Override esptool | +| `MESHTASTIC_NRFUTIL_BIN` | `$PATH` | Override nrfutil | +| `MESHTASTIC_PICOTOOL_BIN` | `$PATH` | Override picotool | +| `MESHTASTIC_MCP_SEED` | `mcp--` | PSK seed for test-harness session (CI override) | +| `MESHTASTIC_MCP_FLASH_LOG` | `mcp-server/tests/flash.log` | Tee target for pio/esptool/nrfutil subprocess output (TUI tails it) | + +## Hardware Test Suite + +`mcp-server/tests/` holds a pytest-based integration suite that exercises +real USB-connected Meshtastic devices against the MCP server surface. Separate +from the native C++ unit tests in the firmware repo's top-level `test/` +directory — this one validates the device-facing behavior end-to-end. + +### Invocation + +```bash +./mcp-server/run-tests.sh # full suite (auto-detect + auto-bake-if-needed) +./mcp-server/run-tests.sh --force-bake # reflash devices before testing +./mcp-server/run-tests.sh --assume-baked # skip the bake step (caller vouches for state) +./mcp-server/run-tests.sh tests/mesh # one tier +./mcp-server/run-tests.sh tests/mesh/test_traceroute.py # one file +./mcp-server/run-tests.sh -k telemetry # pytest name filter +``` + +The wrapper auto-detects connected devices (VID `0x239A` → `nrf52` → env +`rak4631`; `0x303A` or `0x10C4` → `esp32s3` → env `heltec-v3`), exports +`MESHTASTIC_MCP_ENV_<ROLE>` env vars, and invokes pytest. Overrides via +per-role env vars: `MESHTASTIC_MCP_ENV_NRF52=heltec-mesh-node-t114 ./run-tests.sh`. + +No hardware connected?
The wrapper narrows to `tests/unit/` only and says so +in the pre-flight header. + +### Tiers (run in this order) + +- **`bake`** (`tests/test_00_bake.py`) — flashes both hub roles with the + session's test profile. Has a skip-if-already-baked check (region + channel + match); `--force-bake` overrides. +- **`unit`** — pure Python, no hardware. boards / PIO wrapper / + userPrefs-parse / testing-profile fixtures. +- **`mesh`** — 2-device mesh: formation, broadcast delivery, direct+ACK, + traceroute, bidirectional. Parametrized over both directions. +- **`telemetry`** — periodic telemetry broadcast + on-demand request/reply + (`TELEMETRY_APP` with `wantResponse=True`). +- **`monitor`** — boot log has no panic markers within 60 s of reboot. +- **`fleet`** — PSK-seed isolation: two labs with different seeds never + overlap. +- **`admin`** — owner persistence across reboot, channel URL round-trip, + `lora.hop_limit` persistence. +- **`provisioning`** — region/channel baking, userPrefs survive + `factory_reset(full=False)`. + +### Artifacts (regenerated every run, under `tests/`) + +- `report.html` — self-contained pytest-html report. Each test gets a + **Meshtastic debug** section attached on failure with a 200-line firmware + log tail + device-state dump. Open this first on failures. +- `junit.xml` — CI-parseable. +- `reportlog.jsonl` — `pytest-reportlog` event stream; consumed by the TUI. +- `fwlog.jsonl` — firmware log mirror (`meshtastic.log.line` pubsub → JSONL). +- `flash.log` — tee of all pio / esptool / nrfutil / picotool subprocess + output during the run (driven by `MESHTASTIC_MCP_FLASH_LOG`). + +### Live TUI + +```bash +.venv/bin/meshtastic-mcp-test-tui +.venv/bin/meshtastic-mcp-test-tui tests/mesh # pytest args pass through +``` + +Textual-based wrapper over `run-tests.sh` with a live test tree, tier +counters, pytest output pane, firmware-log pane, and a device-status strip. 
+Key bindings: `r` re-run focused, `f` filter, `d` failure detail, `g` open +`report.html`, `x` export reproducer bundle, `l` cycle fw-log filter, `q` +quit (SIGINT → SIGTERM → SIGKILL escalation). + +### Slash commands + +Three AI-assisted workflows are wired up for Claude Code operators +(`.claude/commands/`) and Copilot operators (`.github/prompts/`): +`/test` (run + interpret), `/diagnose` (read-only health report), `/repro` +(flake triage, N-times re-run with log diff). + +### House rules (for human + agent contributors) + +- Session-scoped fixtures in `tests/conftest.py` snapshot + restore + `userPrefs.jsonc`; **never edit `userPrefs.jsonc` from inside a test**. + Use the `test_profile` / `no_region_profile` fixtures for ephemeral + overrides. +- `SerialInterface` holds an **exclusive port lock**; sequence calls + open → mutate → close, then next device. No parallel calls to the + same port. +- Directed PKI-encrypted sends need **bilateral NodeInfo warmup** — + both sides must hold the other's current pubkey. See + `tests/mesh/_receive.py::nudge_nodeinfo_port` and the three directed- + send tests (`test_direct_with_ack`, `test_traceroute`, + `test_telemetry_request_reply`) for the canonical pattern. + +## Layout + +```text +mcp-server/ +├── pyproject.toml +├── README.md +└── src/meshtastic_mcp/ + ├── __main__.py # entry: python -m meshtastic_mcp + ├── server.py # FastMCP app + @app.tool() registrations (thin) + ├── config.py # firmware_root, pio_bin, esptool_bin, etc. 
+ ├── pio.py # subprocess wrapper (timeouts, JSON, tail_lines) + ├── devices.py # list_devices (findPorts + comports) + ├── boards.py # list_boards / get_board (pio project config parse + cache) + ├── flash.py # build, clean, flash, erase_and_flash, update_flash, touch_1200bps + ├── serial_session.py # SerialSession + reader thread + ring buffer + ├── registry.py # session registry + per-port locks + ├── connection.py # connect(port) ctx mgr — SerialInterface + port lock + ├── info.py # device_info, list_nodes + ├── admin.py # set_owner, get/set_config, channels, send_text, reboot/shutdown/factory_reset + └── hw_tools.py # esptool / nrfutil / picotool wrappers +``` + +## Troubleshooting + +- **"Could not locate Meshtastic firmware root"** — set `MESHTASTIC_FIRMWARE_ROOT`. +- **"Could not find `pio`"** — install PlatformIO or set `MESHTASTIC_PIO_BIN`. +- **"Port is held by serial session ..."** — call `serial_close(session_id)` or `serial_list` to find it. +- **`factory.bin` not found after build** — the env may not be ESP32; only ESP32 envs produce a `.factory.bin`. +- **`touch_1200bps` reported `new_port: null`** — the device may not have 1200bps-reset stdio, or the bootloader re-uses the same port name. Check `list_devices` manually. diff --git a/mcp-server/pyproject.toml b/mcp-server/pyproject.toml new file mode 100644 index 00000000000..d73bf795f5f --- /dev/null +++ b/mcp-server/pyproject.toml @@ -0,0 +1,39 @@ +[project] +name = "meshtastic-mcp" +version = "0.1.0" +description = "MCP server for Meshtastic firmware development: device discovery, PlatformIO tooling, flashing, serial monitoring, and device administration via the meshtastic Python API." 
+readme = "README.md" +requires-python = ">=3.11" +license = { text = "GPL-3.0-only" } +authors = [{ name = "thebentern" }] +dependencies = ["mcp>=1.2", "pyserial>=3.5", "meshtastic>=2.7.8"] + +[project.optional-dependencies] +dev = ["pytest>=7"] +test = [ + "pytest>=8", + "pytest-html>=4", + "pytest-reportlog>=0.4", + "pytest-timeout>=2.3", + "coverage[toml]>=7", + "pyyaml>=6", + # textual is required by the `meshtastic-mcp-test-tui` script (see + # `src/meshtastic_mcp/cli/test_tui.py`). Bundled into `test` rather than a + # separate `[tui]` extra because v1 expects test operators are the only + # consumers; revisit if install cost pushes back. + "textual>=0.50", +] + +[project.scripts] +meshtastic-mcp = "meshtastic_mcp.__main__:main" +# Live TUI wrapping run-tests.sh — shells out to the same script the plain +# CLI uses, tails pytest-reportlog for per-test state, and polls the device +# list at startup + post-run (port lock forces it to stay idle during the run). +meshtastic-mcp-test-tui = "meshtastic_mcp.cli.test_tui:main" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[tool.hatch.build.targets.wheel] +packages = ["src/meshtastic_mcp"] diff --git a/mcp-server/run-tests.sh b/mcp-server/run-tests.sh new file mode 100755 index 00000000000..292e6e3a2f7 --- /dev/null +++ b/mcp-server/run-tests.sh @@ -0,0 +1,236 @@ +#!/usr/bin/env bash +# mcp-server hardware test runner. +# +# Auto-detects connected Meshtastic devices, maps each to its PlatformIO env +# via the same role table the pytest fixtures use, exports the right +# MESHTASTIC_MCP_ENV_* env vars, and invokes pytest. 
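The per-test state tailing that `meshtastic-mcp-test-tui` performs (per the `[project.scripts]` comment above) can be sketched as a fold of pytest-reportlog JSONL lines into a nodeid → outcome map. `fold_reportlog` is illustrative, not part of the package; the `$report_type` / `nodeid` / `outcome` keys follow the pytest-reportlog line schema.

```python
import json


def fold_reportlog(lines):
    """Fold pytest-reportlog JSONL lines into {nodeid: worst_outcome}."""
    rank = {"passed": 0, "skipped": 1, "failed": 2}
    state = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a half-written line while tailing a live file
        if rec.get("$report_type") != "TestReport":
            continue  # skip SessionStart/SessionFinish/CollectReport records
        nodeid, outcome = rec.get("nodeid"), rec.get("outcome")
        if not nodeid or outcome not in rank:
            continue
        # Keep the worst phase outcome: a failed setup/teardown beats a
        # passed call for the same nodeid.
        if nodeid not in state or rank[outcome] > rank[state[nodeid]]:
            state[nodeid] = outcome
    return state
```

A live tailer would feed this one line at a time as `readline()` yields them; batching here just keeps the sketch short.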
+# +# Usage: +# ./run-tests.sh # full suite, default pytest args +# ./run-tests.sh tests/mesh # subset (any pytest args pass through) +# ./run-tests.sh --force-bake # override one default with another +# MESHTASTIC_MCP_ENV_NRF52=foo ./run-tests.sh # override env per role +# MESHTASTIC_MCP_SEED=ci-run-42 ./run-tests.sh # override PSK seed +# +# If zero supported devices are detected, only the unit tier runs. +# +# Also restores `userPrefs.jsonc` from the session-backup sidecar if a prior +# run exited abnormally (belt to conftest.py's atexit suspenders). + +set -euo pipefail + +# cd to the script's directory so relative paths resolve consistently no +# matter where the user invoked from. +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +cd "$SCRIPT_DIR" + +VENV_PY="$SCRIPT_DIR/.venv/bin/python" +if [[ ! -x $VENV_PY ]]; then + echo "error: $VENV_PY not found or not executable." >&2 + echo " Bootstrap the venv first:" >&2 + echo " cd $SCRIPT_DIR && python3 -m venv .venv && .venv/bin/pip install -e '.[test]'" >&2 + exit 2 +fi + +# Resolve firmware root the same way conftest.py does (this script sits in +# mcp-server/, firmware repo root is one level up). +FIRMWARE_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" +USERPREFS_PATH="$FIRMWARE_ROOT/userPrefs.jsonc" +USERPREFS_SIDECAR="$USERPREFS_PATH.mcp-session-bak" + +# ---------- Pre-flight: recover stale userPrefs.jsonc from prior crash ---- +# If conftest.py's atexit hook didn't fire (SIGKILL, kernel panic, OS +# restart), the sidecar is the ground truth. Self-heal before running so we +# don't bake the previous run's dirty state into this run's firmware. +if [[ -f $USERPREFS_SIDECAR ]]; then + echo "[pre-flight] found $USERPREFS_SIDECAR from a prior abnormal exit;" >&2 + echo " restoring userPrefs.jsonc before starting." 
>&2 + cp "$USERPREFS_SIDECAR" "$USERPREFS_PATH" + rm -f "$USERPREFS_SIDECAR" +fi + +# If userPrefs.jsonc has uncommitted changes BEFORE the run starts, that's +# worth warning about — tests will snapshot this dirty state and restore to +# it at the end, which may not be what the operator wants. +if command -v git >/dev/null 2>&1; then + cd "$FIRMWARE_ROOT" + # Capture the git status into a local first — SC2312 flags command + # substitution inside `[[ -n ... ]]` because the exit code of `git + # status` is masked. A two-step assignment makes the failure path + # explicit (non-git, missing file) and keeps the bracket test clean. + _git_status_porcelain="$(git status --porcelain userPrefs.jsonc 2>/dev/null || true)" + if [[ -n $_git_status_porcelain ]]; then + echo "[pre-flight] warning: userPrefs.jsonc has uncommitted changes." >&2 + echo " Tests will snapshot THIS state and restore to it" >&2 + echo " at teardown. If that's not intended, run:" >&2 + echo " git checkout userPrefs.jsonc" >&2 + echo " and re-invoke." >&2 + fi + cd "$SCRIPT_DIR" +fi + +# ---------- Seed default -------------------------------------------------- +# Per-machine default so repeated runs from the same operator land on the +# same PSK (makes --assume-baked valid across invocations). Operator can +# override with an explicit env var if they want isolation (e.g. CI). +if [[ -z ${MESHTASTIC_MCP_SEED-} ]]; then + WHO="$(whoami 2>/dev/null || echo anon)" + HOST="$(hostname -s 2>/dev/null || echo host)" + export MESHTASTIC_MCP_SEED="mcp-${WHO}-${HOST}" +fi + +# ---------- Flash progress log -------------------------------------------- +# pio.py / hw_tools.py tee subprocess output (pio run -t upload, esptool, +# nrfutil, picotool) to this file line-by-line as it arrives when this env +# var is set. The TUI tails it so the operator sees live flash progress +# instead of 3 minutes of silence during `test_00_bake.py`. 
Plain CLI users +# also benefit — the log is a post-run diagnostic even without the TUI. +# Truncate at session start so each run gets a clean log. +export MESHTASTIC_MCP_FLASH_LOG="$SCRIPT_DIR/tests/flash.log" +: >"$MESHTASTIC_MCP_FLASH_LOG" + +# ---------- Detect connected hardware ------------------------------------- +# In-process call to the same Python API the test fixtures use, so the +# script never drifts from what pytest sees. Returns a JSON object +# {role: port, ...}. +ROLES_JSON="$( + "$VENV_PY" - <<'PY' +import json +import sys + +sys.path.insert(0, "src") +from meshtastic_mcp import devices + +# Role → canonical VID map. Kept in sync with +# `tests/conftest.py::hub_profile` defaults; if that changes, this must too. +ROLE_BY_VID = { + 0x239A: "nrf52", # Adafruit / RAK nRF52 native USB (app + DFU) + 0x303A: "esp32s3", # Espressif native USB (ESP32-S3) + 0x10C4: "esp32s3", # CP2102 USB-UART (common on Heltec/LilyGO ESP32 boards) +} + +out: dict[str, str] = {} +for dev in devices.list_devices(include_unknown=True): + vid_raw = dev.get("vid") or "" + try: + if isinstance(vid_raw, str) and vid_raw.startswith("0x"): + vid = int(vid_raw, 16) + else: + vid = int(vid_raw) + except (TypeError, ValueError): + continue + role = ROLE_BY_VID.get(vid) + # First port wins per role — matches hub_devices fixture semantics. + if role and role not in out: + out[role] = dev["port"] + +json.dump(out, sys.stdout) +PY +)" + +# ---------- Map role → pio env -------------------------------------------- +# Honor MESHTASTIC_MCP_ENV_ operator overrides; fall back to the +# same defaults hardcoded in tests/conftest.py::_DEFAULT_ROLE_ENVS. 
+resolve_env() {
+    local role="$1"
+    local default="$2"
+    local upper
+    upper="$(echo "$role" | tr '[:lower:]' '[:upper:]')"
+    local var="MESHTASTIC_MCP_ENV_${upper}"
+    # ${!var} indirect expansion reads the dynamically-named variable
+    # without reaching for eval.
+    local override="${!var:-}"
+    if [[ -n $override ]]; then
+        echo "$override"
+    else
+        echo "$default"
+    fi
+}
+
+NRF52_PORT="$(echo "$ROLES_JSON" | "$VENV_PY" -c 'import json,sys; print(json.loads(sys.stdin.read()).get("nrf52", ""))')"
+ESP32S3_PORT="$(echo "$ROLES_JSON" | "$VENV_PY" -c 'import json,sys; print(json.loads(sys.stdin.read()).get("esp32s3", ""))')"
+
+DETECTED=""
+if [[ -n $NRF52_PORT ]]; then
+    NRF52_ENV="$(resolve_env nrf52 rak4631)"
+    export MESHTASTIC_MCP_ENV_NRF52="$NRF52_ENV"
+    DETECTED="${DETECTED}  nrf52   @ ${NRF52_PORT} -> env=${NRF52_ENV}\n"
+fi
+if [[ -n $ESP32S3_PORT ]]; then
+    ESP32S3_ENV="$(resolve_env esp32s3 heltec-v3)"
+    export MESHTASTIC_MCP_ENV_ESP32S3="$ESP32S3_ENV"
+    DETECTED="${DETECTED}  esp32s3 @ ${ESP32S3_PORT} -> env=${ESP32S3_ENV}\n"
+fi
+
+# ---------- Pre-flight summary --------------------------------------------
+# Surface what pytest is about to do with respect to the bake phase: the
+# operator should see "will verify + bake if needed" by default, so a
+# 3-minute flash appearing mid-run isn't a surprise. Detection of the
+# explicit overrides is best-effort — we just scan $@ for the known flags.
+_bake_mode="auto (verify + bake if needed)" +for _arg in "$@"; do + case "$_arg" in + --assume-baked) _bake_mode="skip (--assume-baked)" ;; + --force-bake) _bake_mode="force (--force-bake)" ;; + *) ;; # any other arg: pass-through; bake mode unchanged + esac +done + +echo "mcp-server test runner" +echo " firmware root : $FIRMWARE_ROOT" +echo " seed : $MESHTASTIC_MCP_SEED" +echo " bake : $_bake_mode" +if [[ -n $DETECTED ]]; then + echo " detected hub :" + printf "%b" "$DETECTED" +else + echo " detected hub : (none)" +fi +echo + +# ---------- Invoke pytest ------------------------------------------------- +# If no devices detected, only the unit tier would produce meaningful +# PASS/FAIL — every hardware test would SKIP with "role not present". We +# narrow to tests/unit explicitly so the summary reads as "no hardware, +# unit suite only" instead of "big skip count looks suspicious". +if [[ -z $DETECTED && $# -eq 0 ]]; then + echo "[pre-flight] no supported devices detected; running unit tier only." + echo + exec "$VENV_PY" -m pytest tests/unit -v --report-log=tests/reportlog.jsonl +fi + +# Default pytest args when the user passed none. Power users can invoke +# `./run-tests.sh tests/mesh -v --tb=long` and skip all of these defaults. +# +# NOTE: `--assume-baked` is DELIBERATELY omitted here. `tests/test_00_bake.py` +# has an internal skip-if-already-baked check (`_bake_role`: query device_info, +# compare region + primary_channel to the session profile, skip on match). +# So the fast path is ~8-10 s of verification overhead when the devices are +# already baked — negligible next to the 2-6 min suite runtime. Letting +# test_00_bake.py run means a fresh device, a re-seeded session, or a post- +# factory-reset device gets flashed automatically instead of silently +# skipping half the hardware tests with "not baked with session profile" +# errors. Power users who know their hardware is current and want to shave +# those seconds can pass `--assume-baked` explicitly. 
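The verify-then-skip behavior the NOTE above describes can be sketched as below. The field names (`lora.region`, `primary_channel.name`) and the profile shape are assumptions for illustration — the real check lives in `tests/test_00_bake.py::_bake_role`, which is not part of this diff.

```python
def needs_bake(device_info: dict, profile: dict) -> bool:
    """True when the device's live state diverges from the session profile.

    Hypothetical shapes: `device_info` as returned by a device_info-style
    query, `profile` as the session profile derived from the PSK seed.
    """
    live_region = device_info.get("lora", {}).get("region")
    live_channel = device_info.get("primary_channel", {}).get("name")
    return (
        live_region != profile["region"]
        or live_channel != profile["primary_channel"]
    )


# auto (default): run the check, flash only on mismatch (~8-10 s overhead)
# --assume-baked: skip the check entirely
# --force-bake:   flash regardless of what the check says
```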
+if [[ $# -eq 0 ]]; then + set -- tests/ \ + --html=tests/report.html --self-contained-html \ + --junitxml=tests/junit.xml \ + -v --tb=short +fi + +# Always emit `tests/reportlog.jsonl` (unless the operator explicitly passed +# their own `--report-log=...`). Consumers — notably the +# `meshtastic-mcp-test-tui` TUI — tail the reportlog for live per-test state. +# Appending here means power-user invocations like `./run-tests.sh tests/mesh` +# also produce it, not just the all-defaults invocation. +_has_report_log=0 +for _arg in "$@"; do + case "$_arg" in + --report-log | --report-log=*) _has_report_log=1 ;; + *) ;; # any other arg: no-op; loop continues + esac +done +if [[ $_has_report_log -eq 0 ]]; then + set -- "$@" --report-log=tests/reportlog.jsonl +fi + +exec "$VENV_PY" -m pytest "$@" diff --git a/mcp-server/src/meshtastic_mcp/__init__.py b/mcp-server/src/meshtastic_mcp/__init__.py new file mode 100644 index 00000000000..bd696afe01d --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/__init__.py @@ -0,0 +1,3 @@ +"""Meshtastic MCP server — device discovery, PlatformIO tooling, and device admin.""" + +__version__ = "0.1.0" diff --git a/mcp-server/src/meshtastic_mcp/__main__.py b/mcp-server/src/meshtastic_mcp/__main__.py new file mode 100644 index 00000000000..4ed67db3821 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/__main__.py @@ -0,0 +1,11 @@ +"""Entry point for `python -m meshtastic_mcp`.""" + +from meshtastic_mcp.server import app + + +def main() -> None: + app.run() + + +if __name__ == "__main__": + main() diff --git a/mcp-server/src/meshtastic_mcp/admin.py b/mcp-server/src/meshtastic_mcp/admin.py new file mode 100644 index 00000000000..6da92d860a4 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/admin.py @@ -0,0 +1,377 @@ +"""Device administration: owner, config, channels, messaging, admin actions. + +All operations use the same `connect()` context manager so port selection, +port-busy detection, and cleanup are handled uniformly. 
+ +Config writes use a dot-path: the first segment names a section (e.g. +`"lora"` in LocalConfig or `"mqtt"` in LocalModuleConfig), remaining segments +walk protobuf fields. Enum fields accept their string names (`"US"` for +`lora.region`) so callers don't need to know the numeric values. +""" + +from __future__ import annotations + +from typing import Any + +from google.protobuf import descriptor as pb_descriptor +from google.protobuf import json_format +from meshtastic.protobuf import localonly_pb2 + +from .connection import connect + + +class AdminError(RuntimeError): + pass + + +LOCAL_CONFIG_SECTIONS = {f.name for f in localonly_pb2.LocalConfig.DESCRIPTOR.fields} +MODULE_CONFIG_SECTIONS = { + f.name for f in localonly_pb2.LocalModuleConfig.DESCRIPTOR.fields +} + + +def _require_confirm(confirm: bool, operation: str) -> None: + if not confirm: + raise AdminError(f"{operation} is destructive and requires confirm=True.") + + +def _message_to_dict(msg: Any) -> dict[str, Any]: + # `including_default_value_fields` was renamed to + # `always_print_fields_with_no_presence` in protobuf 5.26+. Pick whichever + # kwarg the installed version accepts so we work against both. 
+ kwargs: dict[str, Any] = {"preserving_proto_field_name": True} + import inspect + + sig = inspect.signature(json_format.MessageToDict) + if "always_print_fields_with_no_presence" in sig.parameters: + kwargs["always_print_fields_with_no_presence"] = False + elif "including_default_value_fields" in sig.parameters: + kwargs["including_default_value_fields"] = False + return json_format.MessageToDict(msg, **kwargs) + + +# ---------- owner ---------------------------------------------------------- + + +def set_owner( + long_name: str, + short_name: str | None = None, + port: str | None = None, +) -> dict[str, Any]: + if short_name is not None and len(short_name) > 4: + raise AdminError("short_name must be 4 characters or fewer") + with connect(port=port) as iface: + iface.localNode.setOwner(long_name=long_name, short_name=short_name) + return { + "ok": True, + "long_name": long_name, + "short_name": short_name, + } + + +# ---------- config reads --------------------------------------------------- + + +def _section_container(node, section: str) -> tuple[Any, str]: + """Return (container_message, parent_name) for a section name. + + Parent is 'localConfig' or 'moduleConfig' so callers know where to call + writeConfig() after mutating. + """ + if section in LOCAL_CONFIG_SECTIONS: + return getattr(node.localConfig, section), "localConfig" + if section in MODULE_CONFIG_SECTIONS: + return getattr(node.moduleConfig, section), "moduleConfig" + raise AdminError( + f"Unknown config section: {section!r}. " + f"Valid sections: {sorted(LOCAL_CONFIG_SECTIONS | MODULE_CONFIG_SECTIONS)}" + ) + + +def get_config(section: str | None = None, port: str | None = None) -> dict[str, Any]: + """Read one or all config sections. + + `section` may be any name in LocalConfig (device, lora, position, power, + network, display, bluetooth, security) or LocalModuleConfig (mqtt, serial, + telemetry, ...). Omit `section` or pass `"all"` for everything. 
+ """ + with connect(port=port) as iface: + node = iface.localNode + if section in (None, "all"): + lc = _message_to_dict(node.localConfig) + mc = _message_to_dict(node.moduleConfig) + return { + "config": { + "localConfig": lc, + "moduleConfig": mc, + } + } + container, _parent = _section_container(node, section) + return {"config": {section: _message_to_dict(container)}} + + +# ---------- config writes -------------------------------------------------- + + +def _coerce_enum(field: pb_descriptor.FieldDescriptor, value: Any) -> int: + """Accept an enum value as either its int or its string name.""" + enum_type = field.enum_type + if isinstance(value, bool): + raise AdminError(f"{field.name}: expected enum {enum_type.name}, got bool") + if isinstance(value, int): + if enum_type.values_by_number.get(value) is None: + raise AdminError( + f"{field.name}: {value} is not a valid {enum_type.name} value" + ) + return value + if isinstance(value, str): + upper = value.upper() + ev = enum_type.values_by_name.get(upper) + if ev is None: + valid = sorted(enum_type.values_by_name.keys()) + raise AdminError( + f"{field.name}: {value!r} is not a valid {enum_type.name}. 
" + f"Valid: {valid}" + ) + return ev.number + raise AdminError( + f"{field.name}: expected enum {enum_type.name}, got {type(value).__name__}" + ) + + +def _coerce_scalar(field: pb_descriptor.FieldDescriptor, value: Any) -> Any: + t = field.type + FT = pb_descriptor.FieldDescriptor + if t == FT.TYPE_ENUM: + return _coerce_enum(field, value) + if t == FT.TYPE_BOOL: + if isinstance(value, bool): + return value + if isinstance(value, str): + return value.strip().lower() in ("true", "yes", "1", "on") + if isinstance(value, int): + return bool(value) + if t in ( + FT.TYPE_INT32, + FT.TYPE_INT64, + FT.TYPE_UINT32, + FT.TYPE_UINT64, + FT.TYPE_SINT32, + FT.TYPE_SINT64, + FT.TYPE_FIXED32, + FT.TYPE_FIXED64, + ): + return int(value) + if t in (FT.TYPE_FLOAT, FT.TYPE_DOUBLE): + return float(value) + if t == FT.TYPE_STRING: + return str(value) + if t == FT.TYPE_BYTES: + if isinstance(value, (bytes, bytearray)): + return bytes(value) + return str(value).encode("utf-8") + raise AdminError( + f"{field.name}: unsupported field type {t}. Use raw protobuf for this field." + ) + + +def _walk_to_field( + root_msg: Any, path_segments: list[str] +) -> tuple[Any, pb_descriptor.FieldDescriptor]: + """Walk `root_msg` by field names until the leaf; return (parent_msg, leaf_field_descriptor).""" + msg = root_msg + for i, name in enumerate(path_segments): + desc = msg.DESCRIPTOR + field = desc.fields_by_name.get(name) + if field is None: + trail = ".".join(path_segments[:i] or [""]) + valid = [f.name for f in desc.fields] + raise AdminError(f"No field {name!r} in {trail}. 
Valid: {valid}") + is_last = i == len(path_segments) - 1 + if is_last: + return msg, field + if field.type != pb_descriptor.FieldDescriptor.TYPE_MESSAGE: + raise AdminError( + f"{'.'.join(path_segments[:i+1])} is a scalar; cannot descend into it" + ) + msg = getattr(msg, name) + # path_segments was empty + raise AdminError("Empty config path") + + +def set_config(path: str, value: Any, port: str | None = None) -> dict[str, Any]: + """Set a single config field by dot-path and write it to the device. + + Examples: + set_config("lora.region", "US") + set_config("lora.modem_preset", "LONG_FAST") + set_config("device.role", "ROUTER") + set_config("mqtt.enabled", True) + set_config("mqtt.address", "mqtt.example.com") + + """ + segments = [s for s in path.split(".") if s] + if not segments: + raise AdminError("path cannot be empty") + section = segments[0] + + with connect(port=port) as iface: + node = iface.localNode + container, parent_name = _section_container(node, section) + + # Treat the section as the root; the rest of the path walks into it. + leaf_parent, field = _walk_to_field(container, segments[1:] or []) + # Use `is_repeated` (modern upb protobuf API) rather than the + # deprecated `label == LABEL_REPEATED` check — the C-extension + # FieldDescriptor in protobuf >= 5.x doesn't expose `.label` at + # all, and `is_repeated` is the supported replacement that works + # across both the pure-python and upb backends. + if field.is_repeated: + raise AdminError( + f"{path!r} is a repeated field; v1 only supports scalar sets. " + "Use the raw meshtastic CLI for now." + ) + old_raw = getattr(leaf_parent, field.name) + coerced = _coerce_scalar(field, value) + try: + setattr(leaf_parent, field.name, coerced) + except (TypeError, ValueError) as exc: + raise AdminError(f"{path}: {exc}") from exc + + node.writeConfig(section) + + # Stringify enums for the response (so the caller can see the change in + # the same vocabulary they used to set it). 
+ if field.type == pb_descriptor.FieldDescriptor.TYPE_ENUM: + try: + old_display = field.enum_type.values_by_number[old_raw].name + new_display = field.enum_type.values_by_number[coerced].name + except Exception: + old_display, new_display = old_raw, coerced + else: + old_display, new_display = old_raw, coerced + + return { + "ok": True, + "path": path, + "section": section, + "parent": parent_name, + "old_value": old_display, + "new_value": new_display, + } + + +# ---------- channels ------------------------------------------------------- + + +def get_channel_url( + include_all: bool = False, port: str | None = None +) -> dict[str, Any]: + with connect(port=port) as iface: + url = iface.localNode.getURL(includeAll=include_all) + return {"url": url} + + +def set_channel_url(url: str, port: str | None = None) -> dict[str, Any]: + with connect(port=port) as iface: + # setURL replaces the channel set from the URL's contents. It does not + # return a count; we infer by counting non-DISABLED channels after. 
+ iface.localNode.setURL(url) + channels = iface.localNode.channels or [] + active = sum(1 for c in channels if getattr(c, "role", 0) != 0) + return {"ok": True, "channels_imported": active} + + +# ---------- messaging ------------------------------------------------------ + + +def send_text( + text: str, + to: str | int | None = None, + channel_index: int = 0, + want_ack: bool = False, + port: str | None = None, +) -> dict[str, Any]: + destination = to if to is not None else "^all" + with connect(port=port) as iface: + packet = iface.sendText( + text, + destinationId=destination, + wantAck=want_ack, + channelIndex=channel_index, + ) + packet_id = getattr(packet, "id", None) + return {"ok": True, "packet_id": packet_id, "destination": destination} + + +# ---------- diagnostics ---------------------------------------------------- + + +def set_debug_log_api(enabled: bool, port: str | None = None) -> dict[str, Any]: + """Toggle `config.security.debug_log_api_enabled` on the local node. + + When enabled, firmware emits log lines as protobuf `LogRecord` messages + over the StreamAPI instead of raw text. meshtastic-python surfaces them + on pubsub topic `meshtastic.log.line`, which flows through the SAME + SerialInterface our tests already hold open — no `pio device monitor` + needed, no port-contention with admin/info calls. + + Firmware gate: `src/SerialConsole.cpp` (`usingProtobufs && + config.security.debug_log_api_enabled`). Setting persists in NVS; it + survives reboot. `factory_reset(full=False)` clears it unless it's + re-applied after reset. + + Previously-documented concurrency hazard (emitLogRecord sharing the + main packet-emission buffers) has been fixed — see `StreamAPI.h` + where the log path now owns dedicated `fromRadioScratchLog` / + `txBufLog` buffers, and `StreamAPI::emitTxBuffer` + + `StreamAPI::emitLogRecord` both serialize their `stream->write` + calls via `streamLock`. Leaving the flag on under traffic is safe. 
+ """ + with connect(port=port) as iface: + sec = iface.localNode.localConfig.security + sec.debug_log_api_enabled = bool(enabled) + iface.localNode.writeConfig("security") + return {"ok": True, "debug_log_api_enabled": bool(enabled)} + + +# ---------- admin actions -------------------------------------------------- + + +def reboot( + port: str | None = None, confirm: bool = False, seconds: int = 10 +) -> dict[str, Any]: + _require_confirm(confirm, "reboot") + with connect(port=port) as iface: + iface.localNode.reboot(secs=seconds) + return {"ok": True, "rebooting_in_s": seconds} + + +def shutdown( + port: str | None = None, confirm: bool = False, seconds: int = 10 +) -> dict[str, Any]: + _require_confirm(confirm, "shutdown") + with connect(port=port) as iface: + iface.localNode.shutdown(secs=seconds) + return {"ok": True, "shutting_down_in_s": seconds} + + +def factory_reset( + port: str | None = None, confirm: bool = False, full: bool = False +) -> dict[str, Any]: + """Tell the node to factory-reset its config. + + Works around a meshtastic-python 2.7.8 bug: `Node.factoryReset(full=True)` + internally does `p.factory_reset_config = True` where the field is + int32. protobuf 5.x rejects bool→int assignment as a TypeError. We build + the AdminMessage directly with int values (1=non-full, 2=full) and call + `_sendAdmin` to sidestep the SDK bug entirely. + """ + _require_confirm(confirm, "factory_reset") + from meshtastic.protobuf import admin_pb2 # type: ignore[import-untyped] + + with connect(port=port) as iface: + msg = admin_pb2.AdminMessage() + msg.factory_reset_config = 2 if full else 1 + iface.localNode._sendAdmin(msg) + return {"ok": True, "full": full} diff --git a/mcp-server/src/meshtastic_mcp/boards.py b/mcp-server/src/meshtastic_mcp/boards.py new file mode 100644 index 00000000000..df5024800a6 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/boards.py @@ -0,0 +1,159 @@ +"""Board / PlatformIO env enumeration. 
+ +Parses `pio project config --json-output` — a nested list of +`[section_name, [[key, value], ...]]` pairs — into a dict keyed by env name, +extracting the `custom_meshtastic_*` metadata the firmware variants expose. + +The parsed config is cached and invalidated when `platformio.ini`'s mtime +changes, so subsequent calls don't pay the 1–2s pio startup cost. +""" + +from __future__ import annotations + +import threading +from typing import Any + +from . import config, pio + +_CACHE_LOCK = threading.Lock() +_CACHE: dict[str, Any] = {"mtime": None, "envs": None} + + +def _parse_bool(value: Any) -> bool: + if isinstance(value, bool): + return value + if isinstance(value, str): + return value.strip().lower() in ("true", "yes", "1", "on") + return bool(value) + + +def _parse_int(value: Any) -> int | None: + try: + return int(value) + except (TypeError, ValueError): + return None + + +def _parse_tags(value: Any) -> list[str]: + if value is None: + return [] + if isinstance(value, list): + return [str(v).strip() for v in value if str(v).strip()] + return [t.strip() for t in str(value).replace(",", " ").split() if t.strip()] + + +def _env_record(env_name: str, items: list[list[Any]]) -> dict[str, Any]: + """Build a normalized dict for one env section.""" + d = dict(items) + return { + "env": env_name, + "architecture": d.get("custom_meshtastic_architecture"), + "hw_model": _parse_int(d.get("custom_meshtastic_hw_model")), + "hw_model_slug": d.get("custom_meshtastic_hw_model_slug"), + "display_name": d.get("custom_meshtastic_display_name"), + "actively_supported": _parse_bool( + d.get("custom_meshtastic_actively_supported") + ), + "support_level": _parse_int(d.get("custom_meshtastic_support_level")), + "board_level": d.get("board_level"), # "pr", "extra", or None + "tags": _parse_tags(d.get("custom_meshtastic_tags")), + "images": _parse_tags(d.get("custom_meshtastic_images")), + "board": d.get("board"), + "upload_speed": _parse_int(d.get("upload_speed")), + 
"upload_protocol": d.get("upload_protocol"), + "monitor_speed": _parse_int(d.get("monitor_speed")), + "monitor_filters": d.get("monitor_filters") or [], + "_raw": d, # Full dict for get_board + } + + +def _load_all() -> dict[str, dict[str, Any]]: + """Parse `pio project config` into `{env_name: record}`.""" + raw = pio.run_json(["project", "config"], timeout=pio.TIMEOUT_PROJECT_CONFIG) + result: dict[str, dict[str, Any]] = {} + for section_name, items in raw: + if not isinstance(section_name, str) or not section_name.startswith("env:"): + continue + env_name = section_name.split(":", 1)[1] + result[env_name] = _env_record(env_name, items) + return result + + +def _get_cached() -> dict[str, dict[str, Any]]: + root = config.firmware_root() + platformio_ini = root / "platformio.ini" + try: + mtime = platformio_ini.stat().st_mtime + except FileNotFoundError: + mtime = None + + with _CACHE_LOCK: + if _CACHE["envs"] is not None and _CACHE["mtime"] == mtime: + return _CACHE["envs"] + envs = _load_all() + _CACHE["envs"] = envs + _CACHE["mtime"] = mtime + return envs + + +def invalidate_cache() -> None: + with _CACHE_LOCK: + _CACHE["envs"] = None + _CACHE["mtime"] = None + + +def _public_record(rec: dict[str, Any]) -> dict[str, Any]: + """Strip the `_raw` field for list outputs.""" + return {k: v for k, v in rec.items() if not k.startswith("_")} + + +def list_boards( + architecture: str | None = None, + actively_supported_only: bool = False, + query: str | None = None, + board_level: str | None = None, # "release" | "pr" | "extra" +) -> list[dict[str, Any]]: + """Enumerate PlatformIO envs with Meshtastic metadata. + + Filters are cumulative (AND). `board_level="release"` means envs with no + explicit `board_level` set (the default release targets). 
+ """ + envs = _get_cached() + q = query.lower().strip() if query else None + + out = [] + for rec in envs.values(): + if architecture and rec.get("architecture") != architecture: + continue + if actively_supported_only and not rec.get("actively_supported"): + continue + if board_level is not None: + rec_level = rec.get("board_level") + if board_level == "release": + if rec_level not in (None, ""): + continue + elif rec_level != board_level: + continue + if q: + display = (rec.get("display_name") or "").lower() + env_name = rec.get("env", "").lower() + slug = (rec.get("hw_model_slug") or "").lower() + if q not in display and q not in env_name and q not in slug: + continue + out.append(_public_record(rec)) + + out.sort(key=lambda r: (r.get("architecture") or "", r.get("env"))) + return out + + +def get_board(env: str) -> dict[str, Any]: + """Full metadata for one env, including the raw pio config dict.""" + envs = _get_cached() + rec = envs.get(env) + if rec is None: + raise KeyError( + f"Unknown env: {env!r}. Use list_boards() to see available envs." + ) + public = _public_record(rec) + public["raw_config"] = rec["_raw"] + return public diff --git a/mcp-server/src/meshtastic_mcp/cli/__init__.py b/mcp-server/src/meshtastic_mcp/cli/__init__.py new file mode 100644 index 00000000000..04729b643e1 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/__init__.py @@ -0,0 +1,6 @@ +"""Command-line entry points that sit alongside the MCP server. + +Modules here are loaded on-demand by `[project.scripts]` entries in +`pyproject.toml`. They are NOT imported by `meshtastic_mcp.server` or the +admin/info tool surface — the MCP server stays pure stdio JSON-RPC. +""" diff --git a/mcp-server/src/meshtastic_mcp/cli/_flashlog.py b/mcp-server/src/meshtastic_mcp/cli/_flashlog.py new file mode 100644 index 00000000000..889183bb30e --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_flashlog.py @@ -0,0 +1,73 @@ +"""Flash progress log tailer for ``meshtastic-mcp-test-tui``. 
+ +``pio.py`` / ``hw_tools.py`` tee subprocess output (``pio run -t upload``, +``esptool erase_flash``, ``nrfutil dfu``, etc.) to ``tests/flash.log`` +line-by-line as it arrives — controlled by the ``MESHTASTIC_MCP_FLASH_LOG`` +env var that ``run-tests.sh`` sets. The TUI tails that file so the operator +sees live flash progress in the pytest pane instead of 3 minutes of silence +during ``test_00_bake``. + +Separate from ``_fwlog.py`` because that one parses JSONL, this one +streams plain text lines. Same daemon-thread + EOF-backoff structure. +""" + +from __future__ import annotations + +import pathlib +import threading +import time +from typing import Callable + + +class FlashLogTailer(threading.Thread): + """Tail a plain-text log file, publish each stripped line via ``post``. + + ``post`` is invoked with a single ``str`` for every new line. Lines are + stripped of trailing newlines; empty lines after stripping are dropped. + + The file may not exist yet when this thread starts — it's truncated by + ``run-tests.sh`` at session start, but if the tailer races the shell, + we tolerate FileNotFoundError for up to ``wait_s`` seconds. + """ + + def __init__( + self, + path: pathlib.Path, + post: Callable[[str], None], + stop: threading.Event, + *, + wait_s: float = 30.0, + ) -> None: + super().__init__(daemon=True, name="flashlog-tail") + self._path = path + self._post = post + self._stop = stop + self._wait_s = wait_s + + def run(self) -> None: + deadline = time.monotonic() + self._wait_s + while not self._path.is_file(): + if self._stop.is_set() or time.monotonic() > deadline: + return + time.sleep(0.1) + try: + fh = self._path.open("r", encoding="utf-8", errors="replace") + except OSError: + return + try: + while not self._stop.is_set(): + line = fh.readline() + if not line: + time.sleep(0.05) + continue + line = line.rstrip("\r\n") + if not line: + continue + try: + self._post(line) + except Exception: + # A post failure (e.g. 
closed app) is terminal for this + # thread but we still want to close the file handle. + return + finally: + fh.close() diff --git a/mcp-server/src/meshtastic_mcp/cli/_fwlog.py b/mcp-server/src/meshtastic_mcp/cli/_fwlog.py new file mode 100644 index 00000000000..7db20f81cc8 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_fwlog.py @@ -0,0 +1,96 @@ +"""Firmware log tail worker for ``meshtastic-mcp-test-tui``. + +Complements v1's reportlog-tail worker. ``tests/conftest.py`` owns a +session-scoped autouse fixture (``_firmware_log_stream``) that mirrors +every ``meshtastic.log.line`` pubsub event to ``tests/fwlog.jsonl`` — +one JSON object per line: + + {"ts": 1729100000.123, "port": "/dev/cu.usbmodem1101", "line": "..."} + +The TUI tails that file from a worker thread; each new line becomes a +:class:`FirmwareLogLine` message posted to the App. Same pattern as the +reportlog tail worker — truncate on launch, tolerate missing file for +30 s, back off at EOF. + +Kept in its own module so the (large) ``test_tui.py`` stays focused on +the Textual App shell. +""" + +from __future__ import annotations + +import json +import pathlib +import threading +import time +from typing import Any, Callable + + +class FirmwareLogTailer(threading.Thread): + """Tail ``tests/fwlog.jsonl``, publish parsed records via ``post``. + + ``post`` is the App's ``post_message`` (or any callable that accepts a + single payload arg). We pass parsed dicts rather than constructing + Textual Message objects here — keeps this module free of the + textual dependency so it's unit-testable in a bare venv. + + Parameters + ---------- + path: + Path to ``tests/fwlog.jsonl``. The file may not exist yet at + startup — pytest only creates it once the session fixture runs. + post: + Callable invoked with a dict ``{"ts", "port", "line"}`` for every + new line parsed from the file. + stop: + An event the App sets to signal shutdown. + wait_s: + How long to poll for the file's creation before giving up. 
Default + 30 s; pytest collection on a cold cache can be slow. + + """ + + def __init__( + self, + path: pathlib.Path, + post: Callable[[dict[str, Any]], None], + stop: threading.Event, + *, + wait_s: float = 30.0, + ) -> None: + super().__init__(daemon=True, name="fwlog-tail") + self._path = path + self._post = post + self._stop = stop + self._wait_s = wait_s + + def run(self) -> None: + deadline = time.monotonic() + self._wait_s + while not self._path.is_file(): + if self._stop.is_set() or time.monotonic() > deadline: + return + time.sleep(0.1) + try: + fh = self._path.open("r", encoding="utf-8") + except OSError: + return + try: + while not self._stop.is_set(): + line = fh.readline() + if not line: + time.sleep(0.05) + continue + line = line.strip() + if not line: + continue + try: + record = json.loads(line) + except json.JSONDecodeError: + continue + # Defensive: require the three fields we rely on. + if not isinstance(record, dict): + continue + if "line" not in record: + continue + self._post(record) + finally: + fh.close() diff --git a/mcp-server/src/meshtastic_mcp/cli/_history.py b/mcp-server/src/meshtastic_mcp/cli/_history.py new file mode 100644 index 00000000000..639dcec5f55 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_history.py @@ -0,0 +1,127 @@ +"""Cross-run history for ``meshtastic-mcp-test-tui``. + +Persists one JSON object per pytest run to +``mcp-server/tests/.history/runs.jsonl``. The TUI reads the last N +entries on launch to render a duration sparkline in the header — a +quick read on whether the suite is slowing down over time. + +Schema (keep small; the file can grow for months): + + {"run": 42, "ts": 1729100000.0, "duration_s": 387.2, + "passed": 52, "failed": 0, "skipped": 23, "exit_code": 0, + "seed": "mcp-user-host"} +""" + +from __future__ import annotations + +import json +import pathlib +import time +from dataclasses import asdict, dataclass +from typing import Iterable + +# Sparkline glyphs, low → high. 
8 levels is the Unicode convention.
+_SPARK_BLOCKS = "▁▂▃▄▅▆▇█"
+
+
+@dataclass
+class RunRecord:
+    run: int
+    ts: float
+    duration_s: float
+    passed: int
+    failed: int
+    skipped: int
+    exit_code: int
+    seed: str
+
+
+class HistoryStore:
+    """Append-only JSONL store with bounded read.
+
+    Writes are flushed after each append (the file is tiny; the cost is
+    negligible), so a crash mid-session loses at most the in-flight record.
+    """
+
+    def __init__(self, path: pathlib.Path, *, keep_last: int = 50) -> None:
+        self._path = path
+        self._keep_last = keep_last
+
+    def append(self, record: RunRecord) -> None:
+        try:
+            self._path.parent.mkdir(parents=True, exist_ok=True)
+            with self._path.open("a", encoding="utf-8") as fh:
+                fh.write(json.dumps(asdict(record)) + "\n")
+                fh.flush()
+        except Exception:
+            # Non-fatal: history is cosmetic.
+            pass
+
+    def read_recent(self) -> list[RunRecord]:
+        """Return the last ``keep_last`` records in chronological order."""
+        if not self._path.is_file():
+            return []
+        try:
+            lines = self._path.read_text(encoding="utf-8").splitlines()
+        except OSError:
+            return []
+        out: list[RunRecord] = []
+        # Only parse the last ``keep_last`` lines so a months-old history
+        # file stays cheap to read.
+        for line in lines[-self._keep_last :]:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                raw = json.loads(line)
+            except json.JSONDecodeError:
+                continue
+            try:
+                out.append(RunRecord(**raw))
+            except TypeError:
+                # Schema drift; skip the record rather than crash.
+ continue + return out + + def record_run( + self, + *, + run: int, + duration_s: float, + passed: int, + failed: int, + skipped: int, + exit_code: int, + seed: str, + ) -> RunRecord: + rec = RunRecord( + run=run, + ts=time.time(), + duration_s=float(duration_s), + passed=int(passed), + failed=int(failed), + skipped=int(skipped), + exit_code=int(exit_code), + seed=seed, + ) + self.append(rec) + return rec + + +def sparkline(values: Iterable[float], *, width: int = 20) -> str: + """Render a Unicode block-character sparkline from the last ``width`` values. + + Returns an empty string for empty input so the header handles + "no history yet" gracefully. + """ + buf = [v for v in values if v >= 0][-width:] + if not buf: + return "" + lo, hi = min(buf), max(buf) + if hi - lo < 1e-9: + return _SPARK_BLOCKS[len(_SPARK_BLOCKS) // 2] * len(buf) + n = len(_SPARK_BLOCKS) - 1 + out = [] + for v in buf: + idx = int(round((v - lo) / (hi - lo) * n)) + out.append(_SPARK_BLOCKS[max(0, min(n, idx))]) + return "".join(out) diff --git a/mcp-server/src/meshtastic_mcp/cli/_reproducer.py b/mcp-server/src/meshtastic_mcp/cli/_reproducer.py new file mode 100644 index 00000000000..420da3c76a7 --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/_reproducer.py @@ -0,0 +1,214 @@ +"""Reproducer bundle builder for ``meshtastic-mcp-test-tui``. + +When the operator presses ``x`` on a failed test leaf, we package the +minimum viable failure context into a tarball under +``mcp-server/tests/reproducers/``: + +:: + + repro--.tar.gz + ├── README.md human-readable overview + ├── test_report.json the failing TestReport event from reportlog + ├── fwlog.jsonl firmware log filtered to the failure window + ├── devices.json per-device device_info + lora config snapshot + └── env.json seed, run #, pytest version, platform, hostname + +Separate module so the logic can be unit-tested without Textual. 
The +TUI glue is thin — one key binding calls :func:`build_reproducer_bundle` +with the focused test's state and shows the path in a modal. +""" + +from __future__ import annotations + +import io +import json +import pathlib +import platform +import re +import socket +import tarfile +import time +from dataclasses import dataclass +from typing import Any, Iterable + + +@dataclass +class ReproContext: + """Everything :func:`build_reproducer_bundle` needs. Shaped to map + cleanly onto the state the TUI already tracks — no extra data + collection required at export time.""" + + nodeid: str + longrepr: str + sections: list[tuple[str, str]] + start_ts: float | None + stop_ts: float | None + seed: str + run_number: int + exit_code: int | None + fwlog_path: pathlib.Path + output_dir: pathlib.Path + extra_device_rows: list[dict[str, Any]] # [{role, port, info, ...}, ...] + + +def _short_nodeid(nodeid: str) -> str: + """Collapse a pytest nodeid into a filename-safe slug (<= 60 chars).""" + # Drop the file path prefix; keep test name + parametrization. + tail = nodeid.split("::", 1)[-1] if "::" in nodeid else nodeid + slug = re.sub(r"[^A-Za-z0-9_.\-]", "_", tail) + return slug[:60].strip("_.-") or "test" + + +def _filtered_fwlog( + fwlog_path: pathlib.Path, + start_ts: float | None, + stop_ts: float | None, + *, + pad_s: float = 5.0, +) -> bytes: + """Return fwlog.jsonl lines whose ``ts`` lies in [start-pad, stop+pad].""" + if not fwlog_path.is_file(): + return b"" + if start_ts is None or stop_ts is None: + # Without a time window, include the whole file — rare; happens + # when a test fails in setup before pytest emitted a start ts. 
+ try: + return fwlog_path.read_bytes() + except OSError: + return b"" + lo, hi = start_ts - pad_s, stop_ts + pad_s + out = io.BytesIO() + try: + with fwlog_path.open("r", encoding="utf-8") as fh: + for line in fh: + stripped = line.strip() + if not stripped: + continue + try: + record = json.loads(stripped) + except json.JSONDecodeError: + continue + ts = record.get("ts") + if not isinstance(ts, (int, float)): + continue + if lo <= ts <= hi: + out.write(line.encode("utf-8")) + except OSError: + return b"" + return out.getvalue() + + +def _readme(ctx: ReproContext) -> str: + t = time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime()) + return f"""# Reproducer bundle + +Exported by `meshtastic-mcp-test-tui` on {t}. + +## Failing test + +- **nodeid:** `{ctx.nodeid}` +- **seed:** `{ctx.seed}` +- **run #:** {ctx.run_number} +- **suite exit code (at export time):** {ctx.exit_code if ctx.exit_code is not None else "in progress"} + +## Files in this archive + +| File | Contents | +|---|---| +| `test_report.json` | The pytest-reportlog `TestReport` event for the failing test — includes `longrepr`, captured `sections` (stdout/stderr/log), `duration`, `location`, `keywords`. | +| `fwlog.jsonl` | Firmware log lines (from `meshtastic.log.line` pubsub) filtered to [start−5s, stop+5s] around the test's run window. Each line is `{{ts, port, line}}`. | +| `devices.json` | Per-device snapshot at export time: `device_info` + `lora` config per detected role. | +| `env.json` | Python version, platform, hostname, seed, run number. | + +## How to triage + +1. Open `test_report.json` and read `longrepr` + `sections` — most failures explain themselves there. +2. If the failure is a mesh/telemetry assertion, `fwlog.jsonl` is where the answer usually lives. Grep for `Error=`, `NAK`, `PKI_UNKNOWN_PUBKEY`, `Skip send`, `Guru Meditation`, or the uptime timestamps around the assertion event. +3. Compare `devices.json` against the expected state (e.g. 
`num_nodes >= 2`, `primary_channel == "McpTest"`, `region == "US"`). If fields disagree with the seed-derived USERPREFS profile, the device probably wasn't baked with this session's profile. + +## Reproducing locally + +```bash +cd mcp-server +MESHTASTIC_MCP_SEED='{ctx.seed}' .venv/bin/pytest '{ctx.nodeid}' --tb=long -v +``` +""" + + +def build_reproducer_bundle(ctx: ReproContext) -> pathlib.Path: + """Build a tarball under ``ctx.output_dir`` and return its path. + + Parent dirs are created as needed. Errors during optional sections + (devices, env) are swallowed — the bundle is still useful without + them; refusing to export because the device poller had a hiccup + would be worse than the export missing a file. + """ + ctx.output_dir.mkdir(parents=True, exist_ok=True) + ts = int(time.time()) + slug = _short_nodeid(ctx.nodeid) + archive_path = ctx.output_dir / f"repro-{ts}-{slug}.tar.gz" + + with tarfile.open(archive_path, "w:gz") as tar: + + def _add(name: str, data: bytes) -> None: + info = tarfile.TarInfo(name=name) + info.size = len(data) + info.mtime = ts + tar.addfile(info, io.BytesIO(data)) + + # README + _add("README.md", _readme(ctx).encode("utf-8")) + + # test_report.json — reconstruct from the fields the TUI stashes. 
+ test_report = { + "nodeid": ctx.nodeid, + "outcome": "failed", + "longrepr": ctx.longrepr, + "sections": [list(s) for s in ctx.sections], + "start": ctx.start_ts, + "stop": ctx.stop_ts, + } + _add( + "test_report.json", + json.dumps(test_report, indent=2, default=str).encode("utf-8"), + ) + + # fwlog.jsonl (filtered) + _add("fwlog.jsonl", _filtered_fwlog(ctx.fwlog_path, ctx.start_ts, ctx.stop_ts)) + + # devices.json + try: + devices_payload = json.dumps( + ctx.extra_device_rows or [], indent=2, default=str + ) + except Exception: + devices_payload = "[]" + _add("devices.json", devices_payload.encode("utf-8")) + + # env.json + try: + from importlib.metadata import version as _pkg_version + + pytest_version = _pkg_version("pytest") + except Exception: + pytest_version = "unknown" + env_payload = { + "seed": ctx.seed, + "run": ctx.run_number, + "exit_code": ctx.exit_code, + "export_ts": ts, + "python": platform.python_version(), + "pytest": pytest_version, + "platform": f"{platform.system()} {platform.release()} {platform.machine()}", + "hostname": socket.gethostname(), + } + _add("env.json", json.dumps(env_payload, indent=2).encode("utf-8")) + + return archive_path + + +def iter_entries(archive_path: pathlib.Path) -> Iterable[str]: + """Yield member names — used by callers that want to confirm the bundle shape.""" + with tarfile.open(archive_path, "r:gz") as tar: + for m in tar.getmembers(): + yield m.name diff --git a/mcp-server/src/meshtastic_mcp/cli/test_tui.py b/mcp-server/src/meshtastic_mcp/cli/test_tui.py new file mode 100644 index 00000000000..33201101b1a --- /dev/null +++ b/mcp-server/src/meshtastic_mcp/cli/test_tui.py @@ -0,0 +1,1782 @@ +"""Textual TUI wrapping `mcp-server/run-tests.sh`. + +Launch: ``meshtastic-mcp-test-tui [pytest-args]`` + +The TUI *wraps* ``run-tests.sh``; it never replaces it. Same script, same +env-var resolution, same ``userPrefs.jsonc`` session fixture. Four data +sources drive live state: + +1. 
``tests/reportlog.jsonl`` — written by ``pytest-reportlog``. Tailed in a + worker thread; each JSON line is published as a :class:`ReportLogEvent` + message. This is the authoritative source for tree population + per-test + outcome. +2. The pytest subprocess ``stdout`` + ``stderr`` streams — line-by-line, + published as :class:`PytestLine` messages and rendered verbatim in the + pytest pane. +3. ``tests/fwlog.jsonl`` — firmware log stream. Written by the + ``_firmware_log_stream`` autouse session fixture in ``conftest.py`` + (mirrors every ``meshtastic.log.line`` pubsub event), tailed by the + :class:`FirmwareLogTailer` worker, displayed in a wrap-enabled + RichLog with cycleable port filter. +4. ``devices.list_devices()`` + ``info.device_info(port)`` — polled only at + startup and again after ``RunFinished``. Device polling while pytest + holds a SerialInterface would deadlock on the exclusive port lock; the + existing ``hub_devices`` fixture is session-scoped so there is no safe + "between tests" window. The header reflects this with a "(stale)" + marker while the run is active. + +Key bindings (see :class:`TestTuiApp.BINDINGS`): + ``r`` re-run focused ``f`` filter tree ``d`` failure detail + ``g`` open report.html ``l`` cycle firmware-log port filter + ``x`` export reproducer bundle ``c`` tool-coverage panel + ``q`` / Ctrl-C graceful quit with SIGINT → SIGTERM → SIGKILL escalation + +Shipped today (v1 + v2 slice): test tree + tier counters with progress bars, +pytest tail, live firmware log with port filter, device strip with +"currently running" status column, failure-detail modal, reproducer bundle +export (filters fwlog by test's start/stop timestamps), tool-coverage +modal, cross-run history sparkline in the header, clean SIGINT +propagation. Still open (see the plan file): mesh topology mini-diagram +and airtime / channel-utilization gauges. 
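+
+For orientation, one line from each JSONL source looks roughly like this
+(the reportlog line is abridged and illustrative — the exact field set is
+whatever ``pytest-reportlog`` serializes for a ``TestReport``; the fwlog
+shape is the one ``conftest.py`` writes)::
+
+    {"$report_type": "TestReport", "nodeid": "tests/mesh/test_x.py::test_y[nrf52]",
+     "when": "call", "outcome": "passed", "start": 1729100000.0, "stop": 1729100001.2}
+
+    {"ts": 1729100000.123, "port": "/dev/cu.usbmodem1101", "line": "..."}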
+""" + +from __future__ import annotations + +import argparse +import json +import os +import pathlib +import signal +import subprocess +import sys +import threading +import time +from dataclasses import dataclass, field +from typing import Any, Iterator + +# --------------------------------------------------------------------------- +# Configuration constants +# --------------------------------------------------------------------------- + +# Tier names that map nodeids like "tests//..." to counter buckets. +# Order here == display order in the tier-counters table. Matches the order +# `pytest_collection_modifyitems` in `conftest.py` uses: +# bake → unit → mesh → telemetry → monitor → fleet → admin → provisioning +# so the counters table reads top-to-bottom in execution order. +# +# "bake" is the synthetic tier for `tests/test_00_bake.py` — the file sits +# at the `tests/` root rather than under a tier subdirectory, so without +# this mapping `_tier_of_nodeid` would return "other" and the bake outcomes +# would be silently dropped from both the tier table and the history +# record (which sums tier counters to compute passed/failed/skipped). +TIERS = ( + "bake", + "unit", + "mesh", + "telemetry", + "monitor", + "fleet", + "admin", + "provisioning", +) + +# Relative paths from the mcp-server root. +_REPORTLOG_RELATIVE = "tests/reportlog.jsonl" +_FWLOG_RELATIVE = "tests/fwlog.jsonl" +# pio / esptool / nrfutil / picotool tee subprocess output here when +# `MESHTASTIC_MCP_FLASH_LOG` is set (see `pio._run_capturing`). run-tests.sh +# sets that env var; the TUI also sets it for direct `_spawn_pytest` calls +# so `r`-key re-runs that skip the wrapper still get tee'd output. 
+_FLASHLOG_RELATIVE = "tests/flash.log" +_REPORT_HTML_RELATIVE = "tests/report.html" +_TOOL_COVERAGE_RELATIVE = "tests/tool_coverage.json" +_HISTORY_RELATIVE = "tests/.history/runs.jsonl" +_REPRODUCERS_RELATIVE = "tests/reproducers" +_RUN_TESTS_RELATIVE = "run-tests.sh" +_RUN_COUNTER_RELATIVE = "tests/.tui-runs" + +# Graceful-shutdown budgets (seconds) for the pytest subprocess when the +# user hits `q`. Matches what the existing CLI's atexit + userprefs sidecar +# self-heal expects. +_SIGINT_GRACE_S = 5.0 +_SIGTERM_GRACE_S = 5.0 + + +# --------------------------------------------------------------------------- +# Path resolution +# --------------------------------------------------------------------------- + + +def _mcp_server_root() -> pathlib.Path: + """Locate the mcp-server directory (the one containing run-tests.sh).""" + here = pathlib.Path(__file__).resolve() + # Walk up until we find pyproject.toml with a matching project name, or + # default to the three-up ancestor (src/meshtastic_mcp/cli/test_tui.py → + # .../mcp-server). The walk-up protects against unusual checkouts. + for parent in (here.parent, *here.parents): + if (parent / "pyproject.toml").is_file() and ( + parent / "run-tests.sh" + ).is_file(): + return parent + return here.parents[3] + + +# --------------------------------------------------------------------------- +# Data classes +# --------------------------------------------------------------------------- + + +@dataclass +class LeafReport: + """Per-test state drawn from reportlog events. + + Outcomes mirror pytest's: "passed" | "failed" | "skipped" | "running". + """ + + nodeid: str + tier: str + outcome: str = "pending" + duration_s: float = 0.0 + longrepr: str = "" + # Captured stdout / stderr / firmware-log sections from the test's + # `TestReport.sections` — shown in the failure-detail modal. + sections: list[tuple[str, str]] = field(default_factory=list) + # Wall-clock start/stop from the TestReport event. 
Used by the + # reproducer exporter (`x`) to filter `tests/fwlog.jsonl` down to + # just the lines around the failure window. + start_ts: float | None = None + stop_ts: float | None = None + + +@dataclass +class TierCounters: + tier: str + passed: int = 0 + failed: int = 0 + skipped: int = 0 + running: int = 0 + remaining: int = 0 + + +@dataclass +class DeviceRow: + role: str | None + port: str + vid: str + pid: str + description: str + # Populated from info.device_info when available; empty dict when we + # haven't queried (or when the poller is paused). + info: dict[str, Any] = field(default_factory=dict) + + +@dataclass +class State: + """Shared state owned by the App; written by workers under `lock`. + + UI code reads via Textual Message handlers which run on the UI thread + in the order workers called `post_message` — so reads don't need the + lock themselves. + """ + + lock: threading.Lock = field(default_factory=threading.Lock) + tiers: dict[str, TierCounters] = field( + default_factory=lambda: {t: TierCounters(tier=t) for t in TIERS} + ) + leaves: dict[str, LeafReport] = field(default_factory=dict) + # Ordered list of nodeids in the order they were first seen — lets us + # rebuild the tree deterministically. + nodeid_order: list[str] = field(default_factory=list) + devices: list[DeviceRow] = field(default_factory=list) + run_active: bool = False + exit_code: int | None = None + # nodeid of the currently-running test. Set on `when="setup"` + + # outcome="passed" (body about to execute); cleared on `when="call"` + # (any outcome) or on `when="setup"` + outcome="failed" (no body + # window). Drives the device-table "Status" column so the operator + # can see which test is touching a given device right now. + running_nodeid: str | None = None + # `time.monotonic()` captured when `running_nodeid` was set. 
Surfaced + # as live-updating elapsed-time ("RUNNING: test_bake_nrf52 (1:23)") so + # an operator staring at a ~3 min `test_00_bake` or a `mesh_formation` + # with a 60 s ceiling has concrete evidence the test isn't stuck. + running_started_at: float | None = None + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _tier_of_nodeid(nodeid: str) -> str: + """Map a pytest nodeid to its tier bucket. Unknown → 'other'. + + `tests/test_00_bake.py::...` is special-cased to the synthetic `bake` + tier — it's a top-level file (no tier subdirectory) so the generic + "second path segment" logic would miss it and route the bake outcomes + into the non-existent `other` bucket. + """ + parts = nodeid.split("/", 2) + if len(parts) >= 2 and parts[0] == "tests": + # Bake file sits at `tests/test_00_bake.py` — dedicated bucket. + if parts[1].startswith("test_00_bake"): + return "bake" + candidate = parts[1] + if candidate in TIERS: + return candidate + return "other" + + +def _file_of_nodeid(nodeid: str) -> str: + """Extract the test file name (e.g. 'test_boards.py') from a nodeid.""" + left = nodeid.split("::", 1)[0] + return left.rsplit("/", 1)[-1] + + +def _testname_of_nodeid(nodeid: str) -> str: + """Extract the 'test_foo[param]' suffix from a nodeid, or the full thing.""" + if "::" in nodeid: + return nodeid.split("::", 1)[1] + return nodeid + + +def _roles_from_nodeid(nodeid: str) -> set[str]: + """Infer which device roles a parametrized test touches. 
+ + Patterns we recognize (from the existing ``conftest.py`` parametrization + in ``pytest_generate_tests``): + + - ``test_foo[nrf52]`` → {"nrf52"} (baked_single) + - ``test_foo[nrf52->esp32s3]`` → {"nrf52", "esp32s3"} (mesh_pair) + + Unparametrized tests (no bracket) return an empty set — the caller + should fall back to "this test involves ALL detected devices" rather + than pretending it touches none. + """ + if "[" not in nodeid or not nodeid.endswith("]"): + return set() + try: + inner = nodeid.rsplit("[", 1)[1][:-1] + except Exception: + return set() + # Split on "->" for directed mesh pairs; otherwise treat as single role. + parts = [p.strip() for p in inner.split("->")] if "->" in inner else [inner.strip()] + return {p for p in parts if p} + + +def _parse_events(path: pathlib.Path) -> Iterator[dict[str, Any]]: + """Yield parsed JSON dicts from a reportlog file, skipping malformed lines. + + Used for smoke-testing the parser against a finished file; the live + worker has its own tail loop. + """ + if not path.is_file(): + return + with path.open("r", encoding="utf-8") as fh: + for line in fh: + line = line.strip() + if not line: + continue + try: + yield json.loads(line) + except json.JSONDecodeError: + continue + + +def _load_run_number(counter_path: pathlib.Path) -> int: + """Bump + persist a monotonic run counter used in the TUI header.""" + try: + n = int(counter_path.read_text().strip()) + except Exception: + n = 0 + n += 1 + try: + counter_path.parent.mkdir(parents=True, exist_ok=True) + counter_path.write_text(str(n)) + except Exception: + # Non-fatal: the counter is cosmetic. + pass + return n + + +def _resolve_seed() -> str: + """Mirror the default-seed resolution from run-tests.sh. + + Operator can override via MESHTASTIC_MCP_SEED. Matches the + per-user/per-host default so repeated invocations land on the same PSK + (makes --assume-baked valid across invocations). 
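+
+    Example (illustrative values): ``USER=alice`` on host ``mako.local``
+    with no ``MESHTASTIC_MCP_SEED`` override resolves to ``mcp-alice-mako``
+    (the hostname is truncated at its first dot).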
+ """ + if explicit := os.environ.get("MESHTASTIC_MCP_SEED"): + return explicit + try: + who = os.environ.get("USER") or os.environ.get("LOGNAME") or "anon" + except Exception: + who = "anon" + try: + import socket + + host = socket.gethostname().split(".", 1)[0] + except Exception: + host = "host" + return f"mcp-{who}-{host}" + + +def _format_duration(seconds: float) -> str: + if seconds < 60: + return f"{seconds:5.1f}s" + m, s = divmod(int(seconds), 60) + return f"{m:d}:{s:02d}" + + +# --------------------------------------------------------------------------- +# Textual imports (lazy — only when main() runs, so `_parse_events` can be +# imported by smoke tests without requiring textual installed in every env) +# --------------------------------------------------------------------------- + + +def _import_textual() -> Any: + """Return a namespace carrying every Textual class we use. + + Deferred import keeps `_parse_events` + `_tier_of_nodeid` importable + from tests / smoke scripts without pulling in the UI stack. + """ + import textual + from textual.app import App, ComposeResult + from textual.binding import Binding + from textual.containers import Horizontal, Vertical + from textual.message import Message + from textual.screen import ModalScreen + from textual.widgets import DataTable, Footer, Input, RichLog, Static, Tree + + ns = argparse.Namespace() + ns.App = App + ns.Binding = Binding + ns.ComposeResult = ComposeResult + ns.DataTable = DataTable + ns.Footer = Footer + ns.Horizontal = Horizontal + ns.Input = Input + ns.Message = Message + ns.ModalScreen = ModalScreen + ns.RichLog = RichLog + ns.Static = Static + ns.Tree = Tree + ns.Vertical = Vertical + ns.textual = textual + return ns + + +# --------------------------------------------------------------------------- +# main() — the important scaffolding lives here so that when we bail out +# before entering the Textual event loop (missing terminal, --help, etc.) +# nothing has grabbed the screen yet. 
+# --------------------------------------------------------------------------- + + +def main(argv: list[str] | None = None) -> int: + """Entry point for `meshtastic-mcp-test-tui`.""" + argv = list(argv if argv is not None else sys.argv[1:]) + + parser = argparse.ArgumentParser( + prog="meshtastic-mcp-test-tui", + description=( + "Live Textual TUI wrapping mcp-server/run-tests.sh. " + "Passes any unrecognized arguments through to pytest." + ), + allow_abbrev=False, + ) + parser.add_argument( + "--no-tui", + action="store_true", + help=( + "Skip the TUI and exec run-tests.sh directly. Useful as a health " + "check that the wrapper argv+env resolution is working." + ), + ) + args, pytest_args = parser.parse_known_args(argv) + + root = _mcp_server_root() + run_tests = root / _RUN_TESTS_RELATIVE + reportlog = root / _REPORTLOG_RELATIVE + fwlog = root / _FWLOG_RELATIVE + flashlog = root / _FLASHLOG_RELATIVE + counter = root / _RUN_COUNTER_RELATIVE + + if not run_tests.is_file(): + print( + f"error: could not locate {_RUN_TESTS_RELATIVE} relative to " + f"{root}. Is this the mcp-server checkout?", + file=sys.stderr, + ) + return 2 + + # Always clear stale log files before launching pytest. The TUI's tail + # workers race pytest file-creation; starting from a known-empty state + # avoids mid-line-decode confusion from the prior run. The fwlog session + # fixture also truncates on its end, and run-tests.sh truncates the + # flashlog — triple-truncate is deliberate (whichever side creates the + # file first, it starts empty). + for p in (reportlog, fwlog, flashlog): + try: + p.unlink(missing_ok=True) + except Exception: + pass + + # Compute + persist the run counter for the header (cosmetic). + run_number = _load_run_number(counter) + seed = _resolve_seed() + # Export the seed so the subprocess inherits the SAME value the TUI + # displays. run-tests.sh computes its own fallback if unset, and we'd + # end up with a header / wrapper-header mismatch if we let that happen. 
+ os.environ.setdefault("MESHTASTIC_MCP_SEED", seed) + # Turn on subprocess-output tee'ing so `pio._run_capturing` writes each + # line of pio / esptool / nrfutil / picotool output to `tests/flash.log` + # as it arrives. The TUI tails that file and routes each line to the + # pytest pane so the operator sees live flash progress during long + # `pio run -t upload` / `esptool erase_flash` operations. run-tests.sh + # also sets this when invoked directly — `setdefault` so the wrapper's + # value wins when present. + os.environ.setdefault("MESHTASTIC_MCP_FLASH_LOG", str(flashlog)) + + # --no-tui: exec run-tests.sh directly. Useful for diagnosing wrapper + # env / argv handling without getting into Textual's alternate screen. + if args.no_tui: + cmd = [str(run_tests), *pytest_args] + os.execv(str(run_tests), cmd) # noqa: S606 — intentional + + # Textual UI import is deferred so `--help` and `--no-tui` do not pay + # the ~40 MB startup cost. + try: + tx = _import_textual() + except ImportError as exc: + print( + f"error: textual is not installed ({exc}). Install with: " + f"pip install -e '.[test]'", + file=sys.stderr, + ) + return 2 + + # Narrow-terminal warning (see plan §8 risk 2). Textual itself degrades, + # but a heads-up helps a first-time user. + term = os.environ.get("TERM", "") + if term in ("", "dumb", "screen") and not os.environ.get("TEXTUAL_NO_TERM_HINT"): + print( + f"[hint] TERM={term!r} may render poorly. Try " + f"`TERM=xterm-256color meshtastic-mcp-test-tui ...` if the layout " + f"looks broken.", + file=sys.stderr, + ) + + app = _build_app( + tx=tx, + root=root, + run_tests=run_tests, + reportlog=reportlog, + fwlog=fwlog, + flashlog=flashlog, + seed=seed, + run_number=run_number, + pytest_args=pytest_args, + ) + + # App.run() returns the subprocess exit code via `app.exit(returncode)`. 
+    return_value = app.run()
+    if isinstance(return_value, int):
+        return return_value
+    return 0
+
+
+# ---------------------------------------------------------------------------
+# Everything below is only reachable once Textual is importable. `tx` is
+# the namespace returned by `_import_textual()` so we don't scatter `from
+# textual import ...` across the file.
+# ---------------------------------------------------------------------------
+
+
+def _build_app(
+    *,
+    tx: Any,
+    root: pathlib.Path,
+    run_tests: pathlib.Path,
+    reportlog: pathlib.Path,
+    fwlog: pathlib.Path,
+    flashlog: pathlib.Path,
+    seed: str,
+    run_number: int,
+    pytest_args: list[str],
+) -> Any:
+    """Assemble TestTuiApp with its Textual-dependent inner classes.
+
+    Keeping the class definitions inside a factory means `main()` can
+    short-circuit (--no-tui, terminal-check, argparse error) before we
+    force Textual's import cost.
+    """
+
+    # Helper modules — lazy-imported here so the top-of-file import cost
+    # only kicks in when main() has decided to run the TUI.
+    from . import _flashlog as _flashlog_mod
+    from . import _fwlog as _fwlog_mod
+    from . import _history as _history_mod
+    from . import _reproducer as _reproducer_mod
+
+    # ---------------- Messages ----------------
+
+    class ReportLogEvent(tx.Message):
+        def __init__(self, event: dict[str, Any]) -> None:
+            self.event = event
+            super().__init__()
+
+    class PytestLine(tx.Message):
+        def __init__(self, source: str, line: str) -> None:
+            self.source = source  # "stdout" | "stderr"
+            self.line = line
+            super().__init__()
+
+    class FirmwareLogLine(tx.Message):
+        def __init__(self, record: dict[str, Any]) -> None:
+            # {"ts": float, "port": str | None, "line": str}
+            self.record = record
+            super().__init__()
+
+    class FlashLogLine(tx.Message):
+        """Plain-text line from `tests/flash.log` — pio / esptool / nrfutil /
+        picotool output tee'd by `pio._run_capturing`. Routed to the pytest
+        pane so the operator sees live flash progress during `test_00_bake`
+        instead of 3 minutes of pytest-captured silence."""
+
+        def __init__(self, line: str) -> None:
+            self.line = line
+            super().__init__()
+
+    class DeviceSnapshot(tx.Message):
+        def __init__(self, rows: list[DeviceRow]) -> None:
+            self.rows = rows
+            super().__init__()
+
+    class RunFinished(tx.Message):
+        def __init__(self, returncode: int) -> None:
+            self.returncode = returncode
+            super().__init__()
+
+    # ---------------- Workers ----------------
+
+    class ReportlogWorker(threading.Thread):
+        """Tail `reportlog.jsonl`, publish each event."""
+
+        def __init__(self, app: Any, path: pathlib.Path, stop: threading.Event) -> None:
+            super().__init__(daemon=True, name="reportlog-tail")
+            self._app = app
+            self._path = path
+            self._stop = stop
+
+        def run(self) -> None:
+            # Wait up to 30 s for pytest to create the file (first call on
+            # a cold cache can be slow).
+            wait_deadline = time.monotonic() + 30.0
+            while not self._path.is_file():
+                if self._stop.is_set() or time.monotonic() > wait_deadline:
+                    return
+                time.sleep(0.1)
+            try:
+                fh = self._path.open("r", encoding="utf-8")
+            except OSError:
+                return
+            try:
+                while not self._stop.is_set():
+                    line = fh.readline()
+                    if not line:
+                        time.sleep(0.05)
+                        continue
+                    line = line.strip()
+                    if not line:
+                        continue
+                    try:
+                        event = json.loads(line)
+                    except json.JSONDecodeError:
+                        continue
+                    self._app.post_message(ReportLogEvent(event))
+            finally:
+                fh.close()
+
+    class SubprocessReaderWorker(threading.Thread):
+        """Read one stream line-by-line and publish PytestLine messages."""
+
+        def __init__(
+            self,
+            app: Any,
+            stream: Any,
+            source: str,
+            stop: threading.Event,
+        ) -> None:
+            super().__init__(daemon=True, name=f"subprocess-{source}")
+            self._app = app
+            self._stream = stream
+            self._source = source
+            self._stop = stop
+
+        def run(self) -> None:
+            try:
+                for line in iter(self._stream.readline, ""):
+                    if self._stop.is_set():
+                        break
+                    self._app.post_message(
+                        PytestLine(source=self._source, line=line.rstrip("\n"))
+                    )
+            except Exception:
+                # stream closed / subprocess died; not fatal.
+                pass
+
+    class DevicePollerWorker(threading.Thread):
+        """Poll list_devices() + device_info() at startup and after RunFinished.
+
+        Deliberately NOT polling during the run — `hub_devices` is a
+        session-scoped fixture holding SerialInterfaces across the whole
+        session, and device_info() would deadlock on the exclusive port
+        lock. Header shows "(stale)" during the gap.
+        """
+
+        def __init__(self, app: Any, state: State, stop: threading.Event) -> None:
+            super().__init__(daemon=True, name="device-poller")
+            self._app = app
+            self._state = state
+            self._stop = stop
+            self._trigger = threading.Event()
+
+        def trigger(self) -> None:
+            self._trigger.set()
+
+        def run(self) -> None:
+            # Perform one poll at startup; then wait for explicit triggers.
+            self._poll_once()
+            while not self._stop.is_set():
+                if self._trigger.wait(timeout=0.5):
+                    self._trigger.clear()
+                    if self._stop.is_set():
+                        break
+                    with self._state.lock:
+                        active = self._state.run_active
+                    if active:
+                        continue
+                    self._poll_once()
+
+        def _poll_once(self) -> None:
+            try:
+                from meshtastic_mcp import devices as devices_mod
+                from meshtastic_mcp import info as info_mod
+            except Exception as exc:  # pragma: no cover
+                self._app.post_message(
+                    PytestLine(
+                        source="stderr", line=f"[tui] device import failed: {exc!r}"
+                    )
+                )
+                return
+            rows: list[DeviceRow] = []
+            try:
+                raw = devices_mod.list_devices(include_unknown=True)
+            except Exception as exc:
+                self._app.post_message(
+                    PytestLine(
+                        source="stderr", line=f"[tui] list_devices failed: {exc!r}"
+                    )
+                )
+                return
+            for d in raw:
+                vid_raw = d.get("vid") or ""
+                try:
+                    vid_i = (
+                        int(vid_raw, 16)
+                        if isinstance(vid_raw, str) and vid_raw.startswith("0x")
+                        else int(vid_raw)
+                    )
+                except (TypeError, ValueError):
+                    vid_i = 0
+                role = None
+                if vid_i == 0x239A:
+                    role = "nrf52"
+                elif vid_i in (0x303A, 0x10C4):
+                    role = "esp32s3"
+                if not role and not d.get("likely_meshtastic"):
+                    continue
+                row = DeviceRow(
+                    role=role,
+                    port=d.get("port", ""),
+                    vid=str(vid_raw),
+                    pid=str(d.get("pid") or ""),
+                    description=d.get("description", "") or "",
+                )
+                if role:
+                    try:
+                        row.info = info_mod.device_info(port=row.port, timeout_s=6.0)
+                    except Exception as exc:
+                        row.info = {"error": repr(exc)}
+                rows.append(row)
+            self._app.post_message(DeviceSnapshot(rows=rows))
+
+    # ---------------- Modals ----------------
+
+    class FailureDetailScreen(tx.ModalScreen):
+        """Show a failed test's longrepr + captured sections."""
+
+        BINDINGS = [tx.Binding("escape,q", "dismiss", "close")]
+
+        def __init__(self, leaf: LeafReport, report_html: pathlib.Path) -> None:
+            self._leaf = leaf
+            self._report_html = report_html
+            super().__init__()
+
+        def compose(self) -> Any:
+            yield tx.Static(
+                f"[bold]{self._leaf.nodeid}[/bold] "
+                f"outcome=[red]{self._leaf.outcome}[/red] "
+                f"duration={_format_duration(self._leaf.duration_s)}",
+                id="failure-detail-header",
+            )
+            log = tx.RichLog(
+                highlight=False, markup=False, wrap=False, id="failure-detail-log"
+            )
+            yield log
+            yield tx.Static(
+                f"[dim]Full HTML report: {self._report_html}[/dim] [esc] close",
+                id="failure-detail-footer",
+            )
+
+        def on_mount(self) -> None:
+            log = self.query_one("#failure-detail-log", tx.RichLog)
+            if self._leaf.longrepr:
+                log.write(self._leaf.longrepr)
+                log.write("")
+            for section_name, section_text in self._leaf.sections:
+                log.write(f"--- {section_name} ---")
+                log.write(section_text)
+                log.write("")
+            if not self._leaf.longrepr and not self._leaf.sections:
+                log.write("(no longrepr or captured sections in reportlog event)")
+
+        def action_dismiss(self, _result: Any = None) -> None:
+            self.dismiss()
+
+    class FilterInputScreen(tx.ModalScreen[str]):
+        """Prompt the user for a tree filter substring (empty clears)."""
+
+        BINDINGS = [tx.Binding("escape", "cancel", "cancel")]
+
+        def compose(self) -> Any:
+            yield tx.Static("filter test tree (substring, empty = clear):")
+            yield tx.Input(placeholder="nodeid substring", id="filter-input")
+
+        def on_input_submitted(self, event: Any) -> None:
+            self.dismiss(event.value.strip())
+
+        def action_cancel(self) -> None:
+            self.dismiss(None)
+
+    class CoverageModal(tx.ModalScreen):
+        """Read `tests/tool_coverage.json` (written by `tests/tool_coverage.py`
+        at `pytest_sessionfinish`) and render a two-column summary of which
+        MCP tools got exercised by the run. `(no coverage data yet)` while
+        the run is in flight."""
+
+        BINDINGS = [tx.Binding("escape,q,c", "dismiss", "close")]
+
+        def __init__(self, coverage_path: pathlib.Path) -> None:
+            self._path = coverage_path
+            super().__init__()
+
+        def compose(self) -> Any:
+            yield tx.Static("[bold]MCP tool coverage[/bold]", id="coverage-header")
+            yield tx.RichLog(
+                highlight=False, markup=True, wrap=False, id="coverage-log"
+            )
+            yield tx.Static(
+                f"[dim]{self._path}[/dim] [esc] close",
+                id="coverage-footer",
+            )
+
+        def on_mount(self) -> None:
+            log = self.query_one("#coverage-log", tx.RichLog)
+            if not self._path.is_file():
+                log.write("(no coverage data — tool_coverage.json not written yet)")
+                log.write("")
+                log.write("Coverage is emitted at pytest_sessionfinish; this")
+                log.write("file appears after the suite completes.")
+                return
+            try:
+                data = json.loads(self._path.read_text(encoding="utf-8"))
+            except Exception as exc:
+                log.write(f"[red]failed to read {self._path}:[/red] {exc!r}")
+                return
+            calls = data.get("calls") or {}
+            if not calls:
+                log.write("(tool_coverage.json present but no calls recorded)")
+                return
+            exercised = sorted(
+                ((n, c) for n, c in calls.items() if c > 0), key=lambda x: -x[1]
+            )
+            unexercised = sorted(n for n, c in calls.items() if c == 0)
+            log.write(f"[b]{len(exercised)} / {len(calls)} MCP tools exercised[/b]")
+            log.write("")
+            log.write("[green]exercised[/green] (count):")
+            for name, count in exercised:
+                log.write(f" {count:>4} {name}")
+            if unexercised:
+                log.write("")
+                log.write("[dim]not exercised:[/dim]")
+                for name in unexercised:
+                    log.write(f" {name}")
+
+        def action_dismiss(self, _result: Any = None) -> None:
+            self.dismiss()
+
+    class ReproducerResultModal(tx.ModalScreen):
+        """Show the exported reproducer tarball path with a short instruction."""
+
+        BINDINGS = [tx.Binding("escape,q,enter", "dismiss", "close")]
+
+        def __init__(
+            self, archive_path: pathlib.Path, error: str | None = None
+        ) -> None:
+            self._archive = archive_path
+            self._error = error
+            super().__init__()
+
+        def compose(self) -> Any:
+            if self._error:
+                yield tx.Static(f"[red]Reproducer export failed:[/red] {self._error}")
+            else:
+                yield tx.Static("[bold green]Reproducer bundle written[/bold green]")
+                yield tx.Static(f"[cyan]{self._archive}[/cyan]")
+                yield tx.Static("")
+                yield tx.Static(
+                    "Contains: README.md, test_report.json, fwlog.jsonl (time-filtered),"
+                )
+                yield tx.Static(
+                    "devices.json, env.json. Attach to an issue / paste the path in chat."
+                )
+            yield tx.Static("")
+            yield tx.Static("[dim][esc] close[/dim]")
+
+        def action_dismiss(self, _result: Any = None) -> None:
+            self.dismiss()
+
+    # ---------------- App ----------------
+
+    class TestTuiApp(tx.App):
+        CSS = """
+        Screen { layout: vertical; }
+        #header-bar { height: 2; padding: 0 1; background: $panel; }
+        #tier-table { height: auto; max-height: 11; }
+        #body { height: 1fr; }
+        #tree-pane { width: 50%; border-right: solid $primary-background; }
+        #right-pane { width: 50%; layout: vertical; }
+        #pytest-pane { height: 50%; border-bottom: solid $primary-background; }
+        #fwlog-header { height: 1; padding: 0 1; background: $panel; }
+        #fwlog-pane { height: 1fr; }
+        Tree { height: 100%; }
+        RichLog { height: 100%; }
+        #device-table { height: auto; max-height: 6; }
+        """
+
+        TITLE = "mcp-server test runner"
+
+        BINDINGS = [
+            tx.Binding("r", "rerun_focused", "re-run focused"),
+            tx.Binding("f", "filter_tree", "filter"),
+            tx.Binding("d", "failure_detail", "failure detail"),
+            tx.Binding("g", "open_html_report", "open report.html"),
+            tx.Binding("x", "export_reproducer", "export reproducer"),
+            tx.Binding("c", "coverage_panel", "coverage"),
+            tx.Binding("l", "cycle_fwlog_filter", "fw log filter"),
+            tx.Binding("q,ctrl+c", "quit_app", "quit"),
+        ]
+
+        def __init__(self) -> None:
+            super().__init__()
+            self._state = State()
+            self._root = root
+            self._run_tests = run_tests
+            self._reportlog = reportlog
+            self._fwlog = fwlog
+            self._flashlog = flashlog
+            self._report_html = root / _REPORT_HTML_RELATIVE
+            self._tool_coverage = root / _TOOL_COVERAGE_RELATIVE
+            self._repro_dir = root / _REPRODUCERS_RELATIVE
+            self._seed = seed
+            self._run_number = run_number
+            self._pytest_args = pytest_args
+            self._start_time = time.monotonic()
+            self._proc: subprocess.Popen[str] | None = None
+            self._stop = threading.Event()
+            self._reportlog_worker: ReportlogWorker | None = None
+            self._stdout_worker: SubprocessReaderWorker | None = None
+            self._stderr_worker: SubprocessReaderWorker | None = None
+            self._device_worker: DevicePollerWorker | None = None
+            self._fwlog_worker: _fwlog_mod.FirmwareLogTailer | None = None
+            self._flashlog_worker: _flashlog_mod.FlashLogTailer | None = None
+            self._tree_filter: str = ""
+            self._sigint_count = 0
+            # Firmware-log port filter: None = all, else exact port match.
+            self._fwlog_filter: str | None = None
+            # Ordered set of distinct ports we've seen firmware log lines
+            # from — the `l` key cycles through these.
+            self._fwlog_ports: list[str] = []
+            # Cross-run history.
+            self._history_store = _history_mod.HistoryStore(
+                root / _HISTORY_RELATIVE, keep_last=40
+            )
+            self._history_cache = self._history_store.read_recent()
+
+        # -------- composition / mount --------
+
+        def compose(self) -> Any:
+            yield tx.Static(self._header_text(), id="header-bar")
+            tier_table = tx.DataTable(id="tier-table", show_cursor=False)
+            yield tier_table
+            with tx.Horizontal(id="body"):
+                with tx.Vertical(id="tree-pane"):
+                    yield tx.Tree("tests", id="test-tree")
+                with tx.Vertical(id="right-pane"):
+                    with tx.Vertical(id="pytest-pane"):
+                        yield tx.RichLog(
+                            id="pytest-log",
+                            highlight=False,
+                            markup=False,
+                            wrap=False,
+                            max_lines=5000,
+                        )
+                    yield tx.Static(self._fwlog_header_text(), id="fwlog-header")
+                    with tx.Vertical(id="fwlog-pane"):
+                        yield tx.RichLog(
+                            id="fwlog-log",
+                            highlight=False,
+                            markup=False,
+                            # `wrap=True` so long firmware log lines (some
+                            # hit ~200 chars — full packet hex dumps plus
+                            # source tags) don't get truncated at the
+                            # right edge. The right pane is ~50% of the
+                            # terminal so even a wide terminal has a
+                            # ~90-char cap; plain truncation dropped the
+                            # uptime counter or packet id off the end.
+                            wrap=True,
+                            max_lines=5000,
+                        )
+            yield tx.DataTable(id="device-table", show_cursor=False)
+            yield tx.Footer()
+
+        def _fwlog_header_text(self) -> str:
+            filt = self._fwlog_filter or "(all ports)"
+            return f"firmware log filter: [b]{filt}[/b] [l] cycle"
+
+        def on_mount(self) -> None:
+            # Tier-counters table. `add_column` (singular) lets us pick
+            # the key explicitly — `add_columns` (plural) in textual 8.x
+            # returns auto-generated keys that are tedious to track
+            # separately, and update_cell(column_key=