AG-81 - Enhance Command Registry with Extensible Dispatch System#91
AG-81 - Enhance Command Registry with Extensible Dispatch System#91JeffMboya wants to merge 4 commits into
Conversation
| if len(args) < 2 || args[0] == "" || args[1] == "" { | ||
| return errors.Wrap(errRouteFailed, errInvalidCommand) | ||
| } | ||
| deviceID, hexPayload := args[0], args[1] |
There was a problem hiding this comment.
Issue 1: Error Handling in Route
The Route command publishes errors to the "control" topic but doesn't validate device existence before attempting operations. This could lead to confusing error messages when a device doesn't exist vs when there's an actual interface error.
Recommendation: Consider adding device existence validation before opening interfaces to provide clearer error messages for operators to diagnose issues.
There was a problem hiding this comment.
Done in f5bf596 — Route now resolves the device with devices.Get(deviceID) up front, so a missing device returns a clear not-found error before any interface open is attempted.
| if a.store == nil { | ||
| return notConfigured | ||
| } | ||
| for key, val := range a.store.All() { |
There was a problem hiding this comment.
Issue 2: Configuration Reload Validation
The controlReload function applies persisted values without validating they match expected types/ranges. If invalid values were persisted (e.g., through manual file editing), this could cause runtime issues.
Recommendation: Add validation before applying values and return which keys were updated for auditability.
There was a problem hiding this comment.
Done in f5bf596 — each persisted value is now validated via the existing validateSettableValue before being applied; invalid entries are skipped and logged, and the response lists the applied keys for auditability (e.g. reloaded:log_level).
| // per-device scheduler using the agent's run context. | ||
| func (a *agent) controlStart() string { | ||
| a.paused.Store(false) | ||
| if a.sched != nil && a.runCtx != nil { |
There was a problem hiding this comment.
Issue 3: Nil runCtx Risk in controlStart
The runCtx field is set to the ctx parameter in New(), but if this context is canceled before controlStart is called, restarting the scheduler will fail. This could leave the agent in an inconsistent state where stop works but start fails.
Recommendation: Add a nil check and consider using a separate lifecycle context that isn't tied to the initialization context.
There was a problem hiding this comment.
Done in f5bf596 — controlStart now skips the scheduler restart when runCtx is nil or already cancelled and logs the reason instead of failing silently. A fully independent lifecycle context would leak device goroutines past process shutdown (the scheduler ctx must descend from the process ctx), so the explicit cancelled-context guard is the safer fix here.
Enhance the command dispatch system per issue absmach#81: - Add command metadata (name, description, usage) to the registry and a discoverable `help` command that reports it. - Enforce authorization per command via a RequiresAuth flag instead of a single global token gate. - Add `route` command to forward a hex payload to a downstream device interface and return its response. - Extend `control` with stop/start (pause/resume heartbeat, telemetry, and the device scheduler), reload (re-apply persisted config overrides), and status, keeping the existing nodered-* passthrough. - Add `ota,status` to query OTA state over MQTT. - Wire Route through the logging/metrics middleware and regenerate mocks. - Cover the registry, per-command auth, help, route, and control subcommands with tests; document the commands in the README. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
- Route: resolve the device up front so a missing device returns a clear error instead of an opaque interface-open failure. - controlReload: validate each persisted value before applying, skip and log invalid entries, and report the applied keys for auditability. - controlStart: skip restarting the scheduler when the run context is nil or already cancelled, logging the reason instead of failing silently. - Extend tests for the new reload response and validation behaviour. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
Document the commands added in this branch now that command docs live under docs/ rather than the README: - control.md: extensible registry + per-command auth note, route and help in the subsystem table, and sections for control lifecycle (stop/start/reload/ status), route, and help. - ota.md: query OTA status via the commands channel (ota,status). Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
- devices.md: note the route command as a one-shot write-then-read to a device interface, linking to its control.md section. - docs/README.md: update the control.md index entry to mention agent lifecycle, route, and help. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
What does this do?
Adds an extensible command registry with metadata, a help command, per-command auth, plus route, control lifecycle, and ota status commands.
Which issue(s) does this PR fix/relate to?
Resolves #81
List any changes that modify/break current functionality
None breaking. Authorization moves from a global gate to per-command; all built-ins still require a token.
Have you included tests for your changes?
Yes — registry, per-command auth, help, ota status, route validation, and all control subcommands are covered.
Did you document any new/modified functionality?
Yes — README command table and usage examples updated for help, control, ota status, and route.
Notes
New commands are MQTT-only (SenML), matching the issue format; no HTTP endpoints added.