feat: add health check with restart-on-failure for self-healing #434
Open
LaurenceJJones wants to merge 3 commits into crowdsecurity:main
Add periodic health checking to detect missing firewall infrastructure (chains, ipsets) and trigger a process restart when detected.

Implementation:
- CheckHealth() on Backend interface verifies chains/ipsets exist
- Health checker goroutine runs periodically (default 30s)
- On failure detection, returns ErrUnrecoverable to exit the process
- Prometheus metrics track health status and failure counts
- HTTP /health endpoint exposes health status as JSON

Why restart instead of in-memory self-heal:
The StreamBouncer from go-cs-bouncer has an internal 'startup' flag that is set to true only on first run, causing LAPI to send all decisions. After startup, it only sends deltas. This flag is not exposed or resettable.

Storing decisions in memory to replay after reinit was considered, but restarting the process is simpler and leverages the existing StreamBouncer behavior - on restart, startup=true triggers a full decision sync from LAPI. Systemd handles restart limiting.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Since the process restarts on health check failure, metrics and the /health endpoint provide no value - they reset/disappear on restart. Keep only the core health check logic that detects missing firewall infrastructure and triggers the restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused receivers in CheckHealth stubs (nftables, pf)
- Bump cyclomatic complexity limit 29->30 for health check addition
- Bump function-length limit 153->160 for health check addition
- Simplify deprecated daemon option check

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WIP
Add periodic health checking to detect missing firewall infrastructure (chains, ipsets) and trigger a process restart when detected.
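Implementation:
- CheckHealth() on Backend interface verifies chains/ipsets exist
- Health checker goroutine runs periodically (default 30s)
- On failure detection, returns ErrUnrecoverable to exit the process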
Why restart instead of in-memory self-heal:
The StreamBouncer from go-cs-bouncer has an internal 'startup' flag that is set to true only on first run, causing LAPI to send all decisions. After startup, it only sends deltas. This flag is not exposed or resettable.
Storing decisions in memory to replay after reinit was considered, but restarting the process is simpler and leverages the existing StreamBouncer behavior - on restart, startup=true triggers a full decision sync from LAPI. Systemd handles restart limiting.
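To make the reasoning concrete, here is a toy model of the startup behavior described above. It is not go-cs-bouncer's code: `toyStream`, `newToyStream`, and `poll` are hypothetical names, and the only thing modeled is a one-shot flag that callers cannot reset.

```go
package main

import "fmt"

// toyStream is a hypothetical stand-in for a streaming client.
// The startup flag is unexported, so callers cannot reset it.
type toyStream struct {
	startup bool
}

func newToyStream() *toyStream { return &toyStream{startup: true} }

// poll returns every active decision on the first call and only the
// pending delta on later calls.
func (s *toyStream) poll(all, delta []string) []string {
	if s.startup {
		s.startup = false
		return all
	}
	return delta
}

func main() {
	all := []string{"192.0.2.1", "192.0.2.2"}

	s := newToyStream()
	fmt.Println("first poll:", len(s.poll(all, nil)), "decisions")
	fmt.Println("later poll:", len(s.poll(all, nil)), "decisions")

	// If the ipset were wiped at this point, later polls would never
	// replay the full set. A restarted process builds a new stream,
	// so its first poll is a full sync again.
	s = newToyStream()
	fmt.Println("after restart:", len(s.poll(all, nil)), "decisions")
}
```

In this model the only way to get the full set again is to construct a new stream, which is what a process restart does.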
We don't offer a container deployment, but users who run their own get the same behavior as long as they configure a restart policy on the container.