fix(localdns): handle exporter client disconnects #8791
Open
jingwenw15 wants to merge 1 commit into
Treat broken-pipe writes as successful socket request termination so socket-activated localdns-exporter worker units do not remain failed when a scrape client closes early. Add ShellSpec coverage for clients that close during the metrics response. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR addresses a recurring Linux E2E flake where socket-activated localdns-exporter@...service workers can remain in a failed state if the scrape client disconnects while the exporter is still writing the metrics response. The change makes stdout write failures (broken pipe / SIGPIPE) terminate successfully, and adds ShellSpec coverage for that disconnect scenario.
Changes:
- Add a `PIPE` trap and `emit*` helpers so response writes exit `0` when the client closes the socket.
- Stream `.prom` file output through the safe write path (replacing raw `cat`).
- Add a ShellSpec test that simulates a client closing the connection mid-response.
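A minimal sketch of the safe write path described in the list above, under stated assumptions: `emit()` mirrors the helper quoted in the review, while `emit_file()`, `respond()`, the `demo.prom` file, and the response header line are hypothetical names and data used only for illustration, not the PR's actual code. `head -n 1` stands in for a scrape client that closes the connection early.

```shell
#!/bin/sh
# Sketch only: emit_file() and respond() are hypothetical, not the PR's code.

# Write one line to stdout; a failed write means the client is gone, so stop quietly.
emit() {
    printf '%s\n' "$*" 2>/dev/null || exit 0
}

# Stream a file line by line through emit() instead of raw cat.
emit_file() {
    while IFS= read -r line || [ -n "$line" ]; do
        emit "$line"
    done < "$1"
}

# Handle one request in a subshell so that "exit 0" ends only this response.
respond() (
    trap 'exit 0' PIPE
    emit "# demo metrics response"
    emit_file "$1"
)

# Demo: generate a large fake .prom file, then let the "client" read one line and leave.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 0; i < 20000; i++) print "demo_metric{n=\"" i "\"} 1" }' > "$workdir/demo.prom"
{ respond "$workdir/demo.prom"; echo "$?" > "$workdir/status"; } | head -n 1 > "$workdir/first_line"

status=$(cat "$workdir/status")
first_line=$(cat "$workdir/first_line")
rm -rf "$workdir"
echo "writer exit status: $status"
```

The writer's status is captured inside the pipeline rather than through a shell-specific variable, so the sketch does not depend on bash. Both the trap and the `|| exit 0` guard lead to the same outcome, which covers environments where `SIGPIPE` is delivered as well as ones where it is ignored and the write fails with an error instead.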
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/localdns_exporter.sh | Adds SIGPIPE handling and safe emit helpers to avoid failed systemd worker units on client disconnect. |
| spec/parts/linux/cloud-init/artifacts/localdns_exporter_spec.sh | Adds a regression test that simulates a client closing during the /metrics response. |
Comment on lines +27 to +29

    emit() {
        printf "%s\n" "$*" 2>/dev/null || exit 0
    }
Why
AgentBaker Linux E2E has a recurring `localdns-exporter-systemd-failed-state` flake where the LocalDNS exporter functional validation passes, but a socket-activated `localdns-exporter@...service` worker remains in failed state afterward.

Latest recurrence: ADO build 170230870, repair item 38581800.
The failed unit log showed the worker exited non-zero after the scrape client closed the socket while the script was still writing the metrics response:
A client disconnect is expected for per-connection socket-activated workers and should not leave the unit failed.
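The failure mode can be reproduced outside systemd with a rough sketch like the following. It assumes nothing about the real exporter script: the `demo.prom` file is generated on the spot, and `head -n 1` plays the scrape client that closes the connection while an unguarded `cat` is still writing.

```shell
#!/bin/sh
# Sketch: an unguarded write to a closed pipe ends the writer with a non-zero status.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 0; i < 200000; i++) print "demo_metric{n=\"" i "\"} 1" }' > "$workdir/demo.prom"

# head reads one line and closes its end of the pipe while cat still has data to write.
{ cat "$workdir/demo.prom" 2>/dev/null; echo "$?" > "$workdir/status"; } | head -n 1 > /dev/null

raw_status=$(cat "$workdir/status")
rm -rf "$workdir"
# Typically 141 (128 + SIGPIPE), or 1 if SIGPIPE is ignored and cat sees a write error.
echo "raw cat exit status: $raw_status"
```

Either way the writer reports failure, which is what a per-connection worker unit would surface as a failed state.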
What
- Treat `SIGPIPE` / closed stdout as successful request termination.
- Make response writes exit `0` on broken pipe.
- Stream `.prom` files through the same safe write path instead of `cat`.

Validation
Also ran: