fix(localdns): handle exporter client disconnects#8791

Open
jingwenw15 wants to merge 1 commit into main from jingwenwu/localdns-exporter-broken-pipe

@jingwenw15

Why

AgentBaker Linux E2E has a recurring localdns-exporter-systemd-failed-state flake: the LocalDNS exporter functional validation passes, but a socket-activated localdns-exporter@...service worker remains in a failed state afterward.

Latest recurrence: ADO build 170230870, repair item 38581800.

The failed unit log showed the worker exited non-zero after the scrape client closed the socket while the script was still writing the metrics response:

localdns_exporter.sh: cat: write error: Broken pipe
localdns-exporter@...service: Main process exited, code=exited, status=1/FAILURE

A client disconnect is expected for per-connection socket-activated workers and should not leave the unit failed.
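
The failure mode can be reproduced outside systemd. The sketch below is an illustration, not code from this PR. It assumes the worker runs with SIGPIPE ignored (systemd's documented default, `IgnoreSIGPIPE=yes`), so a write to a closed socket surfaces as an EPIPE error and `cat` exits non-zero instead of being killed by the signal. The function name `repro_cat_status` and the use of `head -n 1` as a stand-in for a scrape client that closes early are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; repro_cat_status is not part of the PR.
repro_cat_status() {
    local payload status_file status
    payload=$(mktemp)
    status_file=$(mktemp)

    # Build a payload larger than a typical pipe buffer so cat is still
    # writing when the reader goes away.
    seq 1 200000 > "$payload"

    # Left side: ignore SIGPIPE (inherited by cat), then record cat's exit
    # status in a file, because the subshell's stdout is the pipe itself.
    # Right side: head reads one line and exits, closing the pipe early.
    (
        trap '' PIPE
        cat "$payload" 2>/dev/null
        echo "$?" > "$status_file"
    ) | head -n 1 > /dev/null

    status=$(cat "$status_file")
    rm -f "$payload" "$status_file"
    echo "$status"
}
```

Calling `repro_cat_status` prints the exit status of `cat`, which should be non-zero under these assumptions. That matches the `status=1/FAILURE` line in the unit log above.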

What

  • Treat SIGPIPE / closed stdout as successful request termination.
  • Replace raw response writes with helpers that exit 0 on broken pipe.
  • Stream generated .prom files through the same safe write path instead of cat.
  • Add ShellSpec coverage for a client that closes during the metrics response.
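
The first three items can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual implementation: only `emit` mirrors the snippet quoted in the review, while `emit_file` and `serve_metrics` are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; only emit() mirrors the snippet quoted in the review.

# Write one line to stdout. If the write fails (for example EPIPE because
# the client closed the socket), end the request successfully.
emit() {
    printf '%s\n' "$*" 2>/dev/null || exit 0
}

# Stream a file line by line through emit instead of cat.
# emit_file is an assumed name for illustration.
emit_file() {
    local line
    # The "|| [ -n "$line" ]" clause keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        emit "$line"
    done < "$1"
}

# Entry point for one request: install the PIPE trap, then stream the file.
# serve_metrics is an assumed name for illustration.
serve_metrics() {
    trap 'exit 0' PIPE
    emit_file "$1"
}
```

For example, `serve_metrics some.prom | head -n 1` should leave the left-hand side with exit status 0 even though the reader closes early. The two mechanisms overlap on purpose: if SIGPIPE is already ignored when the shell starts, the trap cannot take effect, and the `|| exit 0` branch handles the failed write instead.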

Validation

shellspec --shell bash spec/parts/linux/cloud-init/artifacts/localdns_exporter_spec.sh
25 examples, 0 failures

Also ran:

bash -n parts/linux/cloud-init/artifacts/localdns_exporter.sh
shellcheck parts/linux/cloud-init/artifacts/localdns_exporter.sh
git diff --check -- parts/linux/cloud-init/artifacts/localdns_exporter.sh spec/parts/linux/cloud-init/artifacts/localdns_exporter_spec.sh

Treat broken-pipe writes as successful socket request termination so socket-activated localdns-exporter worker units do not remain failed when a scrape client closes early.

Add ShellSpec coverage for clients that close during the metrics response.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR addresses a recurring Linux E2E flake where socket-activated localdns-exporter@...service workers can remain in a failed state if the scrape client disconnects while the exporter is still writing the metrics response. The change makes stdout write failures (broken pipe / SIGPIPE) terminate successfully, and adds ShellSpec coverage for that disconnect scenario.

Changes:

  • Add a PIPE trap and emit* helpers so response writes exit 0 when the client closes the socket.
  • Stream .prom file output through the safe write path (replacing raw cat).
  • Add a ShellSpec test that simulates a client closing the connection mid-response.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| parts/linux/cloud-init/artifacts/localdns_exporter.sh | Adds SIGPIPE handling and safe emit helpers to avoid failed systemd worker units on client disconnect. |
| spec/parts/linux/cloud-init/artifacts/localdns_exporter_spec.sh | Adds a regression test that simulates a client closing during the /metrics response. |

Comment on lines +27 to +29
emit() {
    printf "%s\n" "$*" 2>/dev/null || exit 0
}