fix: keep dispatching messages in a batch when one of them errors or throws#5057
fix: keep dispatching messages in a batch when one of them errors or throws#5057alberti42 wants to merge 1 commit into
Conversation
|
Can you rebase this and simply comment changes you made (with GitHub review)? The git diff isn't very useful with large files. 😓 Thank you! |
4ed5c31 to
971d2d9
Compare
|
@jcs090218 Thanks for looking into this PR!
I see the problem: a lot of the noise comes from indentation changes after wrapping the parsing loop https://github.com/emacs-lsp/lsp-mode/pull/5057/files?w=1 On my screen that hides all the indentation noise and leaves just the real logic. If it still isn't |
A single TCP read often delivers multiple framed messages. Two failure modes could lose later messages in the same batch: 1. A framing-parse error (e.g. when JSON parsing of an earlier body leaves the framing state inconsistent and the next iteration enters header parsing on mid-body bytes) would escape the parsing loop with an uncaught `(error "Unable to find Content-Length header.")', losing any messages already accumulated in the local list. 2. A `(throw 'lsp-done ...)' or `(throw 'input ...)' from inside a message's response handler would terminate `mapc' before subsequent messages were dispatched. These tags are caught higher in the call stack (e.g. in `lsp-request-while-no-input'), and the design assumes the catch handles the throw locally — but `mapc' is between the handler and the catch, and gets killed in transit. This commit: - Wraps the framing loop in `condition-case' that salvages already-parsed messages before resetting framing state for the next read. - Replaces `mapc' with a per-message `dolist' that catches `'lsp-done' and `'input' throws, queues them, dispatches the remaining messages, and re-issues the saved throw after the loop completes. The original target catch still receives the throw — just delayed until after the batch is fully drained.
971d2d9 to
4e523c9
Compare
Summary
lsp--create-filter-functionis the function that turns the raw bytesarriving from the language server's process into JSON messages and hands
them to the rest of
lsp-mode. A single read from the server very oftendelivers more than one message at once, so the filter parses what it can
and then dispatches each message in turn.
There are two ways the existing code can lose messages that were already
parsed and ready to dispatch:
prefixes each message with
Content-Length: N\r\n\r\nheaders andthen
Nbytes of JSON body. If the parsing loop's view of the bytestream becomes mis-aligned and the next iteration tries to read
headers from the middle of a body, the code signals an error
(
Unable to find Content-Length header.). That error escapes thewhole filter and takes the list of already-parsed messages with it.
throwfrom a message handler abandons the rest of the batch.When a message handler eventually triggers a
(throw 'lsp-done …)or(throw 'input …)— these are caught higher up by code likelsp-request-while-no-input— the throw passes back through thefilter on its way up. The filter currently dispatches messages with
mapc, andmapcis killed by the throw mid-loop. Messages later inthe batch never run.
Both are independent of the contents of the messages. The most visible
symptom is a server that gets stuck waiting for a reply to a request
(such as
window/workDoneProgress/createorworkspace/configuration)that the client did receive and parse but never dispatched.
This PR makes both paths preserve the already-parsed messages:
condition-casearound the parsing loop catches a framing error,logs it, resets the framing state, and falls through so the messages
parsed before the error still get dispatched.
mapcwith a per-messagedolistthatcatches
'lsp-doneand'inputthrows locally, finishes dispatchingthe rest of the batch, and then re-issues the saved throw so it still
reaches its original target.
No public APIs change. The normal path is unchanged. The only difference
a user can observe is that batches are no longer truncated when one
message in them fails.
Background
Three pieces of context that the rest of the PR depends on.
How messages reach the filter
lsp-modeconnects to a language server over a pipe (or a TCP socket).Whenever Emacs has bytes available from that connection, it calls the
process filter — the function returned by
lsp--create-filter-function— with the bytes that just arrived.
Two things to keep in mind:
whole messages, or some mix. The filter has to buffer leftover bytes
across calls and split out complete messages as it goes.
loses a message — by throwing it away, by leaving it in the buffer, or
by signalling an error before dispatching it — that message is gone;
there is no retry from the network layer.
LSP wire framing
Each message on the wire looks like this:
The filter walks through the byte buffer alternating between two states:
"reading headers" and "reading a body of known length." When a body is
complete, it parses the JSON, appends the result to a local
messageslist, resets the framing state, and goes back to "reading headers" for
whatever bytes are left.
catch/throw(briefly)Emacs Lisp's
(throw 'TAG VAL)jumps up the call stack to the nearestmatching
(catch 'TAG …), discarding every function call in between. Ifno matching catch is on the stack, Emacs raises a
no-catcherror.lsp-modeuses(throw 'lsp-done …)and(throw 'input …)to breakout of synchronous wait loops in callers like
lsp-request-while-no-input.Crucially, the catches for these tags are higher in the stack than the
filter — so when the throw fires inside a message handler, it travels up
through the filter on its way to the catch.
The two bugs
Bug 1 — a framing error escapes the parsing loop
The parsing loop pulls header lines with the equivalent of:
Under normal operation, every successful body parse leaves
chunkpositioned right at the start of the next
Content-Lengthline, so thesearch always succeeds.
Under abnormal operation, it can fail. The most concrete way: an earlier
JSON body fails to parse, the inner error handler logs the failure but
the framing state ends up in a position where the next iteration looks
for headers in what is actually mid-body bytes. The
string-match-pcall returns nil, the
(error …)fires, and the error escapes both theinner
condition-case(which only covers JSON parse) and the outerwhileloop.The damage: at the moment the error fires, the local list
messagesmayalready hold one or more complete, parsed JSON objects from earlier in
the same read. Because the error escapes the whole filter function, the
dispatch step that would normally run them never runs. Those messages
are silently dropped.
If one of the dropped messages is a server-initiated request that
expects a reply (for example
window/workDoneProgress/create), theserver hangs waiting for a response that the client has already thrown
away.
Bug 2 — a
throwfrom a message handler abandons the rest of the batchAfter parsing, the filter dispatches messages with:
Suppose the batch contains two messages: a response to a client request
and a server-initiated
window/workDoneProgress/create.mapcruns thelambda on the response first. The response's callback is the one
registered by
lsp-request-while-no-input; that callback ends with(throw 'lsp-done '_). The throw walks up the stack toward the catch inlsp-request-while-no-input.The catch is several frames above the filter. To get there, the throw
unwinds — among other frames — the lambda and the call to
mapcitself.mapcdoes not survive the throw: it is mid-iteration, but it nevergets to start the next iteration. The second message in the batch
(
window/workDoneProgress/create) is never dispatched.The catch in
lsp-request-while-no-inputdoes what it was designed todo, the user's wait completes, and from
lsp-request-while-no-input'spoint of view everything is fine. From the server's point of view, it
sent a request that needs a reply and never got one. The server hangs.
The same shape of failure applies to
(throw 'input …), which is usedelsewhere in
lsp-modeto cancel work when user input arrives.This bug compounds with bug 1 and with the id-collision bug fixed in
PR 1: any of the three is enough to silently drop a server-initiated
request, and any dropped server-initiated request that expects a reply
can leave the server stuck.
The fix
Fix 1 — wrap the parsing loop and salvage already-parsed messages
Wrap the framing
whileloop in acondition-case. If any errorescapes the loop, log it, reset framing state for the next read, and
fall through to the dispatch step. The local
messageslist ispreserved, so any messages parsed before the error still get
dispatched.
Two notes:
leftovers,body-length,body-received,body)is the same reset the parsing loop already does after a successful
body parse. Doing it on the error path puts the filter into a known
state for the next read instead of leaving it half-way through a
message it could not finish parsing.
condition-casearound JSON parsing is left in place. JSONparse errors continue to be reported with the existing
lsp-warnmessage; only framing errors that escape that inner handler are now
caught.
Fix 2 — per-message dispatch with throw queueing
Replace
mapcwith adolistthat wraps each individual call tolsp--parser-on-messagein(catch 'lsp-done …)and(catch 'input …).If a handler throws either tag, the local catch receives it, the loop
records the tag and value, and the loop continues with the next message.
After all messages in the batch have been dispatched, the saved throw
is re-issued so the original target catch — for example the one in
lsp-request-while-no-input— still receives it.How it works:
sentinelis created at the start. A normal returnfrom the inner forms is signalled by returning that exact cons; any
other value must have come from a
throw. Using a fresh cons (via(cons nil nil)) meanseqreliably distinguishes normal returnfrom a thrown value, even if a handler throws something that happens
to look like the sentinel.
'input(innermost) and one for'lsp-done(outer). If eitherthrows, the corresponding catch returns the thrown value instead of
the sentinel; the loop records
queued-tagandqueued-valueandmoves on to the next message.
(throw queued-tag queued-value). The catch higher up the stack —the one the original throw was aimed at — still receives the throw,
just slightly delayed: every message in the same batch is dispatched
first.
'lsp-doneand'inputare caught here. These are the two tagslsp-modeuses for synchronous-wait coordination. Any other throw(or any error) is left to propagate as before, so this change does
not silently swallow unrelated control flow.
replaces the earlier one. This matches the behaviour you would get
without the patch when the first throwing message wins; the patch
prefers the last throwing message instead, which is consistent with
finishing the batch before exiting.
Why this is preferable to the alternatives
A few alternatives to consider:
This would require changes to every caller that uses
'lsp-doneor'input. The filter is the single point that sees the whole batch,so handling the rescue here is the smallest, most local change.
would avoid the batch-abandon problem at the cost of changing
lsp-mode's observable timing in many places — synchronous waitswould need extra ticks to complete, and the order of side effects
could shift. Far more invasive than the local fix.
throws, not just'lsp-doneand'input. Catchingunknown throw tags risks hiding control flow we did not anticipate,
including ones that other parts of the code may legitimately use to
exit. The two tags caught here are the ones we know are used by
synchronous-wait callers and that are the source of the abandon-batch
symptom.
require redesigning the way
lsp-modedoes synchronous waiting. Alarge change for the same effect; the local fix is enough.
Compatibility
lsp--create-filter-functionkeeps the samesignature and the same observable behaviour for batches that contain
no errors and no throws.
(catch 'lsp-done …)or(catch 'input …)still receives the throwwith the original value, just after the rest of the batch has been
dispatched.
condition-caseonly catches errors that escape the parsing loop;the inner JSON-parse
condition-caseis unchanged.lsp-warnis the standard waylsp-modereports recoverableproblems. The new framing-error log uses it rather than introducing
a new diagnostic channel.
Reproducing the bugs before the patch
Bug 2 is the easier of the two to reproduce reliably:
(
window/workDoneProgress/create,workspace/configuration, …) andresponses to client requests in close succession.
ltex-ls-plusdoesthis naturally during interactive editing.
lsp-completion-enableand use Corfu withcorfu-auto tanda short
corfu-auto-delay.silent (no diagnostics, no progress notifications), look at the wire
log: you will find a server-initiated request (often
window/workDoneProgress/create) for which the client never sent aresponse.
Bug 1 is harder to reproduce on demand because it depends on a
specific framing-state misalignment. I have observed it as a one-off
event during long sessions; it surfaces in
*Messages*as(error "Unable to find Content-Length header.")followed by theserver going silent. With the patch, the same misalignment is logged
as a
framing-error-salvagewarning and the messages parsed before theerror are still dispatched.
With both fixes applied, server hangs of the kind described above stop
appearing in my testing.
Origin
This issue surfaced while debugging
lsp-ltex-plus(anlsp-modeclient for the
ltex-ls-plusgrammar/spell server).ltex-ls-plusissues server-initiated requests interleaved with responses to client
requests during interactive editing, which made the abandon-batch case
happen reliably. To unblock users while a proper fix is discussed
upstream,
lsp-ltex-plushas been shipping the patch as a localoverride of
lsp--create-filter-functionsince releasevX.Y.Z(https://github.com/alberti42/emacs-ltex-plus/releases). This is
mentioned only as context — the bug is in
lsp-moderegardless ofwhich client is connected, and the fix belongs upstream so other
clients benefit without each one having to ship its own override.
Related work
This is the third of three independent fixes to how
lsp-modehandlesmessages when there are many requests waiting for responses at the same
time. The other two are:
lsp--parser-on-messageclassify messages by theirmethodfield before theirid. Today, when a server-initiatedrequest happens to use an id that matches a pending client request,
it is mis-routed as a response. PR 1 fixes that.
lsp-request-while-no-inputignore callbacks thatarrive after the function has stopped waiting. Today, a late response
callback can fire after the surrounding
(catch 'lsp-done …)hasalready gone, causing a stray
no-catcherror in unrelated code.PR 2 fixes that.
Each PR addresses a separate problem and is independently useful; they
are split apart to keep review focused. PRs 1 and 2 do not depend on
this one — applying any subset of the three is safe.