
fix: classify JSON-RPC messages by method before id #5055

Merged: jcs090218 merged 1 commit into emacs-lsp:master from alberti42:fix/lsp--parser-on-message on May 11, 2026

@alberti42 (Contributor) commented May 3, 2026

Summary

lsp--parser-on-message currently decides what kind of JSON-RPC message it is
looking at by examining the id field first. This is unsafe: a server-initiated
request can carry the same id value as a pending client-initiated request, in
which case the server's request is mis-routed as the response to that client
request, fed to the wrong handler, and never replied to. The server then waits
forever for an answer it will never receive.

This PR reorders the classification so that the presence of a method field is
checked before the id. That single change is enough to eliminate the
collision: by the JSON-RPC spec, only requests and notifications carry a
method, and only responses lack one — so method is an unambiguous
discriminator, while id is not.

No public APIs change. No dispatch ordering changes. The pcase arm order stays
identical. The only observable difference is that messages which were
previously mis-classified are now classified correctly.

Background: the three shapes of a JSON-RPC message

JSON-RPC 2.0 (spec) defines three message shapes. They are distinguished by
which fields are present:

| Shape        | method              | id                  | result / error          |
|--------------|---------------------|---------------------|-------------------------|
| Request      | required            | required            | must NOT be present     |
| Notification | required            | must NOT be present | must NOT be present     |
| Response     | must NOT be present | required            | exactly one is required |

The key observation: method is present iff the message originates a new
exchange (request or notification). id is present in both requests and
responses — so it cannot, on its own, tell the two apart.

A correct dispatcher must therefore look at method first.

The bug

lsp--parser-on-message currently routes via lsp--get-message-type, which
checks id first. The decision tree is roughly:

if id is present:
    if there is an `error` field   -> response-error
    else                            -> response
else if method is present           -> notification or request

This works as long as the id of a server-initiated request never coincides
with the id of a pending client request. But nothing in the JSON-RPC spec
forbids that coincidence: each side numbers its own requests independently,
and in practice collisions occur readily.
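
To make the failure concrete, here is a minimal sketch of the id-first decision tree above. It is written in Python rather than lsp-mode's Emacs Lisp so it can be run standalone; the function name `classify_id_first` and the sample `token` value are hypothetical, and the logic models only the rough tree shown in this description, not the exact body of lsp--get-message-type.

```python
def classify_id_first(msg: dict) -> str:
    """Rough model of the id-first decision tree described above."""
    if "id" in msg:
        # Any id-bearing message is treated as some kind of response.
        return "response-error" if "error" in msg else "response"
    if "method" in msg:
        return "notification"
    return "unknown"


# A server-initiated request: it carries both `id` and `method`.
server_request = {
    "jsonrpc": "2.0",
    "id": "4",
    "method": "window/workDoneProgress/create",
    "params": {"token": "example-token"},  # hypothetical token value
}

print(classify_id_first(server_request))  # prints "response"
```

Because the `id` check wins, the `method` field is never consulted, and the request is labelled a response.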

A concrete collision

ltex-ls-plus (and other servers built on lsp4j) numbers its
server-initiated requests with string ids: "1", "2", "3", …
Meanwhile, lsp-mode numbers its client-initiated requests with integer
ids: 1, 2, 3, …

That sounds safe because the types differ, but both carry the same digits, and
lsp-mode normalizes string ids to integers before lookup. So when the client
has a pending request with id 4 and the server happens to send a request with
id "4", the dispatcher sees a message with id = 4, finds an entry in its
response-handler table, and dispatches it as a response. The wrong handler
runs; the server's actual request is silently dropped.
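
The collision can be sketched in a few lines of Python. This models the string-to-integer normalization described above; `normalize_id` and `pending_handlers` are hypothetical names for illustration and do not correspond to lsp-mode's actual functions or tables.

```python
def normalize_id(raw_id):
    """Turn a numeric string id into an int; leave every other id untouched."""
    if isinstance(raw_id, str):
        try:
            return int(raw_id)
        except ValueError:
            return raw_id
    return raw_id


# Hypothetical table of pending client requests, keyed by integer id.
pending_handlers = {4: "handler for the client's request 4"}

# The server sends its own request with the string id "4".
key = normalize_id("4")

print(key in pending_handlers)  # prints "True": the lookup hits the client's entry
```

Before normalization the two ids are distinguishable ("4" versus 4); after it, they share one key in the handler table.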

What the user sees

The most common symptom is a server stuck "checking" forever. The server is
politely waiting for a reply (to window/workDoneProgress/create,
workspace/configuration, or similar) that the client has already consumed
under the wrong handler. No error appears anywhere — the message simply
disappears.

A reproducible setup:

  • ltex-ls-plus as the server and lsp-ltex-plus as the client.
  • lsp-completion-enable t and a fast-typing client like Corfu with a short
    corfu-auto-delay.
  • After a few keystrokes, an id collision occurs and the server's next
    window/workDoneProgress/create is swallowed. The progress UI never closes;
    the next textDocument/publishDiagnostics never fires.

A second, smaller bug amplifies the first one. lsp--get-message-type's
id-extraction path goes through lsp:json-response-id, which is shaped for
response messages (where id lives at the top level, alongside result /
error). When it is handed a request shape, it can signal an error. That
error is swallowed by the surrounding with-demoted-errors, so the message is
not just mis-routed — it is dropped entirely, with no log entry. From the
outside it looks like the server simply never sent the request.

The fix

Replace the id-first classification with a method-first one. The new
decision tree is:

if method is present:
    if id is also present  -> request
    else                    -> notification
else if id is present:
    if `error` is present   -> response-error
    else                    -> response
else                        -> notification (defensive)

In code:

(message-type (cond
               (has-method (if has-id 'request 'notification))
               (has-id (if has-error 'response-error 'response))
               (t 'notification)))
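
For readers who want to try the classification outside Emacs, the same decision tree can be sketched in Python. This is an illustrative translation, not the patch itself; `classify_method_first` is a hypothetical name, and treating a key as present even when its value is null is an assumption of this sketch.

```python
def classify_method_first(msg: dict) -> str:
    """Classify a decoded JSON-RPC message, checking `method` before `id`."""
    has_method = "method" in msg
    has_id = "id" in msg  # assumption: key presence counts, even if the value is null
    has_error = "error" in msg
    if has_method:
        return "request" if has_id else "notification"
    if has_id:
        return "response-error" if has_error else "response"
    return "notification"  # defensive fallback, mirroring the last cond arm


examples = [
    {"jsonrpc": "2.0", "id": "4", "method": "window/workDoneProgress/create", "params": {}},
    {"jsonrpc": "2.0", "method": "textDocument/publishDiagnostics", "params": {}},
    {"jsonrpc": "2.0", "id": 4, "result": None},
    {"jsonrpc": "2.0", "id": 4, "error": {"code": -32601, "message": "Method not found"}},
]
for msg in examples:
    print(classify_method_first(msg))
# prints: request, notification, response, response-error (one per line)
```

The first example is the colliding case: it carries an id, but the `method` check runs first, so it is classified as a request.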

That is the substantive change. Everything else in the diff is plumbing to
support it cleanly:

  • A small json-get helper reads top-level fields without going through the
    response-shaped accessor. It works on both hash-table and plist
    representations, so the parser remains correct under both lsp-use-plists
    settings.

  • The id is normalised (string → integer) only on the response branches,
    where lsp-mode's handler table expects integer keys. Request and
    notification handlers are passed the original json-data and don't need a
    normalised id.

  • The response and response-error branches now guard the handler lookup with
    (when handler ...) instead of cl-assert id. The assert could never fail in
    practice (the old classifier only reached it when id was present), but
    removing it lets the new code degrade gracefully if a stale id ever
    arrives, which is preferable to signaling an error inside
    with-demoted-errors and losing the message silently. The pcase arm order
    is unchanged.
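
The first and third points can be sketched together in Python. This is a model under stated assumptions, not the code in the diff: a hash-table is modelled as a `dict`, a plist as a flat list with ":key" entries at even positions, and `json_get` / `dispatch_response` are hypothetical stand-ins. Removing the handler from the table once it has run is also an assumption of this sketch.

```python
def json_get(data, key):
    """Read a top-level field from a dict or from a plist-style flat list."""
    if isinstance(data, dict):
        return data.get(key)
    # Plist model: [":id", 4, ":method", "foo"], with keys at even positions.
    plist_key = ":" + key
    for i in range(0, len(data) - 1, 2):
        if data[i] == plist_key:
            return data[i + 1]
    return None


def dispatch_response(data, handlers):
    """Run the pending handler for a response; ignore stale or unknown ids."""
    raw_id = json_get(data, "id")
    # Normalize only here, on the response path, where integer keys are expected.
    if isinstance(raw_id, str) and raw_id.isascii() and raw_id.isdigit():
        raw_id = int(raw_id)
    handler = handlers.pop(raw_id, None)
    if handler is None:
        return False  # guarded lookup: nothing to run, nothing to assert
    handler(json_get(data, "result"))
    return True


results = []
handlers = {4: results.append}

# Hash-table shape with a string id: normalized to 4, handler runs.
print(dispatch_response({"jsonrpc": "2.0", "id": "4", "result": "ok"}, handlers))  # True
# Plist shape with the same id arriving again: the id is now stale.
print(dispatch_response([":jsonrpc", "2.0", ":id", 4, ":result", "late"], handlers))  # False
print(results)  # ['ok']
```

The second call shows the graceful-degradation path: a stale id finds no handler and is skipped rather than raising.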

The surrounding with-demoted-errors wrap is preserved: a malformed message
should still not crash the parser. The new classification simply removes the
most common cause of demoted errors, so in steady state the wrap should
rarely trigger.

Compatibility

  • No API change. lsp--parser-on-message keeps the same signature and
    return contract.
  • No dispatch order change. Each call still handles exactly one message;
    batch ordering is the caller's responsibility and is untouched.
  • No new dependencies. cl-labels and cond are already used throughout
    lsp-mode.
  • Plist and hash-table representations both work via the json-get
    helper, so lsp-use-plists users are unaffected.
  • Servers that never collide (e.g. servers that only use integer ids
    outside the client's range) see no behavioural change — the new
    classification produces the same answer as the old one in the
    non-colliding case.

Reproducing the bug before the patch

For maintainers who want to confirm: run lsp-mode against ltex-ls-plus with
LSP-side completion enabled and watch the JSON-RPC log. Look for a message of
the form

{"jsonrpc":"2.0","id":"<N>","method":"window/workDoneProgress/create", ...}

arriving from the server, where <N> is an integer that also appears as the
id of a recent client request. With the current code, no
{"jsonrpc":"2.0","id":"<N>","result":null} reply is ever sent back. With
this patch, the reply is sent.

A small auditing script is included in my working notes
(docs/parse-ltex-stdin-emacs.py) that scans an LSP wire log and flags the
overlap. I can clean it up and contribute it separately if useful.
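
The script mentioned above is not reproduced here. As a hypothetical sketch of the same idea, the following Python function flags server-initiated requests whose id overlaps a pending client request. The input format, a sequence of (direction, raw JSON) pairs, is an assumption made for this example and is not the format of any real lsp-mode log; `find_id_overlaps` and the sample messages are likewise illustrative.

```python
import json


def _normalize(raw_id):
    """Map numeric string ids to ints so "4" and 4 compare equal."""
    if isinstance(raw_id, str) and raw_id.isascii() and raw_id.isdigit():
        return int(raw_id)
    return raw_id


def find_id_overlaps(entries):
    """Return server-initiated requests whose id matches a pending client request.

    `entries` is an iterable of (direction, raw_json) pairs, where direction
    is "client" or "server" (an assumed format, see the note above).
    """
    pending = set()  # normalized ids of client requests still awaiting a response
    overlaps = []
    for direction, raw in entries:
        msg = json.loads(raw)
        if "id" not in msg:
            continue  # notifications carry no id and cannot collide
        key = _normalize(msg["id"])
        if direction == "client" and "method" in msg:
            pending.add(key)  # client-initiated request
        elif direction == "server" and "method" in msg:
            if key in pending:
                overlaps.append(msg)  # server request colliding with a pending id
        elif direction == "server":
            pending.discard(key)  # response to a client request
    return overlaps


log = [
    ("client", '{"jsonrpc":"2.0","id":4,"method":"textDocument/completion","params":{}}'),
    ("server", '{"jsonrpc":"2.0","id":"4","method":"window/workDoneProgress/create","params":{"token":"t"}}'),
    ("server", '{"jsonrpc":"2.0","id":4,"result":null}'),
    ("server", '{"jsonrpc":"2.0","id":"5","method":"workspace/configuration","params":{"items":[]}}'),
]
for msg in find_id_overlaps(log):
    print(msg["id"], msg["method"])  # prints "4 window/workDoneProgress/create"
```

Only the first server request is flagged: by the time the request with id "5" arrives, no client request with that id is pending.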

Origin

This issue surfaced while debugging lsp-ltex-plus (an lsp-mode client for
the ltex-ls-plus grammar/spell server). ltex-ls-plus initiates frequent
server-side requests during interactive editing, which made the collision
easy to hit and easy to reproduce. (LSP 3.17 inherits JSON-RPC 2.0's
message shapes unchanged, so the fix applies to every LSP server, not just
this one.) To unblock users while a proper fix is discussed upstream,
lsp-ltex-plus has been shipping the patch as a local override of
lsp--parser-on-message since release v0.1.0
(https://github.com/alberti42/emacs-ltex-plus/releases). It is mentioned
here only as context — the bug exists in lsp-mode independently of any
particular client, and the fix belongs upstream so other clients benefit
without each having to ship its own override.

Related work

This PR is the first of three independent fixes to the same general area
(message dispatch under high request volume / interactive completion). The
other two will follow as separate PRs:

  • PR 2 — guard lsp-request-while-no-input so a stale callback that
    arrives after the synchronous wait has unwound does not throw to a tag that
    no longer exists.
  • PR 3 — make lsp--create-filter-function resilient against a single
    handler that exits non-locally: catch per-message instead of per-batch, so
    one throwing handler cannot abandon the rest of the parsed batch.

Each PR addresses a distinct invariant and is independently valuable; they
are split out to keep review focused.
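
The per-message idea behind PR 3 can be illustrated with a small Python sketch. It is only an analogy: Python exceptions stand in for Emacs Lisp non-local exits, `run_batch` and `flaky_handler` are hypothetical names, and nothing here reflects the actual shape of lsp--create-filter-function or of the forthcoming patch.

```python
def run_batch(messages, handler):
    """Run `handler` on every message, isolating failures per message."""
    failures = []
    for index, msg in enumerate(messages):
        try:
            handler(msg)
        except Exception as exc:
            # Record the failure and keep going with the rest of the batch.
            failures.append((index, exc))
    return failures


handled = []


def flaky_handler(msg):
    """Hypothetical handler that fails on one specific message."""
    if msg == "bad":
        raise RuntimeError("handler exited abnormally")
    handled.append(msg)


failures = run_batch(["first", "bad", "last"], flaky_handler)
print(handled)        # ['first', 'last']
print(len(failures))  # 1
```

With a single try block around the whole loop instead, the failure on "bad" would also abandon "last".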

Per JSON-RPC 2.0, the presence of a `method' field unambiguously
identifies a message as either a request (with id) or a notification
(without id).  Responses are exclusively id-bearing messages without a
method.

The previous parser classified messages id-first (via
`lsp--get-message-type'), which fails when a server-initiated request's
id happens to collide with the id of a pending client request: the
parser routes the server's request as a response, looks up the wrong
handler, fires it with bogus arguments, and never answers the server.
The actual request is silently dropped.

The misclassification is amplified by `lsp:json-response-id', the only
id-extraction path through `lsp--get-message-type'.  It is designed for
response-shaped messages (where id sits at the top level alongside
`result' / `error') and can signal an error when given a request
shape.  The error is swallowed by the surrounding
`with-demoted-errors', so the whole message disappears with no
diagnostic.

Symptom: the LTeX+ language server (and any server that initiates
`window/workDoneProgress/create' or `workspace/configuration' requests
shortly after a client request) hangs waiting indefinitely for an
acknowledgment that lsp-mode never sends.  Particularly visible with
high-traffic clients (Corfu auto-completion at short delay).

This commit:

- Classifies each message by checking `method' first.  If present, the
message is a request (with id) or a notification (without id).  It
falls back to id-based classification only for actual responses.
- Replaces `lsp:json-response-id' with a representation-agnostic helper
that reads the top-level id directly, so request ids are extracted
from request shapes correctly.
- Reads `method' and `id' via the same helper, so routing works
regardless of `lsp-use-plists'.

The `with-demoted-errors' wrap is preserved to keep the original
"a single bad message doesn't crash the parser" robustness; the
method-first classification removes the most common cause of demoted
errors.

@jcs090218 merged commit fbc926f into emacs-lsp:master on May 11, 2026
10 of 34 checks passed

@alberti42 (Contributor, Author) commented:

Thanks!! This was the most important fix; without it, https://github.com/alberti42/emacs-ltex-plus could not run at all.

@alberti42 deleted the fix/lsp--parser-on-message branch on May 11, 2026, 16:33