fix: use StringDecoder to handle UTF-8 chunk boundaries in setEncoding by 398651434 · Pull Request #5035 · nodejs/undici

398651434 · 2026-04-16T04:13:20Z

Description

Fixes a bug where response.body.setEncoding('utf8') corrupts multi-byte UTF-8 characters that span chunk boundaries.

Root Cause

Each chunk was converted to a string individually via buffer.utf8Slice() (or toString()). When a multi-byte UTF-8 character (e.g., a Chinese character, typically 3 bytes) is split across two HTTP response chunks, the incomplete byte sequence at the end of the first chunk is decoded to replacement characters (U+FFFD), and the leftover bytes at the start of the second chunk are decoded as a separate corrupted character.

Fix

Use Node.js's built-in StringDecoder (from node:string_decoder), which buffers incomplete byte sequences between write() calls:

- setEncoding(encoding): Initialize a StringDecoder when the encoding is set
- consumePush: When a decoder exists, use decoder.write(chunk) instead of storing the raw buffer; the decoder holds incomplete UTF-8 bytes internally until the next write
- consumeFinish: Reset the decoder to allow garbage collection
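The three steps above can be sketched as follows. This is a hypothetical stand-in, not the code in lib/api/readable.js: the SketchBody class, its parts array, and the decoder.end() flush in consumeFinish are assumptions made for illustration. Only StringDecoder, write(), and end() are real Node.js APIs.

```javascript
'use strict'

const { StringDecoder } = require('node:string_decoder')

// Hypothetical body reader used only to illustrate the approach.
class SketchBody {
  constructor () {
    this.decoder = null
    this.parts = []
  }

  setEncoding (encoding) {
    // Create the decoder as soon as an encoding is requested.
    this.decoder = new StringDecoder(encoding)
    return this
  }

  consumePush (chunk) {
    // With a decoder, store decoded text; trailing partial bytes stay
    // buffered inside the decoder until the next write().
    this.parts.push(this.decoder ? this.decoder.write(chunk) : chunk)
  }

  consumeFinish () {
    if (this.decoder === null) {
      const buffer = Buffer.concat(this.parts)
      this.parts = []
      return buffer
    }
    // Assumption: flush any bytes still buffered, then drop the decoder
    // so it can be garbage collected.
    const text = this.parts.join('') + this.decoder.end()
    this.decoder = null
    this.parts = []
    return text
  }
}

// Usage: split "你好" (6 bytes) in the middle of the second character.
const body = new SketchBody().setEncoding('utf8')
const payload = Buffer.from('你好', 'utf8')
body.consumePush(payload.subarray(0, 4))
body.consumePush(payload.subarray(4))
const text = body.consumeFinish()
console.log(text)
```

The first write() returns only the complete first character and keeps the stray lead byte; the second write() completes it.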

Testing

The bug manifests when:

- HTTP response contains multi-byte UTF-8 text (e.g., Chinese characters, emoji)
- setEncoding('utf8') is called on the body
- The text spans multiple TCP packets/chunks

After the fix, characters are correctly reassembled across chunk boundaries.

Closes #5002

When setEncoding('utf8') is called, each chunk was being converted to a string individually, which corrupts multi-byte UTF-8 characters that span chunk boundaries.

This fix:
- Initializes a StringDecoder when setEncoding is called
- Uses StringDecoder.write() in consumePush to properly handle incomplete UTF-8 sequences at chunk boundaries
- Resets the decoder in consumeFinish to allow garbage collection

Closes nodejs#5002

metcoder95

Can you add a regression test for it?

codecov-commenter · 2026-04-24T09:35:38Z

Codecov Report

❌ Patch coverage is 58.33333% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.03%. Comparing base (bc0a19c) to head (f788349).
⚠️ Report is 31 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/api/readable.js	58.33%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5035      +/-   ##
==========================================
- Coverage   93.03%   93.03%   -0.01%     
==========================================
  Files         110      110              
  Lines       35793    35803      +10     
==========================================
+ Hits        33301    33309       +8     
- Misses       2492     2494       +2


metcoder95 reviewed Apr 24, 2026

398651434 wants to merge 1 commit into nodejs:main from 398651434:main

3 participants
