fix: MIME Decoding corrupts non-ASCII characters in Base64-encoded words#2291

Open
williballenthin wants to merge 1 commit into gchq:master from williballenthin:fix-2280
@williballenthin

fromBase64() returns a UTF-8 decoded string by default. That string is then passed to codepage.utils.decode(), which treats each char code as a raw byte.
For multi-byte UTF-8 characters, this double decoding produces garbage (e.g. "café" becomes "caf退").

Pass returnType="byteArray" so codepage receives raw bytes and performs the single correct UTF-8 decode.
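
The difference between the two paths can be reproduced outside CyberChef. The following is a minimal sketch, not the project's code: it assumes Node.js, uses `Buffer` for the Base64 step, and uses the built-in `TextDecoder` as a stand-in for `codepage.utils.decode()`. All variable names are hypothetical, and the exact garbage character depends on the decoder and the surrounding input.

```javascript
// Base64 of the UTF-8 bytes of "café" (63 61 66 C3 A9), as it would appear
// inside a MIME encoded-word such as =?UTF-8?B?Y2Fmw6k=?=
const payload = "Y2Fmw6k=";

const rawBytes = Uint8Array.from(Buffer.from(payload, "base64"));
const utf8 = new TextDecoder("utf-8"); // stand-in for the codepage decoder

// Fixed path: hand the raw bytes to the decoder and decode exactly once.
const decodedOnce = utf8.decode(rawBytes);

// Buggy path: decode to a string first, then reinterpret each char code
// as if it were a raw byte and decode a second time.
const asString = utf8.decode(rawBytes); // already "café"
const reinterpreted = Uint8Array.from(asString, (ch) => ch.charCodeAt(0) & 0xff);
const decodedTwice = utf8.decode(reinterpreted);

console.log(JSON.stringify(decodedOnce));
console.log(JSON.stringify(decodedTwice));
```

In the buggy path the character "é" (U+00E9) becomes the single byte 0xE9, which is not valid UTF-8 on its own, so the second decode cannot recover the original text.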

Closes #2280

AI disclosure
Claude Code Opus 4.6

Development

Successfully merging this pull request may close these issues.

Bug report: MIME Decoding operation corrupts non-ASCII characters in Base64-encoded words