fix: MIME Decoding corrupts non-ASCII characters in Base64-encoded words #2291
Open
williballenthin wants to merge 1 commit into gchq:master
By default, fromBase64() returns a UTF-8-decoded string. That string is then passed to codepage.utils.decode(), which treats each char code as a raw byte.
For multi-byte UTF-8 characters, this double decoding produces garbage (e.g. "café" becomes "caf退").
Pass returnType="byteArray" to fromBase64() so that codepage receives the raw bytes and performs a single, correct UTF-8 decode.
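
For illustration, here is a minimal Node.js sketch of the failure mode, not the actual CyberChef code. It uses `Buffer` and `TextDecoder` as stand-ins for fromBase64() and codepage.utils.decode(), and the helper name `decodeBytesAsUtf8` is hypothetical. The exact garbage differs from the "caf退" above because `TextDecoder` substitutes U+FFFD for a truncated sequence, but the cause is the same: a string that has already been decoded is decoded a second time.

```javascript
// Sketch only: Buffer and TextDecoder stand in for fromBase64() and
// codepage.utils.decode(). decodeBytesAsUtf8 is a hypothetical helper.
function decodeBytesAsUtf8(bytes) {
    return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

// "café" as UTF-8 is 63 61 66 C3 A9, which Base64-encodes to "Y2Fmw6k=".
const encoded = Buffer.from("caf\u00e9", "utf8").toString("base64");

// Buggy path: decode Base64 to a string first, then treat each char code
// as a raw byte. The "é" is now the single code unit 0xE9, which is an
// incomplete UTF-8 lead byte when read back as bytes.
const asString = Buffer.from(encoded, "base64").toString("utf8");
const charCodes = Array.from(asString, (ch) => ch.charCodeAt(0) & 0xff);
const doubleDecoded = decodeBytesAsUtf8(charCodes);

// Fixed path: keep the raw bytes and decode exactly once.
const rawBytes = Array.from(Buffer.from(encoded, "base64"));
const singleDecoded = decodeBytesAsUtf8(rawBytes);

console.log(JSON.stringify(doubleDecoded)); // corrupted last character
console.log(JSON.stringify(singleDecoded)); // "café"
```

The second path corresponds to what returnType="byteArray" is meant to achieve: the decoder sees the original bytes C3 A9 rather than the already-decoded code point E9.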
Closes #2280
AI disclosure
Claude Code Opus 4.6