Skip to content

Blob/File type: preserve user-supplied MIME verbatim (WHATWG File API)#29258

Open
robobun wants to merge 11 commits intomainfrom
farm/c4850fc3/fix-blob-type-charset-append
Open

Blob/File type: preserve user-supplied MIME verbatim (WHATWG File API)#29258
robobun wants to merge 11 commits intomainfrom
farm/c4850fc3/fix-blob-type-charset-append

Conversation

@robobun
Copy link
Copy Markdown
Collaborator

@robobun robobun commented Apr 13, 2026

What

Fixes new File([], "empty.txt", { type: "text/plain" }).type returning "text/plain;charset=utf-8". Per the WHATWG File API, the type attribute must be the lowercased input verbatim — user agents "must NOT append a charset parameter to the media type".

// before
new File([], "x.txt", { type: "text/plain" }).type  // "text/plain;charset=utf-8"
// after (and matching node.js)
new File([], "x.txt", { type: "text/plain" }).type  // "text/plain"

Cause

Blob/File/S3File construction paths were using globalThis.bunVM().mimeType(slice) as a string-interning optimisation — but that helper routes through MimeType.Compact.toMimeType(), which explicitly substitutes a handful of bare enum values with canonical charset-appended constants:

// src/http/MimeType.zig
switch (self.value) {
    .@"text/plain" => return text,   // text = "text/plain;charset=utf-8"
    .@"text/css"   => return css,    // css  = "text/css;charset=utf-8"
    .@"text/html"  => return html,
    .@"application/json" => return json,
    ...
}

The substitution is correct for HTTP serving (byExtension / byLoader), but wrong for Blob/File type which must preserve what the user wrote.

Fix

Adds rareData.mimeTypeInternedValue(str) that reuses the same MIME hash table but returns the raw enum slice() — a static substring of the mime-type bytes blob. This keeps the zero-allocation interning win for every known MIME without going through the Compact.toMimeType substitution.

All eight Blob/File/S3File construction paths now use it:

  1. JSDOMFile__construct_ (new File(...))
  2. constructBunFile (Bun.file(path, { type }))
  3. doWrite local file branch (blob.write(path, { type }))
  4. doWrite S3 branch
  5. Blob.prototype.slice(type)
  6. new Blob(..., { type })
  7. constructS3FileWithS3CredentialsAndOptions (Bun.s3.file(path, { type }))
  8. constructS3FileWithS3Credentials

HTTP serving paths (byExtension, byLoader) are untouched, so Content-Type: text/plain;charset=utf-8 headers on HTTP responses still work as before.

Adjacent fixes:

  • doWrite cleared content_type_allocated after free() but didn't reset the flag, leaving it true while pointing at a static string on the next interned assignment.
  • getSliceFrom was computing content_type_was_set as this.content_type_was_set or content_type_was_allocated, but the interned path takes a static string (content_type_was_allocated = false). A .slice(start, end, type) on a parent blob without its own type would end up with content_type_was_set = false despite a valid type, so HTTP serving of the slice fell back to the store MIME. Fixed to check the content_type length directly.

Verification

New regression test in test/regression/issue/29257.test.ts covering File constructor, Blob constructor, Bun.file, Bun.s3.file, .slice(), explicit-charset round-trip, uppercase lowercasing, and uncommon MIME types (both File and Blob construction paths).

Pre-existing tests updated where they had codified the buggy canonicalisation for type: "text/plain""text/plain;charset=utf-8":

  • test/js/web/structured-clone-blob-file.test.ts (8 assertions)
  • test/js/bun/util/inspect.test.js (6 assertions)

All Blob/File/structuredClone/inspect/HTTP-serve/fetch data-URL tests pass locally with bun bd test.

Fixes #29257
Fixes #15078

The Blob/File constructor paths were routing user-supplied MIME strings
through MimeType.Compact.toMimeType(), which substitutes bare forms
like `text/plain` with canonical charset-appended constants
(`text/plain;charset=utf-8`, etc.). Per the WHATWG File API, the
`type` attribute must be the lowercased input verbatim.

Adds a rare-data helper `mimeTypeInternedValue` that reuses the
existing hash table for the interning optimisation, but returns the
raw enum `slice()` (a static substring of the mime-type bytes blob)
instead of routing through `Compact.toMimeType` — so `text/plain` stays
`text/plain`, while preserving zero-alloc hits for every known MIME.

Updates all six Blob/File construction paths in Blob.zig (File
constructor, Blob constructor, Bun.file, Blob#slice, Bun.write local +
S3). HTTP serving (byExtension / byLoader) is unchanged, so
`Content-Type: text/plain;charset=utf-8` on HTTP responses still works.

Fixes #29257
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 13, 2026

Updated 12:45 PM PT - Apr 13th, 2026

@robobun, your commit c7d8d0d is building: #45537

@github-actions
Copy link
Copy Markdown
Contributor

Found 2 issues this PR may fix:

  1. Normalizing blob type with charset causing bug on type validation #15078 - Reports new Blob(…, { type: 'text/plain' }).type returns "text/plain;charset=utf-8" instead of "text/plain" — the exact charset-appending behavior removed by this PR
  2. different response blob type from node #19603 - Reports (await res.blob()).type gets ";charset=utf-8" appended (e.g. "application/json;charset=utf-8") when it shouldn't — same root cause fixed by this PR

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #15078
Fixes #19603

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Added a VM API to return interned MIME string values and switched Blob/File construction, slicing, S3 file creation, and related write/getWriter paths to use it so stored content_type preserves the lowercased input verbatim (no charset canonicalization). Tests were updated and a regression test added.

Changes

Cohort / File(s) Summary
MIME Interning API
src/bun.js/VirtualMachine.zig, src/bun.js/rare_data.zig
Added mimeTypeInternedValue(...) on VirtualMachine and RareData to lazily init the mime table and return the interned raw slice on hit (verbatim; no charset/canonicalization).
Blob/File construction and slicing
src/bun.js/webcore/Blob.zig
Replaced canonicalizing mimeType(...) calls with vm.mimeTypeInternedValue(...) across File/Blob constructors, write/getWriter flows, .slice handling, and content_type propagation; adjusted allocation flags and slice metadata to reflect interned vs heap-allocated values.
S3-backed file construction
src/bun.js/webcore/S3File.zig
Switched S3File content_type assignment to use globalObject.bunVM().mimeTypeInternedValue(slice) instead of mimeType(...), preserving the provided lowercased MIME string when interned.
Tests — expectations updated
test/js/bun/util/inspect.test.js, test/js/web/structured-clone-blob-file.test.ts
Updated assertions to expect preserved MIME strings (e.g., text/plain, text/html) instead of charset-appended canonical forms.
Regression test added
test/regression/issue/29257.test.ts
Added regression tests validating that File/Blob constructors, Bun.file, Bun.s3.file, and Blob.prototype.slice preserve the provided type verbatim (lowercased; no charset appended unless explicitly present).
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: preserving user-supplied MIME types verbatim in Blob/File per WHATWG spec, directly addressing the core issue.
Description check ✅ Passed The description comprehensively covers both required sections: 'What' explains the bug fix with before/after code, and 'How did you verify' details the regression test and updated existing tests.
Linked Issues check ✅ Passed All code changes fully address both linked issues: #29257 and #15078 both require preserving user-supplied MIME without charset appending, which the PR implements across all eight construction paths for File, Blob, and S3File.
Out of Scope Changes check ✅ Passed All changes directly support the objective: new mimeTypeInternedValue helper, eight construction paths updated to use it, adjacent bugfixes (content_type_allocated clearing, content_type_was_set logic), and test updates reflecting the corrected behavior.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bun.js/webcore/Blob.zig (1)

1924-1943: ⚠️ Potential issue | 🟠 Major

Use get() instead of getTruthy() for the type option in file-backed paths to preserve explicit empty-string overrides.

Three paths still use getTruthy() to read the type option (constructBunFile at line 1924, doWrite at line 2320, and getWriter at line 2666), which skips empty strings as falsy values. This means Bun.file("x.txt", { type: "" }) never sets content_type_was_set, so downstream getters (getType() and contentTypeOrMimeType()) fall back to the store MIME type whenever content_type.len == 0. The Blob constructor correctly uses get() at line 1821 to handle this case, but the file-backed paths do not, creating an inconsistency. Switch these three to get() and make content_type_was_set authoritative to preserve the user-supplied empty string.

Also applies to: 2320-2343, 2666-2689

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/webcore/Blob.zig` around lines 1924 - 1943, The file-backed paths
(constructBunFile, doWrite, and getWriter) currently use opts.getTruthy("type")
which ignores explicit empty-string overrides; change each to use
opts.get("type") so an explicit "" is preserved, and ensure you set
blob.content_type_was_set = true when a type is present (even if empty) so
downstream getters (getType(), contentTypeOrMimeType()) treat the empty string
as authoritative; keep the existing logic that lowercases/copies non-ASCII-safe
values and sets content_type_allocated when allocating a buffer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29257.test.ts`:
- Around line 57-62: The test only checks the File constructor path and misses
the Blob constructor path that uses the copyLowercase fallback; add an assertion
that constructs a Blob with type "application/x-custom-type" and asserts
blob.type === "application/x-custom-type" so regressions in the Blob constructor
(implemented in src/bun.js/webcore/Blob.zig and the copyLowercase path) are
covered alongside the existing File check.

---

Outside diff comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 1924-1943: The file-backed paths (constructBunFile, doWrite, and
getWriter) currently use opts.getTruthy("type") which ignores explicit
empty-string overrides; change each to use opts.get("type") so an explicit "" is
preserved, and ensure you set blob.content_type_was_set = true when a type is
present (even if empty) so downstream getters (getType(),
contentTypeOrMimeType()) treat the empty string as authoritative; keep the
existing logic that lowercases/copies non-ASCII-safe values and sets
content_type_allocated when allocating a buffer.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f51b1ce5-ce3c-4f81-af64-6c8ce608df31

📥 Commits

Reviewing files that changed from the base of the PR and between d35774f and 27153ec.

📒 Files selected for processing (6)
  • src/bun.js/VirtualMachine.zig
  • src/bun.js/rare_data.zig
  • src/bun.js/webcore/Blob.zig
  • test/js/bun/util/inspect.test.js
  • test/js/web/structured-clone-blob-file.test.ts
  • test/regression/issue/29257.test.ts

Addresses review feedback on PR #29258:

- S3File.zig: two more construction sites (`Bun.s3.file(path, {type})`)
  were using the old `bunVM().mimeType()` helper, so S3-backed blobs with
  well-known MIMEs were still getting `;charset=utf-8` appended. Switch
  them to `mimeTypeInternedValue` like the other Blob paths.

- Blob.zig `getSliceFrom`: `content_type_was_set` was computed as
  `this.content_type_was_set or content_type_was_allocated`, but the
  interned path takes a static string (`content_type_was_allocated = false`).
  A `.slice(start, end, type)` on a blob without a parent type would end
  up with `content_type_was_set = false` despite having a valid type,
  and HTTP serving of the slice would fall back to the store MIME.
  Check the string length directly instead.

- Revert the `dupeWithContentType` edit from the initial commit: that
  block is unreachable (`setNotHeapAllocated` zeros the ref count, so
  the subsequent `duped.isHeapAllocated()` guard is permanently false),
  so the change had no runtime effect. Left a comment documenting the
  pre-existing dead code; fixing the guard is out of scope.

- Regression test: add `Bun.s3.file` coverage, and extend the uncommon
  MIME type test to check both the File and Blob constructor paths
  (per CodeRabbit feedback).
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 13, 2026

CodeRabbit outside-diff Finding (get vs getTruthy for options.type) — declined: observationally identical to get() on the JS-visible surface.

The File constructor does use get() but then at Blob.zig:1856 clears content_type_was_set = false when content_type.len == 0, so {type: ""} and {type: undefined} produce the same .type === "". Same reset at Blob.zig:3370 in the Blob constructor. Unifying the file-backed paths on get() would distinguish "explicit empty string" from "no type" internally but produce no JS-visible difference, and the WHATWG File API spec does not require distinguishing them. Scope creep for a charset-canonicalization fix; leaving as-is.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 2857-2863: The bug sets blob.content_type_was_set by checking
blob.content_type.len which treats an inherited/store MIME as if the slice
caller provided it; instead, determine whether the slice call passed an explicit
content-type argument (the slice's argument, e.g., content_type or type) and use
that to set blob.content_type_was_set (combine this.content_type_was_set with a
check on the slice argument being non-empty) so blob.content_type_was_set only
becomes true when the slice caller actually specified a type; update the
assignment that currently references blob.content_type.len and use the
slice-argument variable and this.content_type_was_set to compute the value.

In `@test/regression/issue/29257.test.ts`:
- Around line 57-80: Add a regression test that exercises
Blob.prototype.slice(type) to ensure the provided type is preserved verbatim
(and hits the getSlice/getSliceFrom paths in Blob.zig); create a Blob (e.g., new
Blob(["a"], { type: "text/plain" }) or empty data), call blob.slice(0, 1,
"text/plain") and assert that the resulting slicedBlob.type === "text/plain" to
prevent regressions in the interned-type path touched by getSlice/getSliceFrom.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a0336ffa-7545-47ba-a073-faf15eefaf9c

📥 Commits

Reviewing files that changed from the base of the PR and between 27153ec and fc3cf56.

📒 Files selected for processing (3)
  • src/bun.js/webcore/Blob.zig
  • src/bun.js/webcore/S3File.zig
  • test/regression/issue/29257.test.ts

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/bun.js/webcore/Blob.zig (1)

2857-2863: ⚠️ Potential issue | 🟡 Minor

Don't derive content_type_was_set from the final blob buffer.

The condition blob.content_type.len > 0 incorrectly marks inherited store MIME types as user-specified. When .slice() is called without a type argument:

  1. content_type parameter is ""
  2. Lines 2851-2855 copy the parent's store-derived content_type to blob.content_type
  3. Line 2863 sets content_type_was_set = true because blob.content_type.len > 0

This should key off the function's content_type argument (the user-supplied value) instead.

Suggested fix
-    blob.content_type_was_set = this.content_type_was_set or blob.content_type.len > 0;
+    blob.content_type_was_set = this.content_type_was_set or content_type.len > 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/webcore/Blob.zig` around lines 2857 - 2863, The bug is that
content_type_was_set is being derived from blob.content_type.len (which includes
inherited store MIME types) instead of the user-supplied function argument;
change the assignment for blob.content_type_was_set to use the slice function's
content_type parameter (e.g., set blob.content_type_was_set =
this.content_type_was_set or content_type.len > 0) so only an explicitly
provided type marks the flag, referencing the existing symbols
blob.content_type_was_set, this.content_type_was_set, blob.content_type.len and
the function's content_type parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 2857-2863: The bug is that content_type_was_set is being derived
from blob.content_type.len (which includes inherited store MIME types) instead
of the user-supplied function argument; change the assignment for
blob.content_type_was_set to use the slice function's content_type parameter
(e.g., set blob.content_type_was_set = this.content_type_was_set or
content_type.len > 0) so only an explicitly provided type marks the flag,
referencing the existing symbols blob.content_type_was_set,
this.content_type_was_set, blob.content_type.len and the function's content_type
parameter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e80d9f2-74fa-448b-9d54-3ef9675a2937

📥 Commits

Reviewing files that changed from the base of the PR and between fc3cf56 and a75bcd4.

📒 Files selected for processing (1)
  • src/bun.js/webcore/Blob.zig

CodeRabbit caught that my earlier fix checked `blob.content_type.len > 0`
after the parent-inherit step, so a plain `.slice(0, 5)` with no type
argument would copy the parent content_type into `blob.content_type`
and then incorrectly mark the slice as user-specified. Key off the
`content_type` argument to `getSliceFrom` directly.

Also adds a `Blob.prototype.slice(start, end, type)` regression test.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/bun.js/webcore/Blob.zig (2)

1832-1841: 🧹 Nitpick | 🔵 Trivial

Extract MIME normalization into one helper.

The same ASCII check -> mimeTypeInternedValue() -> copyLowercase() sequence now exists in six call sites. This PR already had to touch every copy; the next File API tweak is very likely to miss one path and reintroduce drift between File, Blob, Bun.file, writer overrides, and slice(). Please centralize it behind one helper and reuse it everywhere.

Also applies to: 1935-1943, 2335-2343, 2681-2689, 2946-2955, 3376-3384

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/webcore/Blob.zig` around lines 1832 - 1841, Extract the repeated
ASCII-check + bunVM().mimeTypeInternedValue(slice) + strings.copyLowercase(...)
logic into a single helper (e.g., normalizeMimeType or getNormalizedMimeType)
that returns a struct or tuple with the chosen interned pointer or an allocated
lowercase copy and a flag indicating ownership; update all call sites (the
Blob.path handling where blob.content_type is set: the block using
bunVM().mimeTypeInternedValue(slice), and the other occurrences around the
ranges noted) to call this helper and assign blob.content_type and
blob.content_type_allocated based on the helper's result; ensure the helper uses
the same allocator and bun.handleOom(...) behavior and preserves the exact
semantics (prefer interned if present, otherwise allocate & copy lowercase) and
update callers to free when blob.content_type_allocated is true.

2850-2863: ⚠️ Potential issue | 🟠 Major

Make no-argument slice() independent of MIME interning.

var blob = this.dupe() already starts with the parent's content_type, but Lines 2851-2856 then clear it whenever the parent happened to use an allocated buffer. After this PR, new Blob([], { type: "text/plain" }).slice().type keeps "text/plain", while new Blob([], { type: "application/x-custom-type" }).slice().type becomes "" only because the first type hit mimeTypeInternedValue() and the second took copyLowercase(). The uncommon path also leaves content_type_was_set true at Line 2863, so internal MIME consumers can diverge from the JS-visible .type. Preserve or clear the parent type uniformly instead of keying it off content_type_allocated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/webcore/Blob.zig` around lines 2850 - 2863, The slice() logic
should not clear the parent's MIME when the parent used an allocated buffer;
update the branch in Blob.slice so that if content_type.len == 0 and
this.content_type.len > 0 you always set blob.content_type = this.content_type
and also set blob.content_type_allocated = this.content_type_allocated (instead
of basing it on content_type_was_allocated); otherwise keep the existing
behavior of assigning blob.content_type = content_type and
blob.content_type_allocated = content_type_was_allocated; leave the
blob.content_type_was_set assignment as-is (this.content_type_was_set or
content_type.len > 0).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 1832-1841: Extract the repeated ASCII-check +
bunVM().mimeTypeInternedValue(slice) + strings.copyLowercase(...) logic into a
single helper (e.g., normalizeMimeType or getNormalizedMimeType) that returns a
struct or tuple with the chosen interned pointer or an allocated lowercase copy
and a flag indicating ownership; update all call sites (the Blob.path handling
where blob.content_type is set: the block using
bunVM().mimeTypeInternedValue(slice), and the other occurrences around the
ranges noted) to call this helper and assign blob.content_type and
blob.content_type_allocated based on the helper's result; ensure the helper uses
the same allocator and bun.handleOom(...) behavior and preserves the exact
semantics (prefer interned if present, otherwise allocate & copy lowercase) and
update callers to free when blob.content_type_allocated is true.
- Around line 2850-2863: The slice() logic should not clear the parent's MIME
when the parent used an allocated buffer; update the branch in Blob.slice so
that if content_type.len == 0 and this.content_type.len > 0 you always set
blob.content_type = this.content_type and also set blob.content_type_allocated =
this.content_type_allocated (instead of basing it on
content_type_was_allocated); otherwise keep the existing behavior of assigning
blob.content_type = content_type and blob.content_type_allocated =
content_type_was_allocated; leave the blob.content_type_was_set assignment as-is
(this.content_type_was_set or content_type.len > 0).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 91d24014-93e1-4db5-b061-a75a0c7ae774

📥 Commits

Reviewing files that changed from the base of the PR and between a75bcd4 and 3b67da7.

📒 Files selected for processing (2)
  • src/bun.js/webcore/Blob.zig
  • test/regression/issue/29257.test.ts

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 13, 2026

CodeRabbit outside-diff findings triage:

Finding 1 — "Extract MIME normalization into one helper": nitpick, declined. The 6 call sites are similar but not identical (different allocator variables, different existing inner: block structures). A helper is a good follow-up refactor but adds diff noise to a bug fix and is not required to land the WHATWG compliance.

Finding 2 — "Make no-argument slice() independent of MIME interning": real but pre-existing. Repro against unmodified main:

new Blob(["x"], { type: "application/x-custom-type" }).slice().type  // ""

The cause is getSliceFrom's !this.content_type_allocated guard at Blob.zig:2851, which has been wrong forever. Fixing it correctly requires either duplicating the content_type on inherit (per-slice allocation) or refcounting the content_type buffer — out of scope for the charset canonicalization fix. Will track as a separate follow-up.

robobun added 2 commits April 13, 2026 14:05
s3.test.ts 'inspect' asserted the buggy canonicalized form
(`type: "text/plain;charset=utf-8"`) for an S3File constructed with
`{ type: "text/plain" }`. My fix preserves the user-supplied MIME
verbatim, so update the snapshot assertion to match.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 1832-1836: Constructors in Blob.zig that set blob.content_type to
an interned slice (see the branches using
globalThis.bunVM().mimeTypeInternedValue(slice) around the blob initialization
code) must also clear the blob.content_type_allocated flag; currently those
branches (around the occurrences at the interned-handling lines) replace
content_type with a static interned value but leave content_type_allocated true,
causing calculateEstimatedByteSize() and later free/overwrite paths to treat
static data as owned. Fix by setting content_type_allocated = false in every
branch where content_type is assigned from an interned value (the same places
you set blob.content_type = interned) so the Blob state correctly reflects
non-owned MIME slices.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 59b52b6d-9429-443e-814b-cf4932745b9b

📥 Commits

Reviewing files that changed from the base of the PR and between 3b67da7 and 867e66b.

📒 Files selected for processing (2)
  • src/bun.js/webcore/Blob.zig
  • test/js/bun/s3/s3.test.ts

CodeRabbit caught that in the File/Blob/Bun.file constructor paths
where `blob` may have come from `get()` → `dupe()`, the shallow copy
brings along `content_type_allocated = true` pointing at an aliased
parent buffer. Reassigning `blob.content_type` to a static interned
slice left the stale flag, so `calculateEstimatedByteSize()` counted
static bytes as owned. Reset the flag before the overwrite.

Affects three sites: `JSDOMFile__construct_` (File constructor),
`constructBunFile` (Bun.file), and `Blob.constructor`. The doWrite /
getWriter / S3File paths already free-then-reset, and the slice path
uses a local variable, so they are unchanged.
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Additional findings (outside current diff — PR may have been updated during review):

  • 🟣 src/bun.js/webcore/Blob.zig:2854-2866 — Pre-existing inconsistency in getSliceFrom (line 2851): the \!this.content_type_allocated guard intentionally prevents inheriting a heap-allocated parent MIME type when .slice() is called without a type argument, so new Blob(["x"], {type:"application/x-custom"}).slice(0).type returns "" while new Blob(["x"], {type:"text/plain"}).slice(0).type returns "text/plain" (static/interned, so inheritance fires). The PR modifies line 2863 in this same function but does not fix or worsen the guard; the new regression tests cover .slice(start, end, type) with an explicit type but not the no-type-arg case on a heap-allocated custom MIME, leaving this gap uncovered.

    Extended reasoning...

    What the bug is and how it manifests

    In getSliceFrom (Blob.zig line 2851), there is a guard:

    if (content_type.len == 0 and this.content_type.len > 0 and \!this.content_type_allocated) {
        blob.content_type = this.content_type;  // inherit from parent
    } else {
        blob.content_type = content_type;       // empty string when len==0
    }
    

    This guard intentionally prevents inheriting a heap-allocated (custom) content_type from the parent blob to avoid a double-free risk. The result is two different behaviours depending on whether the parent MIME was interned (static) or heap-allocated:

    • new Blob(["x"], {type: "text/plain"}).slice(0).type returns "text/plain" (static interned — inheritance fires, WHATWG violation)
    • new Blob(["x"], {type: "application/x-custom"}).slice(0).type returns "" (heap-allocated — guard blocks, spec-compliant per WHATWG)

    Addressing the refutation

    One verifier argues the "" return is spec-correct: the WHATWG File API slice algorithm says the result type is "" when no contentType argument is supplied — no inheritance at all. This is a fair point. The actual spec violation is in the other branch: the static/interned path over-eagerly inherits the parent type when no argument is provided. Three independent verifiers still classified this as pre-existing at pre_existing severity, which aligns: "" for heap types is correct, inheriting for static types is the violation.

    Why existing code does not prevent it

    The guard exists because fixing it properly (duplicating or refcounting the heap-allocated string on inherit) requires allocator work that is out of scope for a charset-canonicalization fix. The PR author explicitly acknowledged this in the timeline: "will track as a separate follow-up."

    What the impact is

    The asymmetry is observable: slicing a blob with an interned MIME and no type argument propagates the parent type, while slicing a blob with a custom MIME returns "". The spec says both should return "".

    How to fix it

    The correct fix is to remove the inheritance branch entirely (all no-type-arg slices return "" regardless of parent type) and provide a separate helper for callers that want to clone the type. This is deferred.

    Step-by-step proof

    1. const b = new Blob(["hello"], {type: "application/x-custom"}) — content_type_allocated = true, content_type = "application/x-custom".
    2. const s = b.slice(0) — getSlice called, no type arg, content_type = "", content_type_was_allocated = false.
    3. In getSliceFrom: content_type.len == 0 true, this.content_type.len > 0 true, !this.content_type_allocated false — else branch fires: blob.content_type = "".
    4. s.type returns "".
    5. Contrast: new Blob(["hello"], {type: "text/plain"}).slice(0).type returns "text/plain" because content_type_allocated = false (interned static), so the if-branch fires and inherits.

    Coverage gap

    The PR new regression tests in 29257.test.ts cover .slice(start, end, type) with an explicit type argument but do not cover .slice() without a type arg on a parent with a heap-allocated custom MIME. This leaves the asymmetric behaviour uncovered.

…alization

elysia 1.4.12 has `test/path/path.test.ts` which asserts that
`new File([...], "x", { type: "text/plain" })` served via
`Response(file)` yields `Content-Type: text/plain;charset=utf-8`.
Per WHATWG Fetch / File API this is wrong: the response header
extracts the Blob type verbatim, and after #29257 Bun preserves
the user-supplied MIME without charset canonicalization.

Skip the file until elysia updates the assertion upstream.
robobun added 2 commits April 13, 2026 17:16
The runner calls readdirSync(vendor/elysia/test), which yields filenames
relative to that directory (e.g. path/path.test.ts) without the
redundant test/ prefix. The previous key test/path/path.test.ts never
matched, so the skip was inert and CI still ran the failing test.
readdirSync on Windows returns paths with backslash separators, but
vendor.json skipTests keys use forward-slash globs. The regex
`^path/path.test.ts$` never matched `path\path.test.ts` on Windows,
so the elysia test that asserts pre-WHATWG File.type canonicalization
still ran on 2019-x64, 2019-x64-baseline, and 11-aarch64.

Normalize the readdir path to forward slashes before regex test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

File.type incorrectly appends charset parameter (e.g. "text/plain;charset=utf-8") Normalizing blob type with charset causing bug on type validation

1 participant