Commit 1225a3f
📝 fix: Preserve Raw Markdown Formatting on Upload as Text (danny-avila#12734)
* 🐛 fix: Preserve Raw Markdown on `Upload as Text`
When `RAG_API_URL` is configured, `.md` uploads were sent to the RAG API
`/text` endpoint, which routes Markdown through `UnstructuredMarkdownLoader`
and strips formatting (`#`, `**`, lists, blockquotes). Users expect `Upload
as Text` to preserve raw content - identical bytes in a `.txt` file round-trip
verbatim, while the `.md` came back stripped.
Short-circuit the RAG API call for Markdown files (by MIME type or `.md` /
`.markdown` extension) and read the file verbatim via `parseTextNative`.
Non-Markdown paths are unaffected, and the embedding path (`/embed`) keeps
its existing loader so vector search quality is unchanged.
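
A minimal sketch of the routing decision described above, not the code from this commit. The `UploadedFile` shape, `isMarkdownUpload`, and `chooseTextSource` are hypothetical names introduced for illustration; the single MIME string, the case-insensitive extension match, and the native fallback when no RAG URL is set are assumptions:

```typescript
// Hypothetical upload shape; only the fields used by the decision are listed.
interface UploadedFile {
  originalname: string;
  mimetype: string;
}

type TextSource = 'native' | 'rag';

// Assumed detection rule: Markdown MIME type, or a `.md` / `.markdown` extension.
function isMarkdownUpload(file: UploadedFile): boolean {
  const byMime = file.mimetype === 'text/markdown';
  const byExtension = /\.(md|markdown)$/i.test(file.originalname);
  return byMime || byExtension;
}

// Decide where the text for `Upload as Text` comes from.
// 'native' stands for reading the file verbatim (the `parseTextNative` path),
// 'rag' stands for the RAG API `/text` endpoint.
function chooseTextSource(file: UploadedFile, ragApiUrl: string | undefined): TextSource {
  if (!ragApiUrl) {
    return 'native'; // assumption: without a RAG API there is nothing to route to
  }
  if (isMarkdownUpload(file)) {
    return 'native'; // short-circuit: keep `#`, `**`, lists, blockquotes intact
  }
  return 'rag'; // non-Markdown paths are unaffected
}
```

Keeping the decision in a pure function like this makes it easy to cover with parametrized cases without mocking the RAG API.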
* 🐛 fix: normalize markdown MIME and accept `text/md`
Addressing review feedback on the `Upload as Text` short-circuit:
- Accept `text/md` in the markdown MIME set (LibreChat treats it as a
valid markdown type elsewhere, e.g. the artifact-rendering prompt).
- Normalize the incoming MIME type (lowercase + strip parameters) before
the set lookup so parameterized values like
`text/markdown; charset=utf-8` and uppercase `TEXT/MARKDOWN` still
short-circuit. Extensionless uploads relying only on the `Content-Type`
header would otherwise fall through to the RAG `/text` endpoint and
lose their markdown formatting.
Extend `text.spec.ts` parametrized cases with `text/md`, parameterized
MIME, uppercase, and whitespace-padded variants.
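
The normalization step can be sketched as follows. `normalizeMimeType` is the name used in this change, but the body below is an illustrative guess at its behavior; `MARKDOWN_MIME_TYPES` and `isMarkdownMime` are hypothetical names, and the real set may contain more entries than the two confirmed here:

```typescript
// Hypothetical set name; only these two values are confirmed by the commit message.
const MARKDOWN_MIME_TYPES: ReadonlySet<string> = new Set(['text/markdown', 'text/md']);

// Lowercase the MIME type and strip parameters such as `; charset=utf-8`.
function normalizeMimeType(mimetype: string): string {
  if (!mimetype) {
    return ''; // guards against empty strings at runtime
  }
  const [base = ''] = mimetype.split(';');
  return base.trim().toLowerCase();
}

function isMarkdownMime(mimetype: string): boolean {
  return MARKDOWN_MIME_TYPES.has(normalizeMimeType(mimetype));
}

console.log(isMarkdownMime('text/markdown; charset=utf-8')); // true
console.log(isMarkdownMime('TEXT/MARKDOWN'));                // true
console.log(isMarkdownMime('  text/md  '));                  // true
console.log(isMarkdownMime('text/plain'));                   // false
```

Normalizing before the set lookup keeps the set itself small: one canonical entry per type instead of one per casing or parameter variant.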
* 🧹 chore: Address Code Review Follow-ups on `Upload as Text` fix
Addressing comprehensive review feedback:
- Debug log now includes filename and MIME type so operators can
identify which upload triggered the short-circuit without having
to correlate other logs.
- Expand markdown extension detection beyond `.md` / `.markdown` to
cover `.mdown`, `.mkdn`, `.mkd`, `.mdwn` (case-insensitive regex).
- Tighten `normalizeMimeType` parameter type from `string | undefined`
to `string` to match the actual Express.Multer.File type. The
falsy-check still protects against empty strings at runtime.
- Extend parametrized tests with the most common real-world shapes:
`text/plain` + `.md` (the MIME most browsers/servers assign),
the new rare extensions, and empty MIME + `.md` (pure extension
fallback path).
- Add a positive assertion that `readFileAsString` was called with the
expected arguments on every short-circuit case, so tests fail loudly
if the native-parse path ever regresses.
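
For illustration, the extension fallback might look like the sketch below. `MARKDOWN_EXTENSIONS_RE` is the name used in this change, but the exact pattern is an assumption built from the extensions listed above; `hasMarkdownExtension` is a hypothetical helper:

```typescript
// Assumed pattern: one alternation per extension, anchored at the end of the
// filename, case-insensitive.
const MARKDOWN_EXTENSIONS_RE = /\.(md|markdown|mdown|mkdn|mkd|mdwn)$/i;

// Hypothetical helper: true when the filename ends in a known Markdown extension.
function hasMarkdownExtension(filename: string): boolean {
  return MARKDOWN_EXTENSIONS_RE.test(filename);
}

// The common real-world case: browsers often report `text/plain` for `.md`,
// so the extension is the only signal available.
console.log(hasMarkdownExtension('notes.md'));    // true
console.log(hasMarkdownExtension('README.MKDN')); // true
console.log(hasMarkdownExtension('notes.txt'));   // false
console.log(hasMarkdownExtension('page.mdx'));    // false
```

Because each alternation is an independent branch, a typo in any one of them only shows up if a test exercises that specific extension.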
* 🧪 test: Cover `.mdwn` regex branch in Markdown short-circuit
Every other alternation in `MARKDOWN_EXTENSIONS_RE` has at least one
test case (`md`, `markdown`, `mdown`, `mkdn`, `mkd`) but `mdwn` did
not, leaving a typo in that branch undetectable.
1 parent 034f2ef commit 1225a3f
2 files changed
Lines changed: 101 additions & 0 deletions