-
-
Notifications
You must be signed in to change notification settings - Fork 7.5k
π fix: Preserve Raw Markdown Formatting on Upload as Text #12734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
275b20d
99f062d
4d47e80
3e3917c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -7,6 +7,21 @@ import type { ServerRequest } from '~/types'; | |||||||
| import { logAxiosError, readFileAsString } from '~/utils'; | ||||||||
| import { generateShortLivedToken } from '~/crypto/jwt'; | ||||||||
|
|
||||||||
| const MARKDOWN_MIME_TYPES = new Set([ | ||||||||
| 'text/markdown', | ||||||||
| 'text/x-markdown', | ||||||||
|
||||||||
| 'text/x-markdown', | |
| 'text/x-markdown', | |
| 'text/md', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch β text/md is treated as a valid markdown MIME elsewhere in the codebase (e.g. artifacts prompt). Added it to MARKDOWN_MIME_TYPES in 99f062d2a, plus MIME-type normalization so parameterized variants like text/markdown; charset=utf-8 also short-circuit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normalize markdown MIME before set lookup
isMarkdownFile does an exact lookup on file.mimetype, so valid multipart values like text/markdown; charset=utf-8 will not be recognized as markdown unless the filename also ends in .md/.markdown. In that scenario (for example, markdown uploads named without extension), parseText still calls the RAG /text path and the markdown-formatting loss this change is meant to prevent can still occur. Please normalize the MIME type (lowercase and strip parameters) before checking the markdown set.
Useful? React with πΒ / π.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks β pushed 99f062d2a to address this. isMarkdownFile now runs file.mimetype through a normalizeMimeType() helper that lowercases and strips parameters before the set lookup, so text/markdown; charset=utf-8, TEXT/MARKDOWN, and whitespace-padded variants all short-circuit as expected. New parametrized test cases cover each of those shapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The markdown short-circuit tests cover several markdown MIME types and extension-only detection, but they don't cover the
text/mdMIME type which is treated elsewhere in the codebase as valid Markdown. Adding atext/mdcase (ideally with a non-.mdfilename to ensure the MIME-type path is what triggers) would prevent regressions where markdown still gets routed through the RAG/textendpoint.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a
text/mdtest case (with an extension-less filename so the MIME path is what triggers) and three more cases covering the normalization edges:text/markdown; charset=utf-8, uppercaseTEXT/MARKDOWN, and whitespace-padded input. See 99f062d2a.