Merged
Changes from 1 commit
55 changes: 55 additions & 0 deletions packages/api/src/files/text.spec.ts
@@ -300,5 +300,60 @@ describe('text', () => {
        source: FileSources.text,
      });
    });

    it.each([
      { mimetype: 'text/markdown', originalname: 'notes.md' },
      { mimetype: 'text/x-markdown', originalname: 'notes.md' },
      { mimetype: 'application/markdown', originalname: 'notes.md' },
      { mimetype: 'application/x-markdown', originalname: 'notes.md' },
      { mimetype: 'application/octet-stream', originalname: 'README.md' },
      { mimetype: 'application/octet-stream', originalname: 'GUIDE.MARKDOWN' },
    ])(
Comment on lines +304 to +321
Copilot AI Apr 18, 2026

The markdown short-circuit tests cover several markdown MIME types and extension-only detection, but they don't cover the text/md MIME type, which is treated elsewhere in the codebase as valid Markdown. Adding a text/md case (ideally with a non-.md filename, so that the MIME-type path is what triggers) would prevent regressions where markdown still gets routed through the RAG /text endpoint.

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a text/md test case (with an extension-less filename so the MIME path is what triggers) and three more covering normalization edge cases: "text/markdown; charset=utf-8", uppercase "TEXT/MARKDOWN", and whitespace-padded input. See 99f062d2a.

      'should short-circuit to native parsing for markdown file (%o)',
      async ({ mimetype, originalname }) => {
        process.env.RAG_API_URL = 'http://rag-api.test';
        const mockText = '# Heading\n\n**bold** text';
        const mockBytes = Buffer.byteLength(mockText, 'utf8');

        mockedReadFileAsString.mockResolvedValue({
          content: mockText,
          bytes: mockBytes,
        });

        const result = await parseText({
          req: mockReq,
          file: { ...mockFile, mimetype, originalname },
          file_id: mockFileId,
        });

        expect(mockedAxios.get).not.toHaveBeenCalled();
        expect(mockedAxios.post).not.toHaveBeenCalled();
        expect(result).toEqual({
          text: mockText,
          bytes: mockBytes,
          source: FileSources.text,
        });
      },
    );

    it('should still call the RAG API for non-markdown text files', async () => {
      process.env.RAG_API_URL = 'http://rag-api.test';
      const mockText = 'plain text content';

      mockedAxios.get.mockResolvedValue({ status: 200, statusText: 'OK' });
      mockedAxios.post.mockResolvedValue({ data: { text: mockText } });

      await parseText({
        req: mockReq,
        file: mockFile,
        file_id: mockFileId,
      });

      expect(mockedAxios.post).toHaveBeenCalledWith(
        'http://rag-api.test/text',
        expect.any(Object),
        expect.objectContaining({ timeout: 300000 }),
      );
    });
  });
});
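The edge cases raised in the review threads can be reproduced outside Jest. The sketch below copies the detection logic from this commit's text.ts (an exact Set lookup on the raw MIME string, then an extension check) and runs it against those inputs. The UploadLike type is a simplified, hypothetical stand-in for Express.Multer.File, and the extension-less filename "notes" is illustrative only.

```typescript
// Simplified stand-in for Express.Multer.File (hypothetical): only the two
// fields that isMarkdownFile reads.
interface UploadLike {
  mimetype?: string;
  originalname?: string;
}

// Same MIME set as in this commit's text.ts (no text/md yet).
const MARKDOWN_MIME_TYPES = new Set<string>([
  'text/markdown',
  'text/x-markdown',
  'application/markdown',
  'application/x-markdown',
]);

// Detection logic as written in this commit: exact Set lookup on the raw
// mimetype string, then a case-insensitive extension check.
function isMarkdownFileExact(file: UploadLike): boolean {
  if (file.mimetype && MARKDOWN_MIME_TYPES.has(file.mimetype)) {
    return true;
  }
  const name = file.originalname?.toLowerCase() ?? '';
  return name.endsWith('.md') || name.endsWith('.markdown');
}

// Edge cases from the review threads. The extension-less filename ensures
// only the MIME path could trigger detection.
const edgeCases: UploadLike[] = [
  { mimetype: 'text/md', originalname: 'notes' },
  { mimetype: 'text/markdown; charset=utf-8', originalname: 'notes' },
  { mimetype: 'TEXT/MARKDOWN', originalname: 'notes' },
  { mimetype: '  text/markdown  ', originalname: 'notes' },
];

// None of these strings is an exact member of the set, so all four are missed.
const missed = edgeCases.filter((file) => !isMarkdownFileExact(file));
console.log(missed.map((file) => file.mimetype));
```

Under this commit's logic, none of the four inputs is recognized as Markdown, so each would fall through to the RAG path whenever the filename lacks a .md or .markdown extension.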
22 changes: 22 additions & 0 deletions packages/api/src/files/text.ts
@@ -7,6 +7,21 @@ import type { ServerRequest } from '~/types';
import { logAxiosError, readFileAsString } from '~/utils';
import { generateShortLivedToken } from '~/crypto/jwt';

const MARKDOWN_MIME_TYPES = new Set([
  'text/markdown',
  'text/x-markdown',
Copilot AI Apr 18, 2026

isMarkdownFile() checks a fixed set of MIME types but does not include text/md. Elsewhere in the codebase, text/md is treated as a valid Markdown MIME type (e.g., in the artifacts prompt), and the Upload-as-Text path accepts arbitrary type/subtype values. As a result, a Markdown upload with mimetype text/md (especially one whose filename lacks a .md extension) will still go through the RAG /text endpoint and have its formatting stripped. Consider adding text/md to the Markdown MIME set (or normalizing MIME types with the same canonicalization used elsewhere) so that Markdown is consistently short-circuited.

Suggested change
- 'text/x-markdown',
+ 'text/x-markdown',
+ 'text/md',

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch: text/md is treated as a valid markdown MIME type elsewhere in the codebase (e.g. the artifacts prompt). Added it to MARKDOWN_MIME_TYPES in 99f062d2a, along with MIME-type normalization so that parameterized variants like "text/markdown; charset=utf-8" also short-circuit.

  'application/markdown',
  'application/x-markdown',
]);

function isMarkdownFile(file: Express.Multer.File): boolean {
  if (file.mimetype && MARKDOWN_MIME_TYPES.has(file.mimetype)) {
P2: Normalize markdown MIME before set lookup

isMarkdownFile does an exact lookup on file.mimetype, so valid multipart values like "text/markdown; charset=utf-8" will not be recognized as markdown unless the filename also ends in .md or .markdown. In that scenario (for example, markdown uploads named without an extension), parseText still calls the RAG /text endpoint, and the markdown formatting loss this change is meant to prevent can still occur. Please normalize the MIME type (lowercase it and strip parameters) before checking the markdown set.

Owner Author

Thanks, pushed 99f062d2a to address this. isMarkdownFile now runs file.mimetype through a normalizeMimeType() helper that lowercases it and strips parameters before the set lookup, so "text/markdown; charset=utf-8", "TEXT/MARKDOWN", and whitespace-padded variants all short-circuit as expected. New parameterized test cases cover each of those shapes.

    return true;
  }
  const name = file.originalname?.toLowerCase() ?? '';
  return name.endsWith('.md') || name.endsWith('.markdown');
}
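The replies in the threads above describe a normalizeMimeType() helper, added in 99f062d2a, that lowercases the MIME type and strips parameters before the set lookup; its body is not part of this diff. A minimal sketch matching that description might look like the following. The implementation (split at the first ';', trim, lowercase) is an assumption, as is the simplified UploadLike type standing in for Express.Multer.File.

```typescript
// Simplified stand-in for Express.Multer.File (hypothetical).
interface UploadLike {
  mimetype?: string;
  originalname?: string;
}

// The commit's set plus text/md, as requested in the review.
const MARKDOWN_MIME_TYPES = new Set<string>([
  'text/markdown',
  'text/x-markdown',
  'text/md',
  'application/markdown',
  'application/x-markdown',
]);

/**
 * Hypothetical normalizeMimeType(): keeps only the type/subtype part
 * (everything before the first ';'), trims surrounding whitespace, and
 * lowercases it, since MIME type and subtype names are case-insensitive.
 */
function normalizeMimeType(mimetype: string): string {
  const essence = mimetype.split(';')[0] ?? '';
  return essence.trim().toLowerCase();
}

function isMarkdownFile(file: UploadLike): boolean {
  if (file.mimetype && MARKDOWN_MIME_TYPES.has(normalizeMimeType(file.mimetype))) {
    return true;
  }
  // Extension fallback unchanged from the commit.
  const name = file.originalname?.toLowerCase() ?? '';
  return name.endsWith('.md') || name.endsWith('.markdown');
}

console.log(normalizeMimeType('  TEXT/Markdown ; charset=utf-8')); // text/markdown
console.log(isMarkdownFile({ mimetype: 'text/md', originalname: 'notes' })); // true
```

Splitting at the first ';' is sufficient here because only the type/subtype part is compared, and that part cannot itself contain a semicolon; a full Content-Type parser would only matter if parameter values were needed.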

/**
* Attempts to parse text using RAG API, falls back to native text parsing
* @param params - The parameters object
@@ -29,6 +44,13 @@ export async function parseText({
    return parseTextNative(file);
  }

  if (isMarkdownFile(file)) {
    logger.debug(
      '[parseText] Markdown file detected, using native parsing to preserve raw formatting',
    );
    return parseTextNative(file);
  }

  const userId = req.user?.id;
  if (!userId) {
    logger.debug('[parseText] No user ID provided, falling back to native text parsing');
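The hunk above places the Markdown check ahead of the user-ID check and the RAG request, which is what lets the new tests assert that neither mockedAxios.get nor mockedAxios.post is called for Markdown input. The branch order can be modeled as a pure function. This is a simplified sketch, not the real parseText: the condition guarding the first return parseTextNative(file) lies outside the hunk, so it is represented by a hypothetical earlyNativeFallback flag, and chooseParser is an illustrative name.

```typescript
// Hypothetical model of the branch order in parseText, as far as this hunk shows it.
type ParserChoice = 'native' | 'rag';

interface RoutingInput {
  // Stands in for the guard before the first native return, whose
  // condition lies outside this hunk (assumption).
  earlyNativeFallback: boolean;
  isMarkdown: boolean;
  userId?: string;
}

function chooseParser({ earlyNativeFallback, isMarkdown, userId }: RoutingInput): ParserChoice {
  if (earlyNativeFallback) {
    return 'native';
  }
  // Markdown returns before any user check or RAG API request.
  if (isMarkdown) {
    return 'native';
  }
  if (!userId) {
    return 'native';
  }
  // In this model, only non-Markdown files with a user ID reach the RAG path.
  return 'rag';
}

console.log(chooseParser({ earlyNativeFallback: false, isMarkdown: true, userId: 'user-1' })); // native
console.log(chooseParser({ earlyNativeFallback: false, isMarkdown: false, userId: 'user-1' })); // rag
```

Keeping the Markdown check before the user lookup also means Markdown uploads never need a short-lived token or a RAG health check.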