Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 16 additions & 1 deletion src/markdown-helpers.ts
Original file line number Diff line number Diff line change
Expand Up @@ -707,6 +707,11 @@ export const safelyJoinTokens = (tokens: Token[], options: JoinTokenOptions = {}
joinedContent += safelyJoinTokens(tokenToCheck.children, options);
continue;
}
if (tokenToCheck.type === 'image') {
const altText = tokenToCheck.content || 'No Description';
joinedContent += `[Image: ${altText}]`;
continue;
}
expect(tokenToCheck.children).to.equal(
null,
'There should be no nested children in the joinable tokens',
Expand Down Expand Up @@ -800,13 +805,23 @@ export const safelyJoinTokens = (tokens: Token[], options: JoinTokenOptions = {}
break;
case 'paragraph_open':
case 'blockquote_close':
case 'html_block':
break;
case 'html_inline':
// Replace <br> elements with a newline
if (tokenToCheck.content.match(/<br\s*\/?>/)) {
joinedContent += '\n';
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost a break:?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional fallthrough as <img> can be both html_inline and html_block. Will add a comment about the intentional fallthrough.

case 'html_block':
// Replace <img> and <image> tags with [Image: <alt text>]
const imgMatch = tokenToCheck.content.match(/<(?:img|image)\b[^>]*>/i);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is <image really valid syntax? surely not

Regardless can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closed tags like <hr />

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is <image really valid syntax? surely not

Claude special, not sure why it decided <image> was syntax it needed to handle. 😅 Will clean up.

Regardless can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closed tags like <hr />

We can make this more robust, but I think "Good enough" might suffice here? If the regex approaches misses some cases they'll just be excluded from the output all together, which seems like an acceptable failure mode. Is there a main issue you want to avoid with more robust parsing?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, we currently have three usages of <img> in our docs that are outside of code blocks.

If we have any larger concerns around <img> parsing, we could also just convert those to the ![]() Markdown syntax.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have any larger concerns around parsing, we could also just convert those to the Markdown syntax.

Looks like we could convert the one in docs/tutorial/electron-timelines.md easily, but the ones in docs/api/web-contents.md have stuck around (we could otherwise just disallow <img> via markdowlint) because they scratch an itch which Markdown syntax can't, namely setting a width. Their display width is cut in half for better display on high DPI screens. So we may want to continue to allow for that niche usage of <img> in the docs. Although, ironically enough those images links are currently broken on the website. 😅

if (imgMatch) {
const altMatch = imgMatch[0].match(/alt=["']([^"']*)["']/i);
const altText = altMatch && altMatch[1] ? altMatch[1] : 'No Description';
joinedContent += `[Image: ${altText}]`;
if (tokenToCheck.type === 'html_block') {
joinedContent += '\n\n';
}
}
break;
case 'fence':
if (options.parseCodeFences) {
Expand Down
4 changes: 4 additions & 0 deletions tests/__snapshots__/markdown-helpers.spec.ts.snap
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,10 @@ Process: Main
_This class is not exported from the \`'electron'\` module. It is only available as a return value of other methods in the Electron API._"
`;

exports[`markdown-helpers > safelyJoinTokens > snapshots > should be correct for paragraph-with-image 1`] = `"This paragraph has an image [Image: screenshot] embedded in it."`;

exports[`markdown-helpers > safelyJoinTokens > snapshots > should be correct for paragraph-with-image-no-alt 1`] = `"This paragraph has an image with no alt text [Image: No Description] in it."`;

exports[`markdown-helpers > safelyJoinTokens > snapshots > should be correct for paragraph-with-inline-code 1`] = `"This is a \`inline code\` block that is \`code\` tagged."`;

exports[`markdown-helpers > safelyJoinTokens > snapshots > should be correct for paragraph-with-links 1`] = `"This paragraph has a bunch of stuff. Also some links Foo, these links should be stripped."`;
Expand Down
1 change: 1 addition & 0 deletions tests/fixtures/paragraph-with-image-no-alt.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
This paragraph has an image with no alt text ![](https://example.com/screenshot.png) in it.
1 change: 1 addition & 0 deletions tests/fixtures/paragraph-with-image.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
This paragraph has an image ![screenshot](https://example.com/screenshot.png) embedded in it.
52 changes: 52 additions & 0 deletions tests/markdown-helpers.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,58 @@ def fn():
expect(safelyJoinTokens(tokens, { parseCodeFences: true })).toMatchSnapshot();
});
});

describe('images', () => {
it('should transform markdown image links with alt text', () => {
const tokens = getTokens('Hello ![screenshot](https://example.com/screenshot.png) world');
expect(safelyJoinTokens(tokens)).toBe('Hello [Image: screenshot] world');
});

it('should use "No Description" for markdown images without alt text', () => {
const tokens = getTokens('Hello ![](https://example.com/screenshot.png) world');
expect(safelyJoinTokens(tokens)).toBe('Hello [Image: No Description] world');
});

it('should handle raw <img> tags with alt text', () => {
const tokens = getTokens(
'Hello <img src="https://example.com/img.png" alt="my image"> world',
);
expect(safelyJoinTokens(tokens)).toBe('Hello [Image: my image] world');
});

it('should handle raw <img> tags without alt text', () => {
const tokens = getTokens('Hello <img src="https://example.com/img.png"> world');
expect(safelyJoinTokens(tokens)).toBe('Hello [Image: No Description] world');
});

it('should handle raw <image> tags with alt text', () => {
const tokens = getTokens(
'Hello <image src="https://example.com/img.png" alt="my image"> world',
);
expect(safelyJoinTokens(tokens)).toBe('Hello [Image: my image] world');
});

it('should handle raw <image> tags without alt text', () => {
const tokens = getTokens('Hello <image src="https://example.com/img.png"> world');
expect(safelyJoinTokens(tokens)).toBe('Hello [Image: No Description] world');
});

it('should handle standalone <img> block tags with alt text', () => {
const tokens = getTokens(
'Before:\n\n<img width="487" alt="Image Before Adjustment" src="../images/before.png"/>\n\nAfter:\n\n<img width="487" alt="Image After Adjustment" src="../images/after.png"/>',
);
expect(safelyJoinTokens(tokens)).toBe(
'Before:\n\n[Image: Image Before Adjustment]\n\nAfter:\n\n[Image: Image After Adjustment]',
);
});

it('should handle standalone <img> block tags without alt text', () => {
const tokens = getTokens(
'Text\n\n<img width="487" src="../images/before.png"/>\n\nMore text',
);
expect(safelyJoinTokens(tokens)).toBe('Text\n\n[Image: No Description]\n\nMore text');
});
});
});

describe('extractStringEnum()', () => {
Expand Down