fix: replace Markdown image links and raw <img> tags with alt text#187
fix: replace Markdown image links and raw <img> tags with alt text#187dsanders11 wants to merge 2 commits intomainfrom
Conversation
| // Replace <br> elements with a newline | ||
| if (tokenToCheck.content.match(/<br\s*\/?>/)) { | ||
| joinedContent += '\n'; | ||
| } |
There was a problem hiding this comment.
Intentional fallthrough as <img> can be both html_inline and html_block. Will add a comment about the intentional fallthrough.
| } | ||
| case 'html_block': | ||
| // Replace <img> and <image> tags with [Image: <alt text>] | ||
| const imgMatch = tokenToCheck.content.match(/<(?:img|image)\b[^>]*>/i); |
There was a problem hiding this comment.
Is <image really valid syntax? surely not
Regardless can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closed tags like <hr />
There was a problem hiding this comment.
Is
<imagereally valid syntax? surely not
Claude special, not sure why it decided <image> was syntax it needed to handle. 😅 Will clean up.
Regardless can we parse this as XML or something? HTML typically can't be safely parsed with regex (with the exception of inline self-closed tags like
<hr />
We can make this more robust, but I think "Good enough" might suffice here? If the regex approaches misses some cases they'll just be excluded from the output all together, which seems like an acceptable failure mode. Is there a main issue you want to avoid with more robust parsing?
There was a problem hiding this comment.
For context, we currently have three usages of <img> in our docs that are outside of code blocks.
If we have any larger concerns around <img> parsing, we could also just convert those to the ![]() Markdown syntax.
There was a problem hiding this comment.
If we have any larger concerns around
parsing, we could also just convert those to the
Markdown syntax.
Looks like we could convert the one in docs/tutorial/electron-timelines.md easily, but the ones in docs/api/web-contents.md have stuck around (we could otherwise just disallow <img> via markdowlint) because they scratch an itch which Markdown syntax can't, namely setting a width. Their display width is cut in half for better display on high DPI screens. So we may want to continue to allow for that niche usage of <img> in the docs. Although, ironically enough those images links are currently broken on the website. 😅
Fixes #186.
Alt text seems like the easiest solution to this issue for the moment, and it fixes a real example from our existing docs which I've included as a test case.