-
Notifications
You must be signed in to change notification settings - Fork 889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with list items #357
Comments
True, but browsers will still render list items outside of lists. If the goal is for the resulting markdown to most-closely render as the original html renders, then it should be moved to a new line just like the browser does. |
I can reproduce the first issue: <ol>
<li>
<h1>foo</h1>
</li>
</ol> is converted to:
Which a markdown parser (I'm using this one) turns into: <ol>
<li>foo
===</li>
</ol> So the heading is lost. I don't think turning a heading into bold text is a good idea (#358). It may retain some semblance of a heading visually, but semantically and structurally, it does not adequately replace the original HTML. In addition, the same issue seems to extend to <ol>
<li>
<blockquote>foo</blockquote>
</li>
</ol> ...is converted into...
Passed back into a markdown parser, that is turned into: <ol>
<li>> foo</li>
</ol> So it treats the part "> foo" as regular text and keeps the ">" sign. The good news is that there's no need to turn headings into bold text because there's a more general fix. For the first code block in this comment, turndown should instead return this:
That is, with one space after the period and four spaces before the "f" in "foo". Then the markdown parser recognizes the heading properly: <ol>
<li><h1>foo</h1></li>
</ol> The same fix works for blockquotes as well:
is turned into: <ol>
<li><blockquote>
<p>foo</p>
</blockquote></li>
</ol> As a temporary workaround, all list items could have this spacing, but that looks odd and is unnecessary when a list item starts with inline content. |
Here's a workaround for the consumer side to fix the first issue for now. I created it with the help of AI and only lightly tested it. I'm a Turndown noob but maybe it helps someone: let td = new TurndownService();
td.addRule('listItemWithBlockAsFirstChild', {
filter: node => {
return (
node.nodeName === 'LI' &&
node.firstChild &&
// Block elements listed on https://www.w3schools.com/htmL/html_blocks.asp
[
'address', 'article', 'aside', 'blockquote', 'canvas', 'dd', 'div',
'dl', 'dt', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hr', 'li', 'main',
'nav', 'noscript', 'ol', 'p', 'pre', 'section', 'table', 'tfoot',
'ul', 'video'
]
.map(t => t.toUpperCase())
.includes(node.firstChild.nodeName)
);
},
replacement: (content, node, options) => {
let indicator;
if (node.parentNode.nodeName === 'OL') {
let count = 1; // Start counting from 1
let sibling = node.previousElementSibling;
while (sibling) {
if (sibling.nodeName === 'LI') {
count++;
}
sibling = sibling.previousElementSibling;
}
indicator = `${count}.`;
} else {
indicator = options.bulletListMarker;
}
let innerMarkdown = td.turndown(node.innerHTML);
// Add one space behind the indicator and four spaces underneath
return `${indicator} \n\n ${innerMarkdown}\n`;
}
}); In short, the filter applies to any We then determine the indicator (either Lastly, we parse the (I wrote this explanation, not an AI.) |
Fixes the first issue described in mixmark-io#357. The fix the described in more detail here: https://github.com/mixmark-io/turndown/issues/357\#issuecomment-1694699312
Fixes the first issue mentioned in mixmark-io#357. I describe the bug in more detail here: mixmark-io#357 (comment)
Disregard previous workaround in favor of this fix as per my PR #447: rules.listItem = {
filter: 'li',
replacement: function (content, node, options) {
content = content
.replace(/^\n+/, '') // remove leading newlines
.replace(/\n+$/, '\n') // replace trailing newlines with just a single one
.replace(/\n/gm, '\n ') // indent
var prefix = options.bulletListMarker + ' '
var parent = node.parentNode
// Check if the first child is a block-level element
var blockElements = [
'address', 'article', 'aside', 'blockquote', 'canvas', 'dd', 'div',
'dl', 'dt', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hr', 'li', 'main',
'nav', 'noscript', 'ol', 'p', 'pre', 'section', 'table', 'tfoot',
'ul', 'video'
].map(function (t) {
return t.toUpperCase()
})
var firstChildIsBlock = node.firstChild && blockElements.includes(node.firstChild.nodeName)
if (parent.nodeName === 'OL') {
var start = parent.getAttribute('start')
var index = Array.prototype.indexOf.call(parent.children, node)
prefix = (start ? Number(start) + index : index + 1) + '. '
}
return (
prefix + (firstChildIsBlock ? '\n\n ' : '') + content + (node.nextSibling && !/\n$/.test(content) ? '\n' : '')
)
}
} |
Issue 1: Headers inside list items.
Consider:
This renders in html as:
In turndown, this becomes:
* ####Header inside list item
Which renders as:
I propose that headers inside list items be treated as strong:
Which renders as:
Issue 2: Standalone headers following other non-list-item nodes.
Consider:
This renders in html as:
In turndown, this becomes:
Which renders as:
* Standalone list item
I propose that list items that follow non-list-items should be placed on a new line:
Which renders as:
The text was updated successfully, but these errors were encountered: