Weird trimming of inline element text causes invalid markdown #82
Comments
Okay, maybe this issue should be moved to rehype-minify-whitespace, as it seems that it's the one doing the […]
Welcome @necauqua! 👋
Yeeeeah, I kind of figured it's probably like that. Although I'm not sure the […] My use case is the same as that of the few people there: I have some meh HTML from a WYSIWYG editor that I'm upgrading to markdown on the fly. Looks like I'm keeping the workaround of "if an inline node ends with whitespace and its sibling is a text node, move the space there", although that's one extra DFS pass, eh. It's okay if some odd HTML generates some broken markdown, but my data has a ton of […]
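For readers following along, the workaround described in the comment above might look roughly like the sketch below. It is not the commenter's actual code: the node shapes are a hand-written, hast-like minimum, and `INLINE` and `moveTrailingSpace` are hypothetical names chosen for illustration.

```typescript
// Minimal hast-like node shapes, declared locally so the sketch is self-contained.
type TextNode = { type: "text"; value: string };
type ElementNode = { type: "element"; tagName: string; children: TreeNode[] };
type TreeNode = TextNode | ElementNode;

// Assumed set of inline tags; the thread only confirms em, strong and del.
const INLINE = new Set(["em", "strong", "del"]);

// Return the last child of an element if it is a text node.
function lastText(node: ElementNode): TextNode | undefined {
  const last = node.children[node.children.length - 1];
  return last && last.type === "text" ? last : undefined;
}

// Depth-first pass: when an inline element ends with whitespace and its next
// sibling is a text node, move that whitespace to the start of the sibling.
function moveTrailingSpace(parent: ElementNode): void {
  for (let i = 0; i < parent.children.length; i++) {
    const child = parent.children[i];
    if (child.type !== "element") continue;
    moveTrailingSpace(child);
    const next = parent.children[i + 1];
    const tail = lastText(child);
    if (!INLINE.has(child.tagName) || !tail || !next || next.type !== "text") continue;
    const match = /\s+$/.exec(tail.value);
    if (!match) continue;
    tail.value = tail.value.slice(0, match.index);
    next.value = match[0] + next.value;
  }
}
```

As the comment itself notes, this only covers the "next sibling is a text node" case and costs one extra traversal before conversion.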
I believe that behavior is correct. That is how browsers collapse whitespace. Closing this as a duplicate of syntax-tree/mdast-util-to-markdown#12. I think the approach I outlined there will solve it.
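To make the collapsing behavior concrete, here is a simplified sketch of how whitespace collapses across adjacent inline text runs, roughly in the spirit of CSS `white-space: normal`. It is an illustration under assumptions, not the code of rehype-minify-whitespace or of any browser; `collapseRuns` is a hypothetical name, and trailing whitespace at the end of the block is deliberately not handled.

```typescript
// Collapse whitespace over a flat, in-order list of inline text runs.
// A space that directly follows another space is dropped, even when the two
// spaces sit in different runs (i.e. on either side of an element boundary).
function collapseRuns(runs: string[]): string[] {
  let prevEndsWithSpace = true; // start of block: leading whitespace is dropped
  return runs.map((run) => {
    let out = run.replace(/[ \t\n\r\f]+/g, " ");
    if (prevEndsWithSpace && out.charAt(0) === " ") out = out.slice(1);
    if (out.length > 0) prevEndsWithSpace = out.charAt(out.length - 1) === " ";
    return out;
  });
}

// The three text runs of the HTML from the issue: before, inside and after <em>.
const collapsed = collapseRuns(["some text with ", " spaced emphasis ", " in between"]);
console.log(collapsed); // ["some text with ", "spaced emphasis ", "in between"]
```

The surviving space ends up inside the emphasis run, which matches the markdown reported in the issue.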
Huh, I was looking at it in the devtools but didn't notice that's actually the case. I see, thanks. Technically this is a hast-util-to-mdast issue then, since it breaks when it sees such minified HTML, but as you pointed out, it seems like there's no universal fix.
Why? Nothing breaks there.
There is a universal fix: the one I pointed to. To generate markdown that has the spaces your messy HTML has too, the ones a browser sees too. If you want to clean dirty HTML, that’s fine, but that’s something else, and not done in this project or in […]
Maybe my wording was wrong (yeah, it was all over the place), but it generates broken markdown? By "no universal fix" I meant a fix without using a bunch of entities, since I saw you mention on other issues that you strive to generate readable markdown with no HTML remains, which would mean that the universal fix is out of scope. Yeah, in the end I concluded that my HTML is dirty, as it couldn't be cleanly converted, and I work around that by doing some cleaning of my HTML first 😅
Yes. I mean character references (entities). Those are part of markdown. “Readable” would indeed be nice, but as folks can use character references in their input markdown, and it’s valid and fine, we’d have to use character references in the output too.
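As an illustration of the character reference approach, the sketch below encodes the whitespace at the end of the emphasis content. It is an assumption-laden sketch, not mdast-util-to-markdown's implementation: `charRef` and `emphasisThenText` are hypothetical helpers, only trailing whitespace is handled, and only ASCII letters and digits are recognised after the closing marker. It also encodes the first character after the closing `*` when that character is a letter or digit, because `;` counts as punctuation under CommonMark's flanking rules and a closing `*` preceded by punctuation must be followed by whitespace or punctuation.

```typescript
// Turn one character into a hexadecimal character reference, e.g. " " -> "&#x20;".
function charRef(ch: string): string {
  return "&#x" + ch.charCodeAt(0).toString(16).toUpperCase() + ";";
}

// Hypothetical helper: serialize emphasis with `content`, followed directly by `after`.
function emphasisThenText(content: string, after: string): string {
  // Nothing to do when the content does not end in whitespace.
  if (!/\s$/.test(content)) return "*" + content + "*" + after;
  // Encode the trailing whitespace so the closing `*` is not preceded by a space.
  const inner = content.slice(0, -1) + charRef(content.charAt(content.length - 1));
  // Encode a leading ASCII letter or digit so the closing `*` is followed by punctuation.
  const outer = /^[A-Za-z0-9]/.test(after) ? charRef(after.charAt(0)) + after.slice(1) : after;
  return "*" + inner + "*" + outer;
}

console.log("some text with " + emphasisThenText("spaced emphasis ", "in between"));
// some text with *spaced emphasis&#x20;*&#x69;n between
```

The output is less readable than plain text, which is the trade-off discussed in this thread, but it preserves the space exactly where the input tree had it.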
Wait, it just occurred to me that syntax-tree/mdast-util-to-markdown#12 (which is a to-markdown issue) is not closed, that you said it will solve this issue, and that you're not opposed to having those entities (I mean in the case where the input mdast was dirty like that), so all of my comments are moot. Thanks, and sorry for dragging this on 😅 And huh, once the to-markdown issue is solved like that, I can avoid my janky cleaning, which does not cover 100% of the corner cases.
Yes! No worries ;)
Initial checklist
Affected packages and versions
hast-util-to-mdast==10.1.0
Link to runnable example
https://codesandbox.io/p/devbox/dry-http-xx97sg
Steps to reproduce
Well, minimal example is converting HTML like this:
<p>some text with <em> spaced emphasis </em> in between</p>
with a direct html->hast->mdast->markdown pipeline generates this (invalid) markdown:
some text with *spaced emphasis *in between
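Why the second `*` in that output cannot close emphasis can be checked against CommonMark's rule for right-flanking delimiter runs. The sketch below is a simplified check under stated assumptions: it looks at a single-character delimiter, treats only ASCII punctuation as punctuation, and uses the hypothetical names `isSpace`, `isPunct` and `isRightFlanking`.

```typescript
// Start and end of line count as whitespace, hence the empty-string case.
const isSpace = (ch: string): boolean => ch === "" || /\s/.test(ch);
// ASCII punctuation only (a simplification of the CommonMark definition).
const isPunct = (ch: string): boolean => /[!-\/:-@\[-`{-~]/.test(ch);

// A delimiter is right-flanking if it is not preceded by whitespace, and either
// not preceded by punctuation, or preceded by punctuation and followed by
// whitespace or punctuation. A `*` can close emphasis only if it is right-flanking.
function isRightFlanking(text: string, index: number): boolean {
  const before = text.charAt(index - 1);
  const after = text.charAt(index + 1);
  if (isSpace(before)) return false;
  return !isPunct(before) || isSpace(after) || isPunct(after);
}

const actual = "some text with *spaced emphasis *in between";
const expected = "some text with *spaced emphasis* in between";
console.log(isRightFlanking(actual, actual.lastIndexOf("*"))); // false
console.log(isRightFlanking(expected, expected.lastIndexOf("*"))); // true
```

In the generated output the closing marker is preceded by a space, so it is not right-flanking and the asterisks are left as literal text.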
Expected behavior
Given that the browser seems to trim the contents of <em> in such a case, this?
some text with *spaced emphasis* in between
Actual behavior
Well, the space is somehow in the wrong node after the hast-util-to-mdast call.
Not sure if other nodes are affected. Yup, at least strong and del are too, assuming it's all inline nodes.
Affected runtime and version
all?
Affected package manager and version
No response
Affected OS and version
No response
Build and bundle tools
No response