URLs with parenthesis break parser #74
Comments
Maybe just documenting that brackets need to be URL-escaped, i.e. |
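For comparison, the escaping workaround amounts to percent-encoding the parentheses before the URL is put into a link. `%28` and `%29` are the standard percent-encodings of `(` and `)`. The Python helper `escape_parens` and the Wikipedia-style URL below are hypothetical illustrations, not part of any djot tooling.

```python
def escape_parens(url: str) -> str:
    """Percent-encode ( and ) so the URL contains no literal parentheses."""
    return url.replace("(", "%28").replace(")", "%29")


# Hypothetical Wikipedia-style URL, used only for illustration.
url = "https://en.wikipedia.org/wiki/Example_(disambiguation)"
print(f"[example]({escape_parens(url)})")
```

This works, but every author has to remember to do it, which is exactly the objection raised in the next comment.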
I don't think we can reasonably expect this. My users would trip on this frequently. Thanks, Wikipedia. I don't like having to count parentheses, but I can't think of a better solution. |
I agree, it would be better to change "contain the link destination (URL) in parentheses" to "contain the link destination (URL) in balanced parentheses" (or something to that effect) in the syntax description and alter the parser to handle this. (Also add a test.) |
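A minimal sketch of the "balanced parentheses" rule, assuming the scanner starts just after the `(` that opens the destination: it counts nesting depth and ends the destination at the `)` that brings the depth back to zero. The function name `scan_link_destination` is hypothetical, and escapes and whitespace rules are deliberately ignored.

```python
from typing import Optional, Tuple


def scan_link_destination(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Scan a link destination starting just after its opening '('.

    Nested parentheses must be balanced. Returns (destination, index just
    past the closing ')'), or None if the parentheses never balance.
    """
    depth = 0
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1  # a nested '(' inside the URL
        elif ch == ")":
            if depth == 0:
                # This ')' closes the destination itself.
                return text[pos:i], i + 1
            depth -= 1  # closes a nested '('
        i += 1
    return None  # ran out of input with parentheses still open
```

A test along the lines suggested above could check a Wikipedia-style URL such as `[wiki](https://en.wikipedia.org/wiki/Foo_(bar))`, where the destination should include the inner `(bar)`.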
Odd. This works with the current djot.js parser:
|
But:
|
So, the bottom line is that we've been trying to support balanced parens in URLs, but the potentially matching underscores are confusing the parser. |
What's happening here is that we're using the delimiter stack to match parens, and when the two |
I hope I'm not dragging in a dead horse to beat it, but it really bothers me that emphasis markers are somehow dragged into URLs. I wish djot had three clear kinds of elements (blocks, inline text, raw text) and URLs were in the latter where no inline parsing or matching happens. Admittedly this worsens the current issue if URLs are raw to the point of not parsing parentheses, though that can be lifted at the syntax level (e.g. with |
Well, the issue is that in commonmark.js we try to do a one-pass parse without backtracking. So, when we hit If we didn't care about backtracking, we could do things differently. In fact, I do do things differently in my Haskell djot parser: I just try to parse the link destination, and if it fails I backtrack. Anyway, there are a lot of different parsing strategies, but as far as I can see this is not an issue with djot's syntax. |
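To make the backtracking alternative concrete, here is a small Python sketch of "try to parse the link destination, and if it fails, backtrack". It is not the Haskell djot parser; `try_link`, `parse_inlines` and `Link` are hypothetical names. `try_link` mutates no state, so failure simply means the caller treats `[` as literal text and moves on. As a simplifying assumption, link text is taken to run up to the first `](`, so nested brackets are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Link:
    text: str
    dest: str


Inline = Union[str, Link]


def try_link(s: str, pos: int) -> Optional[Tuple[Link, int]]:
    """Try to parse [text](dest) at pos; return None on failure."""
    close = s.find("](", pos)
    if s[pos] != "[" or close == -1:
        return None
    depth, i = 0, close + 2
    while i < len(s):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            if depth == 0:
                return Link(s[pos + 1:close], s[close + 2:i]), i + 1
            depth -= 1
        i += 1
    return None  # unbalanced destination: caller backtracks


def parse_inlines(s: str) -> List[Inline]:
    out: List[Inline] = []
    buf = ""
    pos = 0
    while pos < len(s):
        if s[pos] == "[":
            result = try_link(s, pos)
            if result is not None:
                if buf:
                    out.append(buf)
                    buf = ""
                link, pos = result
                out.append(link)
                continue
        # Either not a '[' or the link attempt failed: keep the char as text.
        buf += s[pos]
        pos += 1
    if buf:
        out.append(buf)
    return out
```

The cost of this approach is that an unclosed destination may be rescanned, which is the kind of work a one-pass, no-backtracking design such as commonmark.js tries to avoid.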
I think the best fix here would be not to abuse the delimiter stack to keep track of matching parens (since this really only has a purpose in links), but to create a new data structure for this. |
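One possible shape for that separate data structure, sketched in Python under stated assumptions: the emphasis delimiter stack only ever holds `_` openers, while parentheses inside a link destination are tracked by their own depth counter, so underscores in a URL never become emphasis candidates. `InlineState` and `emphasis_candidates` are hypothetical names, not djot.js internals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InlineState:
    # Positions of '_' that may later open or close emphasis.
    delimiters: List[int] = field(default_factory=list)
    # Separate structure, used only while scanning a link destination.
    paren_depth: int = 0
    in_destination: bool = False


def emphasis_candidates(s: str) -> List[int]:
    """Return positions of '_' pushed onto the emphasis delimiter stack.

    Underscores inside a link destination are skipped, because the
    destination's parentheses are tracked by paren_depth instead.
    """
    st = InlineState()
    i = 0
    while i < len(s):
        ch = s[i]
        if st.in_destination:
            if ch == "(":
                st.paren_depth += 1
            elif ch == ")":
                if st.paren_depth == 0:
                    st.in_destination = False  # destination closed
                else:
                    st.paren_depth -= 1
        elif s.startswith("](", i):
            st.in_destination = True
            i += 2
            continue
        elif ch == "_":
            st.delimiters.append(i)
        i += 1
    return st.delimiters
```

Note that this one-pass sketch commits to a destination as soon as it sees `](` and assumes it will close; deciding what to do when it never closes, without backtracking, is where the real design trade-off lies.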
From where I sit, I see an issue with djot's syntax that manifests in this issue and a few others (which I discussed in detail in jdm/djot#247), and that I (maybe wrongly) linked to inline parsing leaking into raw text parsing. In other words, we can have link parsing without backtracking, because we can know that Anyway, I've already made my case in the discussion linked above, I promise I will try harder to not bother you with it again. |
Wikipedia links often have `(` in them, which breaks djot. Consider this example:
Three problems here: