Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade markdown package reference #2364

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jezell
Copy link

@jezell jezell commented Oct 4, 2024

This upgrades the markdown package to 7.2.1, also fixes an issue where encodeHtml is not passed to the markdown parser.

@@ -38,7 +38,8 @@ class SuperEditorImageSyntax extends md.LinkSyntax {
if (parser.pos + 1 >= parser.source.length) {
// The `]` is at the end of the document, but this may still be a valid
// shortcut reference link.
return _tryCreateReferenceLink(parser, text, getChildren: getChildren);
final link = _tryCreateReferenceLink(parser, text, getChildren: getChildren);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all uses of _tryCreateReferenceLink() gets converted to a list. Should we just change that method to return Iterable<md.Node>? and then continue to return it directly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. That would be better.

@@ -23,9 +23,12 @@ MutableDocument deserializeMarkdownToDocument(
List<ElementToNodeConverter> customElementToNodeConverters = const [],
bool encodeHtml = false,
}) {
final markdownLines = const LineSplitter().convert(markdown);
final markdownLines = const LineSplitter().convert(markdown).map((l) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to require this difference? I'm not sure what the existing code did, but it's not immediately obvious what the difference is in impact between the original code and this new code.

Copy link
Author

@jezell jezell Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markdown lib changed from taking List<String> to taking List<Line>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explicit type parameter to your map function so that the difference is unambiguous? Part of the confusion here is that reading the code doesn't tell us what type "l" is.

}
static final _tags = [ md.DelimiterTag("u", 1) ];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please declare all static members above all instance members (we probably need to add a lint for that at some point).

And can you also add a comment above _tags that describe what it is, and how it's used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required by md.DelimiterSyntax, in theory the constructor takes an optional tags member, but a bug in
the Markdown lib replaces that with a const [], and then tries to sort it. Will add a comment to that effect.

@@ -548,7 +555,7 @@ class _ParagraphWithAlignmentSyntax extends _EmptyLinePreservingParagraphSyntax

@override
bool canParse(md.BlockParser parser) {
if (!_alignmentNotationPattern.hasMatch(parser.current)) {
if (!_alignmentNotationPattern.hasMatch(parser.current.content)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a pre-existing error on our part? Should we have been accessing parser.current.content this whole time?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markdown lib changed from String to Line which has a content member, so this just redirects to where the string is now.

@matthew-carroll
Copy link
Contributor

FYI @angelosilvestre

@matthew-carroll
Copy link
Contributor

@jezell it looks like we've got a bunch of failing tests. I'm guessing most/all of them is because we need corresponding changes in other packages.

…ences to match new markdown lib Iterable<md.Node>, add missing override for createNode
@jezell
Copy link
Author

jezell commented Oct 5, 2024

Looking at the tests, I'm not sure about one of them, "unordered list followed by empty list item" seems to not actually convert in commonmark, so not sure if it's a bad test. Here too in this comment I've put the markdown and it doesn't end up as a single list item and then nothing else. Maybe the new markdown parser is more correct here, but I get three different results with this one from three different parsers, and also different results from github which makes it a hr

https://markdowntohtml.com/
https://daringfireball.net/projects/markdown/dingus
https://spec.commonmark.org/dingus/

  • list item 1

@jezell
Copy link
Author

jezell commented Oct 5, 2024

The strikethrough test also looks bad, strikethrough is a double ~ not one, but the test has a single. Github markdown seems to support single, but that doesn't appear to be standard or documented (https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax):

https://www.markdownguide.org/extended-syntax/#:~:text=You%20can%20strikethrough%20words%20by,before%20and%20after%20the%20words.

Seems like maybe the dart markdown package got more strict. Definitely a change in behavior, but most guides seem to confirm it's supposed to be double not single tilde.

@@ -1433,10 +1437,22 @@ with multiple lines
expect(document.getNodeAt(25)!, isA<ParagraphNode>());
});

test('paragraph with strikethrough', () {
test('paragraph with single ~ should not be strikethrough', () {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change in behavior, but it seems to be more correct. The underlying dart markdown parser seems to be more strict now, but it does seem to align better with the generally accepted pattern for strikethrough.

Should the markdown parser accept the decision of the underlying markdown lib to change the behavior here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new name of this test is ambiguous. When I read it, I thought it meant a single standalone "" without a closing "". But based on the implementation, I take it that you're trying to describe the difference between "~" and "~~"?

If you're saying that the new package version doesn't parse "something" as strikethrough, then I think we need to adjust it to do so. Given we've had that behavior for a long time, I don't think we should break it for existing users. That said, users should have an opportunity to turn it off, if they choose.

Adding support for double "~~" should be fine because its a new feature, and if that really is the norm, then I don't foresee complaints.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthew-carroll added single line strikethrough parsing node, a bit hacky to get it to work since underlying markdown parser doesn't like having two sort of similar nodes, but does work as long as the order in the list is correct. Added comments to explain.

test('unordered list followd by empty list item', () {
const markdown = """- list item 1
- """;
test('unordered list followed by empty list item', () {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a change in behavior here in this specific case introduced by the underlying parser changing, but was the intent of the test really to verify that the hanging - was deleted from the line in a weird case that markdown parsers don't seem to agree on? CommonMark spec seems to say that empty list items should be allowed, not stripped. This updates the test to verify that empty list items should be kept like the spec says, rather than test for something that seems to be up interpreted differently depending on the parser being used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is where we introduced this test: #1822

expect(styledText.getAllAttributionsAt(7).contains(strikethroughAttribution), false);
});

test('paragraph with double strikethrough', () {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for this case since it was missing

@@ -1433,10 +1437,22 @@ with multiple lines
expect(document.getNodeAt(25)!, isA<ParagraphNode>());
});

test('paragraph with strikethrough', () {
test('paragraph with single ~ should not be strikethrough', () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new name of this test is ambiguous. When I read it, I thought it meant a single standalone "" without a closing "". But based on the implementation, I take it that you're trying to describe the difference between "~" and "~~"?

If you're saying that the new package version doesn't parse "something" as strikethrough, then I think we need to adjust it to do so. Given we've had that behavior for a long time, I don't think we should break it for existing users. That said, users should have an opportunity to turn it off, if they choose.

Adding support for double "~~" should be fine because its a new feature, and if that really is the norm, then I don't foresee complaints.

/// This [DelimiterSyntax] produces `Element`s with a `u` tag.
class UnderlineSyntax extends md.DelimiterSyntax {

/// This is required by md.DelimiterSyntax, in theory the constructor takes an optional tags member, but a bug in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still doesn't tell us what this thing is. What does "u" mean? What's a DelimiterTag? Why is it required? What happens if it's not passed to the super constructor? You somewhat answered that last part, but didn't mention the consequences.

Assume the next person who's responsible for this has very little understanding of the inside of the markdown package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was trying to explain that it's not actually supposed to be required from the class definition:

https://pub.dev/documentation/markdown/latest/markdown/DelimiterSyntax-class.html

The constructor takes a nullable. However, the problem is there is a bug in the underlying dart library if you don't pass it. Due to these two lines, one sets it to const [] if not passed, then the next tries to sort. So we have to pass something or it blows up. That's the only reason it's there.

https://github.com/dart-lang/markdown/blob/d53feae0760a4f0aae5ffdfb12d8e6acccf14b40/lib/src/inline_syntaxes/delimiter_syntax.dart#L67

https://github.com/dart-lang/markdown/blob/d53feae0760a4f0aae5ffdfb12d8e6acccf14b40/lib/src/inline_syntaxes/delimiter_syntax.dart#L319

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that point about the bug in the package - but those questions I raised above are still real questions that a reader will have. Can you please address those in the code comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this explanation to the code, hope it clarifies the reasoning this is required better.

@@ -23,9 +23,12 @@ MutableDocument deserializeMarkdownToDocument(
List<ElementToNodeConverter> customElementToNodeConverters = const [],
bool encodeHtml = false,
}) {
final markdownLines = const LineSplitter().convert(markdown);
final markdownLines = const LineSplitter().convert(markdown).map((l) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explicit type parameter to your map function so that the difference is unambiguous? Part of the confusion here is that reading the code doesn't tell us what type "l" is.

@matthew-carroll
Copy link
Contributor

Looks like we have 40 failing tests in super_editor - most/all of them look like auto-conversion of text, which is based on Markdown parsing.

@jezell
Copy link
Author

jezell commented Oct 7, 2024

Looks like we have 40 failing tests in super_editor - most/all of them look like auto-conversion of text, which is based on Markdown parsing.

@matthew-carroll looks like what was happening is the new dart markdown parser drops of extra newlines if there are hanging newlines at the end of the file after a list. There aren't any tests for this in the markdown package, but the tests in the super_editor package relies on it for setting the caret in an empty node after the lists. I'm adding a test for this case and some extra logic in the parser. It's a little dirty, but seems like the desire here is to insulate people from the markdown lib changing. At least the test will catch if they change the behavior back.

const markdown = """- list item 1
- """;
test('unordered list followed by empty list item', () {
const markdown = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this leading newline need to be here? If a Markdown blob begins with a list item, it won't have a leading newline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be there, will remove the leading.

final doc = deserializeMarkdownToDocument('~This is~ a paragraph.');
final styledText = (doc.getNodeAt(0)! as ParagraphNode).text;

// Ensure text within the range isn't attributed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this test be reverted, given that you added the single-character parser?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Test was reverted but comment wasn't.

@@ -1541,6 +1557,23 @@ Paragraph3""";
expect((doc.getNodeAt(2)! as ParagraphNode).text.text, 'Paragraph3');
});

test('empty paragraph after paragraphs', () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you phrase the test name in terms of what you're verifying? I'm actually not sure at first glance which detail you're locking down here. Is it that 2 trailing newlines are reduced to 1? Is it that 2+ trailing newlines are reduced to 1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating to 'every 2 newlines after a list are a paragraph'. The logic is a bit weird but it seems to be what the old dart markdown parser was doing with new lines after lists. The super_editor tests that deal with inserting newlines after lists fail with null reference exceptions due to missing nodes if this doesn't happen. I first tried adding code to preserve all hanging newlines regardless of what came between, but other tests in the markdown package want a no newlines after other types of content. Seems like the old parser wasn't really consistent with hanging newlines and that leaked into the tests. So this retains old behavior, but also codifies old inconsistencies.

@matthew-carroll
Copy link
Contributor

@angelosilvestre can you also take a look at this PR and see if you find anything that I've missed?

Copy link
Collaborator

@angelosilvestre angelosilvestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a single comment.

// Add 1 hanging line for every 2 blank lines at the end, need this to preserve behavior pre markdown 7.2.1
final hangingEmptyLines = markdownLines.reversed.takeWhile((md.Line l) => l.isBlankLine);
if(hangingEmptyLines.isNotEmpty && documentNodes.lastOrNull is ListItemNode) {
for(var i = 0; i < hangingEmptyLines.length >> 1; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this be a bit more readable if we use hangingEmptyLines.length ~/ 2 instead of hangingEmptyLines.length >> 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't catch that for loop, but yeah, @jezell let's not use bit shifting unless we absolutely have to. If you wanna divide by two, let's use the standard operators for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jezell can you push an update with this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants