Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes lists with mixed indentation #485

Merged
merged 3 commits into from
Apr 19, 2020

Conversation

scinos
Copy link
Contributor

@scinos scinos commented Apr 7, 2020

Fixes #198

This PR introduces a new logic to determine the indentation stops when using tabs.

Before, a tab will only create an indentation stop for the whole tab. For example, a line starting with a tab character would have produced the stops {4: 0}, meaning that the 4th indentation point (because tabSize=4) starts at index 0 (i.e. the first character). This was used to un-indent lists, by getting all lines that belong to the same listItem and looking for the biggest common indentation point.

This is a problem when mixing spaces and tabs. Imagine a line indented with 2 spaces (it will have stops {1:0, 2: 1}, with 1st and 2nd indentation stops at characters 0 and 1, respectively. If the following line has an indentation of {4: 0} (using a tab), they won't have any common indentation stop.

After this change, we generate all intermediate indentation stops when using tabs: A line starting with a tab will have stops {1: 0, 2:0, 3:0, 4:0}. This means that it the line above (or below) has an indentation of 1, 2, 3 or 4 spaces, both lines will be considered indented with the same indentation.

The result is that, when "unindenting" a line starting with a tab, it can be "unindented" the equivalent to 1, 2, 3 or 4 spaces, and the tab will be consumed in the process.

@@ -25,7 +26,10 @@ function indentation(value) {
indent = Math.floor(indent / size) * size
}

stops[indent] = index
while (lastIndent < indent) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For spaces (i.e. with indent - lastIndent == 1), this is equivalent to just stops[indent]=index; lastIndent=indent

For tabs, this will create entries in stops for all intermediate indents since the last one. For example, if the last indentation point is 1 (a space), and the next indentation point is 5 (a tab follows the initial space), this will add stops 2, 3, 4 and 5 pointing to character 1 (the tab).


values[position] =
padding + values[position].slice(index in stops ? stops[index] + 1 : 0)
values[position] = values[position].slice(stops[index] + 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore because stops will now contain all indentation points, therefore index will never be different than minIndent.

@@ -53,7 +53,7 @@
"children": [
{
"type": "text",
"value": "Very long\n\t\t\tparagraph",
"value": "Very long\n\tparagraph",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from trying to find the indentation of

- Very long
<tab><tab>paragraph

This text is the "internal" content, the content without the indentation.

The resulting should be

Very long
<tab>paragraph

as the - in the first line marks the indentation, that is matched by the first <tab> in the second line.

Making this changes in the fixture makes it more consistent with the rest of the rest of the mixed-indentation fixtures, in all flavours and list styles.

@scinos
Copy link
Contributor Author

scinos commented Apr 7, 2020

@transitive-bullshit I'd appreciate your feedback in this PR

@thisconnect

This comment has been minimized.

@ChristianMurphy ChristianMurphy requested review from a team April 7, 2020 18:11
@transitive-bullshit
Copy link

This looks really solid @scinos. Thank you so much for taking the time to work on this. I'm extremely booked with saasify stuff atm so I won't be able to give this the attention it deserves for awhile.

Glancing through, your approach and comments look reasonable and nothing jumps out at me as red flags. If all of the unit tests pass without regressions, that's the most important thing.

The only other note I have is that these changes should probably be accompanied by some additional unit tests that focus on this use case if there's anything not covered by existing tests.

This would fix sindresorhus/awesome-lint#44 which has funding associated with it @sindresorhus

@wooorm any additional thoughts?

@wooorm
Copy link
Member

wooorm commented Apr 9, 2020

@scinos The changes look good!

As only a couple of lines changed in the source code, and only a couple in the tests too, I do think adding new tests would be a great addition. For example, original markdown from Automattic/wp-calypso#40798 may be useful, and then carefully comparing them with how CM and markdown-it parse!

To add a test, add input markdown like $name.text in input/, make sure that $name.json is in tree/ (you can also put $name.commonmark.json there to test CM too), and finally you can run node script/regenerate-fixtures to populate those fixtures! Then it’s often easier to compare your changes with git diff.

@scinos
Copy link
Contributor Author

scinos commented Apr 13, 2020

Added a few tests for lists with mixed tabs and spaces.

I also run the tests in https://markdown-it.github.io/, https://spec.commonmark.org/dingus/ and printing the tree with unist-util-inspect (both with commonmark:false and comomnmark:true). The results are quite similar in all cases:

image

image

image

Before the fix, about half of the cases in the new test failed to parse properly. Example:

image

@scinos
Copy link
Contributor Author

scinos commented Apr 16, 2020

Any update on this? Is there anything else I should change in this PR?

@wooorm wooorm merged commit 0697d46 into remarkjs:master Apr 19, 2020
@wooorm wooorm added remark-parse 🐛 type/bug This is a problem 👶 semver/patch This is a backwards-compatible fix 🗄 area/interface This affects the public interface labels Apr 19, 2020
@wooorm
Copy link
Member

wooorm commented Apr 19, 2020

Released as 8.0.1!

@transitive-bullshit
Copy link

Amazing work @scinos!

@sindresorhus
Copy link
Contributor

@scinos Just saw this. Thanks! You got the IssueHunt reward: https://issuehunt.io/u/scinos

@wooorm wooorm added the 💪 phase/solved Post is done label Aug 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🗄 area/interface This affects the public interface 💪 phase/solved Post is done remark-parse 👶 semver/patch This is a backwards-compatible fix 🐛 type/bug This is a problem
Development

Successfully merging this pull request may close these issues.

Sublists aren't parsed when indented with tabs
5 participants