Fixes lists with mixed indentation #485

scinos · 2020-04-07T06:17:34Z

Fixes #198

This PR introduces new logic for determining indentation stops when tabs are used.

Before, a tab only created an indentation stop for the whole tab. For example, a line starting with a tab character produced the stops {4: 0}, meaning that the 4th indentation point (because tabSize=4) is at index 0 (i.e. the first character). This was used to un-indent lists by collecting all lines that belong to the same listItem and looking for the biggest common indentation point.

This is a problem when mixing spaces and tabs. Imagine a line indented with 2 spaces: it will have stops {1: 0, 2: 1}, with the 1st and 2nd indentation stops at characters 0 and 1, respectively. If the following line has an indentation of {4: 0} (using a tab), the two lines won't have any common indentation stop.

After this change, we generate all intermediate indentation stops when using tabs: a line starting with a tab will have stops {1: 0, 2: 0, 3: 0, 4: 0}. This means that if the line above (or below) has an indentation of 1, 2, 3 or 4 spaces, both lines will be considered to share the same indentation.

The result is that, when "unindenting" a line starting with a tab, it can be "unindented" by the equivalent of 1, 2, 3 or 4 spaces, and the tab will be consumed in the process.
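To make the before/after difference concrete, here is a minimal sketch of the stop calculation. It is a simplified illustration, not the actual remark-parse source: the names `TAB_SIZE`, `fillIntermediate` and `sharedStops` are hypothetical, and the rule that a tab advances to the next multiple of the tab size is an assumption based on the rounding visible in the diff.

```javascript
// Simplified sketch; not the actual remark-parse source.
const TAB_SIZE = 4

// Returns {indent, stops}, where stops maps an indentation column
// to the index of the whitespace character that reaches it.
function indentation(value, fillIntermediate) {
  const stops = {}
  let indent = 0
  let lastIndent = 0
  let index = 0

  while (value[index] === '\t' || value[index] === ' ') {
    if (value[index] === '\t') {
      // Assumed: a tab advances to the next multiple of TAB_SIZE.
      indent = Math.floor((indent + TAB_SIZE) / TAB_SIZE) * TAB_SIZE
    } else {
      indent += 1
    }

    if (fillIntermediate) {
      // New behaviour: record every column covered by this character.
      while (lastIndent < indent) {
        stops[++lastIndent] = index
      }
    } else {
      // Old behaviour: record only the column the character ends on.
      stops[indent] = index
    }

    index++
  }

  return {indent, stops}
}

// Hypothetical helper: the columns for which both lines have a stop.
function sharedStops(a, b) {
  return Object.keys(a.stops)
    .filter((column) => column in b.stops)
    .map(Number)
}

const spaces = '  item'
const tab = '\titem'

// Old behaviour: {1: 0, 2: 1} and {4: 0} share nothing.
console.log(sharedStops(indentation(spaces, false), indentation(tab, false)))
// New behaviour: {1: 0, 2: 1} and {1: 0, 2: 0, 3: 0, 4: 0} share columns 1 and 2.
console.log(sharedStops(indentation(spaces, true), indentation(tab, true)))
```

With the intermediate stops filled in, a line indented with two spaces and a line indented with one tab share columns 1 and 2, so a common indentation point can be found.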

Fixes remarkjs#198

scinos · 2020-04-07T06:20:03Z

packages/remark-parse/lib/util/get-indentation.js

@@ -25,7 +26,10 @@ function indentation(value) {
      indent = Math.floor(indent / size) * size
    }

-    stops[indent] = index
+    while (lastIndent < indent) {


For spaces (i.e. with indent - lastIndent == 1), this is equivalent to just stops[indent]=index; lastIndent=indent

For tabs, this will create entries in stops for all intermediate indents since the last one. For example, if the last indentation point is 1 (a space) and the next indentation point is 4 (a tab following the initial space, with the indent rounded down to the tab stop at 4), this will add stops 2, 3 and 4 pointing to character 1 (the tab).

scinos · 2020-04-07T06:21:13Z

packages/remark-parse/lib/util/remove-indentation.js

-
-      values[position] =
-        padding + values[position].slice(index in stops ? stops[index] + 1 : 0)
+      values[position] = values[position].slice(stops[index] + 1)


This is no longer needed because stops now contains every indentation point, so index will never differ from minIndent.
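As a rough illustration of why the lookup can be unconditional, here is a minimal sketch of removing a common indentation with fully populated stops. It is not the actual `remove-indentation.js`: `stopsOf`, `unindent` and `TAB_SIZE` are hypothetical names, blank lines are not handled, and the tab rounding rule is an assumption.

```javascript
// Simplified sketch; names and structure are illustrative only.
const TAB_SIZE = 4

// Compute the indent and every intermediate stop for one line.
function stopsOf(line) {
  const stops = {}
  let indent = 0
  let lastIndent = 0
  let index = 0

  while (line[index] === '\t' || line[index] === ' ') {
    // Assumed: a tab advances to the next multiple of TAB_SIZE.
    indent =
      line[index] === '\t'
        ? Math.floor((indent + TAB_SIZE) / TAB_SIZE) * TAB_SIZE
        : indent + 1

    while (lastIndent < indent) {
      stops[++lastIndent] = index
    }

    index++
  }

  return {indent, stops}
}

// Remove up to `maximum` columns of indentation from every line.
// Assumes no blank lines, to keep the sketch short.
function unindent(lines, maximum) {
  const infos = lines.map(stopsOf)
  const minIndent = Math.min(maximum, ...infos.map((info) => info.indent))

  if (minIndent === 0) return lines.slice()

  // Every line has indent >= minIndent, so stops[minIndent] always exists.
  return lines.map((line, i) => line.slice(infos[i].stops[minIndent] + 1))
}

console.log(unindent(['  first', '\tsecond'], 2))
```

Here the common indentation is 2 columns: the first line loses its two spaces, and the second line loses its whole tab, because the tab is the character that reaches column 2.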

scinos · 2020-04-07T06:25:08Z

test/fixtures/tree/mixed-indentation.commonmark.json

@@ -53,7 +53,7 @@
              "children": [
                {
                  "type": "text",
-                  "value": "Very long\n\t\t\tparagraph",
+                  "value": "Very long\n\tparagraph",


This comes from trying to find the indentation of

- Very long
<tab><tab>paragraph

This text is the "internal" content, i.e. the content without the indentation.

The result should be

Very long
<tab>paragraph

as the - in the first line marks the indentation, which is matched by the first <tab> in the second line.

Making this change in the fixture makes it more consistent with the rest of the mixed-indentation fixtures, in all flavours and list styles.

scinos · 2020-04-07T06:27:23Z

@transitive-bullshit I'd appreciate your feedback in this PR

transitive-bullshit · 2020-04-08T20:35:38Z

This looks really solid @scinos. Thank you so much for taking the time to work on this. I'm extremely booked with saasify stuff atm so I won't be able to give this the attention it deserves for a while.

Glancing through, your approach and comments look reasonable and nothing jumps out at me as red flags. If all of the unit tests pass without regressions, that's the most important thing.

The only other note I have is that these changes should probably be accompanied by some additional unit tests that focus on this use case if there's anything not covered by existing tests.

This would fix sindresorhus/awesome-lint#44 which has funding associated with it @sindresorhus

@wooorm any additional thoughts?

wooorm · 2020-04-09T15:04:42Z

@scinos The changes look good!

As only a couple of lines changed in the source code, and only a couple in the tests too, I do think adding new tests would be a great addition. For example, original markdown from Automattic/wp-calypso#40798 may be useful, and then carefully comparing them with how CM and markdown-it parse!

To add a test, add input markdown like $name.text in input/, make sure that $name.json is in tree/ (you can also put $name.commonmark.json there to test CM too), and finally you can run node script/regenerate-fixtures to populate those fixtures! Then it’s often easier to compare your changes with git diff.

scinos · 2020-04-13T10:30:11Z

Added a few tests for lists with mixed tabs and spaces.

I also ran the tests in https://markdown-it.github.io/, https://spec.commonmark.org/dingus/ and by printing the tree with unist-util-inspect (both with commonmark:false and commonmark:true). The results are quite similar in all cases:

Before the fix, about half of the cases in the new test failed to parse properly. Example:

scinos · 2020-04-16T20:34:18Z

Any update on this? Is there anything else I should change in this PR?

wooorm · 2020-04-19T07:58:43Z

Released as 8.0.1!

transitive-bullshit · 2020-04-19T09:39:12Z

Amazing work @scinos!

sindresorhus · 2020-10-25T18:57:06Z

@scinos Just saw this. Thanks! You got the IssueHunt reward: https://issuehunt.io/u/scinos

Fixes lists with mixed indentation

e39d35f

Fixes remarkjs#198

scinos mentioned this pull request Apr 7, 2020

Reindent markdon with spaces as a workaround for remarkjs/remark#198 Automattic/wp-calypso#40798

Closed

ChristianMurphy requested review from a team April 7, 2020 18:11

Remove temporary directory

6c29c00

jaens mentioned this pull request Apr 12, 2020

Tab-indented nested lists get collapsed and escaped prettier/prettier#3223

Closed

Added tests for lists with mixed indentation

3bac549

wooorm approved these changes Apr 19, 2020

wooorm merged commit 0697d46 into remarkjs:master Apr 19, 2020

wooorm added remark-parse 🐛 type/bug This is a problem 👶 semver/patch This is a backwards-compatible fix 🗄 area/interface This affects the public interface labels Apr 19, 2020

wooorm mentioned this pull request Apr 19, 2020

Parse sub-lists indented with tabs #347

Closed

wooorm added the ⛵️ status/released label Apr 19, 2020

wooorm added the 💪 phase/solved Post is done label Aug 4, 2021
