MD013: Incorrect count on lines with multi-byte unicode characters #1458

Maneren · 2024-12-26T01:59:02Z

Hi, I copied a paragraph from a PDF that contained hardcoded Unicode italic characters, each of which takes 4 bytes in UTF-8 or two 16-bit code units (also 4 bytes) in UTF-16. After pasting it into a Markdown file saved as UTF-8, I started receiving the warning `Line length [Expected: 80, Actual: 85]`, even though only 74 Unicode characters are displayed on the line (stored as 107 bytes).

- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff

(I assume the rule is meant to count characters as they are rendered in the editor, which is 74 in this case.)
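To make the three numbers concrete, here is a minimal sketch that measures the line above in UTF-16 code units, Unicode code points, and UTF-8 bytes. The variable names are only illustrative, and it assumes a runtime with a global `TextEncoder` (current Node.js or a browser).

```javascript
// The reported line, kept verbatim; String.raw preserves the backslashes.
const line = String.raw`- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff`;

// String.prototype.length counts UTF-16 code units, so each italic
// character (outside the Basic Multilingual Plane) counts twice.
const utf16Units = line.length;

// The string iterator yields whole code points, one per italic character.
const codePoints = [...line].length;

// Size of the line when stored as UTF-8.
const utf8Bytes = new TextEncoder().encode(line).length;

console.log({ utf16Units, codePoints, utf8Bytes });
// { utf16Units: 85, codePoints: 74, utf8Bytes: 107 }
```

The 11 italic characters account for the whole difference: each adds one extra code unit in UTF-16 (74 + 11 = 85) and three extra bytes in UTF-8 (74 + 33 = 107).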

I may be missing some context or implementation detail, but I think the issue is a combination of two things: JS handles strings as UTF-16 rather than UTF-8 (hence the seemingly incorrect `.length` reported for the line), and the regular expressions are not Unicode-aware, so `.` likewise matches a single UTF-16 code unit.
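The regular expression half of this can be seen in isolation. The sketch below only illustrates how `.` behaves in JavaScript with and without the `u` flag; the patterns are made up for the example and are not the ones markdownlint uses.

```javascript
// U+1D463 MATHEMATICAL ITALIC SMALL V, stored as a surrogate pair in UTF-16.
const glyph = '\u{1D463}';

// Without the u flag, `.` matches one UTF-16 code unit, so the two
// halves of the surrogate pair are matched separately.
const matchesWithoutU = glyph.match(/./g).length; // 2

// With the u flag, `.` matches one whole code point.
const matchesWithU = glyph.match(/./gu).length; // 1

// A "exactly one character" pattern therefore disagrees with itself
// depending on the flag.
const singleCharLegacy = /^.$/.test(glyph);   // false
const singleCharUnicode = /^.$/u.test(glyph); // true

console.log({ matchesWithoutU, matchesWithU, singleCharLegacy, singleCharUnicode });
```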

So I think the correct way to handle these would be to use `[...line].length` to get the total length of the line and to add the `u` flag to the regular expressions to switch them to Unicode mode.
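As a rough sketch of what that could look like, the helpers below count code points instead of code units. `codePointLength` and `isTooLong` are hypothetical names invented for this example, not functions from markdownlint, and the default limit of 80 is taken from the warning above.

```javascript
// Hypothetical helper: counts code points rather than UTF-16 code units.
// Equivalent to [...line].length, but avoids allocating an array.
function codePointLength(line) {
  let count = 0;
  for (const _ of line) {
    count += 1; // the string iterator yields one whole code point per step
  }
  return count;
}

// Hypothetical check: true when the line exceeds the limit in code points.
function isTooLong(line, limit = 80) {
  return codePointLength(line) > limit;
}

// 50 italic characters: 100 UTF-16 code units, but only 50 code points.
const sample = '\u{1D463}'.repeat(50);
console.log(sample.length > 80); // true: a code-unit count would flag this line
console.log(isTooLong(sample));  // false: a code-point count would not
```

Note that code points still differ from rendered glyphs when combining marks or emoji sequences are involved, so this only addresses the surrogate-pair case described here.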

DavidAnson · 2024-12-26T02:28:58Z

Related: #564

DavidAnson added the enhancement label Dec 26, 2024
