MD013: Incorrect count on lines with multi-byte unicode characters #1458

Open
Maneren opened this issue Dec 26, 2024 · 1 comment
Maneren commented Dec 26, 2024

Hi, I copied a paragraph from a PDF that contained hard-coded Unicode italic characters, each of which takes 4 bytes in UTF-8 and two code units (also 4 bytes) in UTF-16. After pasting it into a Markdown file saved as UTF-8, I started receiving a `Line length [Expected: 80, Actual: 85]` warning, even though only 74 Unicode characters are displayed on the line (stored as 107 bytes).

- $\forall 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝑇_𝑛: 𝑣_1, 𝑣_2, \ldots, 𝑣_𝑛 \in 𝐾 \iff

(I assume the intention of the rule is to count characters as they are rendered in the editor, which would be 74 in this case.)

I may be missing some context or implementation detail, but I think the issue is a combination of two things. First, JavaScript measures strings in UTF-16 code units rather than in characters, which explains the seemingly incorrect `.length` reported for the line. Second, the regular expressions are not Unicode-aware, so `.` also matches a single UTF-16 code unit instead of a whole character.

So I think the correct way to handle this would be to use `[...line].length` to get the total length of the line and to add the `u` flag to the regular expressions to switch them to Unicode mode.
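To illustrate the difference, here is a minimal sketch, assuming a Node.js or browser runtime where `TextEncoder` is available as a global. The sample string and variable names are made up for illustration and are not taken from the markdownlint source.

```javascript
// U+1D463 (mathematical italic small v) lies outside the Basic Multilingual
// Plane, so it needs a surrogate pair in UTF-16 and 4 bytes in UTF-8.
const line = "\u{1D463}_1, \u{1D463}_2";

// String.prototype.length counts UTF-16 code units.
const codeUnits = line.length; // 10

// The string iterator walks code points, so spreading counts characters.
const codePoints = [...line].length; // 8

// Size of the line when stored as UTF-8.
const utf8Bytes = new TextEncoder().encode(line).length; // 14

// Without the u flag, `.` matches one code unit (half of a surrogate pair).
const dotMatchesLegacy = line.match(/./g).length; // 10

// With the u flag, `.` matches one whole code point.
const dotMatchesUnicode = line.match(/./gu).length; // 8

console.log({ codeUnits, codePoints, utf8Bytes, dotMatchesLegacy, dotMatchesUnicode });
```

The same arithmetic accounts for the numbers above: the reported line has 74 code points, 11 of which need a surrogate pair, giving 74 + 11 = 85 code units.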

DavidAnson (Owner) commented

Related: #564
