Feature request: aggregate of non-ASCII/system file/directory characters #100

kieranjol · 2022-09-21T07:57:22Z

I'm enjoying digging into the lists of non-ASCII and troublesome characters. I think it could be useful to show an aggregate of the characters that appear in a report, and how often they appear. For example:

 characters outside of ASCII range: '0xc9, LATIN CAPITAL LETTER E WITH ACUTE: É' (248)
 characters outside of ASCII range: '0xf028, None: ' (4)
 non-recommended character: '0x5b, LEFT SQUARE BRACKET: [' (1474)

This would help during appraisal: if I know that Latin letters with acutes are supported within the repository, but bullets or other characters perhaps are not, the aggregate would make it quicker to identify the problematic characters.
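
A minimal sketch of how such an aggregate could be computed, written to reproduce the line format in the example above. This is not the tool's actual implementation: the function names `aggregate_characters` and `format_aggregate` and the `NON_RECOMMENDED` set are hypothetical, and the input is assumed to be a plain list of file or directory names.

```python
import unicodedata
from collections import Counter

# Assumption: an illustrative set only, not the tool's real list of
# non-recommended characters.
NON_RECOMMENDED = set('<>:"/\\|?*[]')


def aggregate_characters(names):
    """Count each non-ASCII or non-recommended character across all names."""
    counts = Counter()
    for name in names:
        for ch in name:
            if ord(ch) > 127:
                counts[("characters outside of ASCII range", ch)] += 1
            elif ch in NON_RECOMMENDED:
                counts[("non-recommended character", ch)] += 1
    return counts


def format_aggregate(counts):
    """Render one line per character, most frequent first."""
    lines = []
    for (label, ch), n in counts.most_common():
        # unicodedata.name() returns the given default (None here) for
        # characters with no name, such as private-use code points.
        char_name = unicodedata.name(ch, None)
        lines.append(f"{label}: '{hex(ord(ch))}, {char_name}: {ch}' ({n})")
    return lines


if __name__ == "__main__":
    sample = ["É[1].txt", "café É.doc"]
    for line in format_aggregate(aggregate_characters(sample)):
        print(line)
```

Keying the counter on the (label, character) pair keeps the two categories separate, so a reviewer can scan the totals for one category at a time.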

ross-spencer · 2023-09-10T19:42:59Z

Sorry it took a while to get back to this. It's a good suggestion. I needed some sample data, so I finally got round to creating a proper repo for some of my other test work: https://github.com/ross-spencer/big-list-of-naughty-files. It generates a lot of output that will appear in these kinds of aggregates. Unfortunately it breaks a few more things, so I'll try to fix those first and then add some more sample data to this issue to create the aggregates.

ross-spencer added the "enhancement" (a feature by any other name) and "In-progress" (this is actively being worked on) labels on Sep 10, 2023
