Better font subset detection #982

m4rc1e · 2024-06-10T13:07:18Z

We currently detect script subsets such as Arabic by counting the glyphs in the cmap table and checking whether more than 50% of them appear in a specific script subset .nam file. Instead, what if we simply checked whether a font fulfills a gflanguages base charset? E.g. for Arabic, we'd need all the base characters in https://github.com/googlefonts/lang/blob/main/Lib/gflanguages/data/languages/ar_Arab.textproto#L44
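To make the difference concrete, here is a minimal sketch of the two checks, assuming the font's cmap is available as a set of codepoints. The function names, the whole-block stand-in for the .nam file, and the five-letter Arabic sample are illustrative assumptions, not fontbakery or gflanguages APIs or data.

```python
def majority_in_subset(font_cps, subset_cps, threshold=0.5):
    """Current-style heuristic: does more than `threshold` of the cmap fall in the subset?"""
    if not font_cps:
        return False
    return len(font_cps & subset_cps) / len(font_cps) > threshold


def fulfills_base(font_cps, base_cps):
    """Proposed check: the font maps every base character."""
    return base_cps <= font_cps


# Toy stand-ins (assumptions): the whole Arabic block in place of the .nam
# subset, and five letters in place of the full ar_Arab base exemplars.
ARABIC_SUBSET = set(range(0x0600, 0x0700))
ARABIC_BASE_SAMPLE = {0x0627, 0x0628, 0x062A, 0x062B, 0x062C}

# A mostly-Latin font (95 ASCII codepoints) that also carries the sample letters.
font_cps = set(range(0x0020, 0x007F)) | ARABIC_BASE_SAMPLE

print(majority_in_subset(font_cps, ARABIC_SUBSET))  # False: 5 of 100 codepoints
print(fulfills_base(font_cps, ARABIC_BASE_SAMPLE))  # True
```

With these toy inputs the percentage heuristic rejects the font because Latin dominates its cmap, while the base-charset check accepts it.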

simoncozens · 2024-06-10T13:23:59Z

That sounds a lot better, although we need to think it through:

A script like Arabic is used to write multiple languages - we don't have a concept of which is the "primary" language for the script at the moment, so do we require a superset of all of them?
Perhaps not, because a font which supports Arabic shouldn't be skipped just because it doesn't contain Uyghur letter ۇ, and a font can support Latin even if it doesn't have Ᵽ.
So are we back to looking at support for a certain percentage of covered characters within all languages of a script?
Or can we require the intersection of all of the bases for all languages for a script? Nice idea, but the intersection for Latin is the null set - there are no characters which are an exemplar for every Latin language!
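The superset and intersection options above can be sketched with plain set operations. The two three-letter base sets below are toy assumptions chosen only to include U+06C7 for Uyghur; they are not the real gflanguages exemplars.

```python
def union_of_bases(bases):
    """Superset requirement: every base character of every language."""
    return set().union(*bases.values())


def intersection_of_bases(bases):
    """Intersection requirement: only characters shared by all languages."""
    sets = list(bases.values())
    return set.intersection(*sets) if sets else set()


# Toy base sets (assumptions), keyed by language id.
BASES = {
    "ar_Arab": {0x0627, 0x0628, 0x062A},
    "ug_Arab": {0x0627, 0x0628, 0x06C7},  # includes U+06C7
}

# A font that covers the toy Arabic set but lacks U+06C7.
font_cps = {0x0627, 0x0628, 0x062A}

required_all = union_of_bases(BASES)
required_common = intersection_of_bases(BASES)

print(required_all <= font_cps)     # False: U+06C7 is missing
print(required_common <= font_cps)  # True
```

Each additional language can only grow the union and shrink the intersection, which is how the intersection can end up empty for a script with as many languages as Latin.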

I can see two approaches:

1. YOLO. If a font contains any character in a subset, it gets that subset. Anything less than that and you're removing glyphs from a font, and why do we want to do that? Or
2. Compute the supported languages first, and then declare support for all of their scripts.
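A rough sketch of how the two approaches might differ in practice, assuming a language counts as supported when all of its base characters are mapped. The function names, the dictionary layout, and the toy subsets and base sets are hypothetical, not taken from fontbakery or gflanguages.

```python
def yolo_subsets(font_cps, subsets):
    """Approach 1: any shared codepoint is enough to claim the subset."""
    return {name for name, cps in subsets.items() if font_cps & cps}


def scripts_from_languages(font_cps, languages):
    """Approach 2: find fully covered languages, then collect their scripts."""
    supported = [lang for lang in languages if lang["base"] <= font_cps]
    return {lang["script"] for lang in supported}


# Toy data (assumptions): codepoint ranges standing in for subset files,
# and short base sets standing in for language exemplars.
SUBSETS = {
    "latin": set(range(0x0041, 0x007B)),
    "arabic": set(range(0x0600, 0x0700)),
}
LANGUAGES = [
    {"id": "en_Latn", "script": "Latn",
     "base": {ord(c) for c in "abcdefghijklmnopqrstuvwxyz"}},
    {"id": "ar_Arab", "script": "Arab",
     "base": {0x0627, 0x0628, 0x062A}},
]

# ASCII plus a single stray Arabic letter.
font_cps = set(range(0x0020, 0x007F)) | {0x0627}

print(sorted(yolo_subsets(font_cps, SUBSETS)))              # ['arabic', 'latin']
print(sorted(scripts_from_languages(font_cps, LANGUAGES)))  # ['Latn']
```

The single stray letter is enough for the first approach to claim Arabic, while the second approach only reports scripts for which at least one language is fully covered.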

m4rc1e · 2024-06-10T13:29:11Z

> Compute the supported languages first, and then declare support for all of their scripts.

I like this a lot. gflanguages also includes population count data. Perhaps instead of counting glyphs, counting how many people you're able to cover may be better.
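One way the population idea could look, as a minimal sketch: for each script, sum the population of the languages whose base characters the font fully covers and divide by the population of all languages in that script. The record layout, the field names, and the population figures below are placeholders, not real gflanguages fields or data.

```python
from collections import defaultdict


def population_coverage(font_cps, languages):
    """Return, per script, the share of speakers whose language is fully covered."""
    total = defaultdict(int)
    covered = defaultdict(int)
    for lang in languages:
        total[lang["script"]] += lang["population"]
        if lang["base"] <= font_cps:
            covered[lang["script"]] += lang["population"]
    return {script: covered[script] / total[script]
            for script in total if total[script]}


# Toy records (assumptions): placeholder populations and three-letter base sets.
LANGUAGES = [
    {"id": "ar_Arab", "script": "Arab", "population": 900,
     "base": {0x0627, 0x0628, 0x062A}},
    {"id": "ug_Arab", "script": "Arab", "population": 100,
     "base": {0x0627, 0x0628, 0x06C7}},
]

# A font that covers the toy Arabic set but lacks U+06C7.
font_cps = {0x0627, 0x0628, 0x062A}

coverage = population_coverage(font_cps, LANGUAGES)
print(coverage)
```

A threshold on this ratio would then play the role that the 50% glyph count plays today; where to set it is left open here.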

m4rc1e mentioned this issue Jun 27, 2024

Add subsets check to PR checklist #978

Merged
