Better font subset detection #982

Open
m4rc1e opened this issue Jun 10, 2024 · 2 comments

Comments

@m4rc1e
Collaborator

m4rc1e commented Jun 10, 2024

We currently detect script subsets such as Arabic by counting the glyphs in the cmap table and checking whether more than 50% of them appear in the .nam file for a specific script subset. Instead, what if we simply checked whether a font fulfills a gflanguages base charset? E.g. for Arabic, we'd need all the base characters in https://github.com/googlefonts/lang/blob/main/Lib/gflanguages/data/languages/ar_Arab.textproto#L44.
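
To make the difference between the two checks concrete, here is a minimal sketch. It assumes the current heuristic measures the fraction of the font's cmap codepoints that fall inside the subset, which is one reading of the description above. The function names and the tiny codepoint sets are hypothetical stand-ins, not real .nam or gflanguages data.

```python
def majority_subset_match(font_cps, subset_cps, threshold=0.5):
    """Current-style heuristic (as read from the description): more than
    `threshold` of the font's cmap codepoints fall inside the subset."""
    if not font_cps:
        return False
    return len(font_cps & subset_cps) / len(font_cps) > threshold


def covers_base_charset(font_cps, base_cps):
    """Proposed check: every base character of the language is in the cmap."""
    return base_cps <= font_cps


# Toy data only: a handful of Arabic letters standing in for a base charset,
# plus two punctuation marks standing in for the rest of a .nam subset.
LATIN = {ord(c) for c in "abcdefghijklmnopqrstuvwxyz"}
TOY_ARABIC_BASE = {0x0627, 0x0628, 0x062A, 0x062B, 0x062C}
TOY_ARABIC_SUBSET = TOY_ARABIC_BASE | {0x060C, 0x061B}

# A mostly-Latin font that also happens to contain the whole toy base set.
font_cps = LATIN | TOY_ARABIC_BASE

print(majority_subset_match(font_cps, TOY_ARABIC_SUBSET))  # False: 5 of 31 codepoints
print(covers_base_charset(font_cps, TOY_ARABIC_BASE))      # True
```

With this toy font the majority heuristic misses the script because Latin dominates the cmap, while the base charset check picks it up.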

@simoncozens
Contributor

That sounds a lot better, although we need to think it through:

  • A script like Arabic is used to write multiple languages. We don't have a concept of which is the "primary" language for a script at the moment, so do we require a superset of all of them?
  • Perhaps not, because a font which supports Arabic shouldn't be skipped just because it doesn't contain Uyghur letter ۇ, and a font can support Latin even if it doesn't have Ᵽ.
  • So are we back to looking at support for a certain percentage of covered characters within all languages of a script?
  • Or can we require the intersection of the bases of all languages for a script? Nice idea, but the intersection for Latin is the empty set: there is no character which is an exemplar for every Latin-script language!
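
A toy illustration of why neither extreme works. The three "languages" and their base sets here are invented purely for the example and are not gflanguages data.

```python
# Hypothetical base charsets for three made-up languages sharing one script.
TOY_BASES = {
    "lang_a": set("abcde"),
    "lang_b": set("abcxy"),
    "lang_c": set("xyzde"),
}

# Requiring a superset of every language means requiring the union.
union_of_bases = set.union(*TOY_BASES.values())
# Requiring only what all languages share means requiring the intersection.
intersection_of_bases = set.intersection(*TOY_BASES.values())

font_chars = set("abcde")  # fully supports lang_a only

print(sorted(union_of_bases))               # 8 characters in total
print(union_of_bases <= font_chars)         # False: too strict, the font is rejected
print(intersection_of_bases)                # set(): nothing is shared by all three
print(intersection_of_bases <= font_chars)  # True, but vacuously: any font passes
```

The union rule rejects a font that fully supports one language, and the intersection rule accepts every font once the shared set is empty.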

I can see two approaches:

  • YOLO. If a font contains any character in a subset, it gets that subset. Anything less than that and you're removing glyphs from a font, and why do we want to do that? Or
  • Compute the supported languages first, and then declare support for all of their scripts.
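
A rough sketch of the two approaches side by side, on invented data. The language records, the function names, and the use of each language's base set as a stand-in for a subset's character list are all assumptions made for the example; real data would come from gflanguages and the .nam files.

```python
# Hypothetical language records: script tag plus base exemplar characters.
TOY_LANGUAGES = {
    "lang_a": {"script": "Latn", "base": set("abcde")},
    "lang_b": {"script": "Latn", "base": set("abcxy")},
    "lang_c": {"script": "Arab", "base": {chr(cp) for cp in (0x0627, 0x0628, 0x062A)}},
}


def scripts_by_any_character(font_chars, languages):
    """'YOLO' approach: one shared character is enough to grant the script."""
    return {lang["script"] for lang in languages.values() if lang["base"] & font_chars}


def scripts_by_supported_languages(font_chars, languages):
    """Languages-first approach: find fully supported languages, then
    declare support for the scripts those languages are written in."""
    supported = [code for code, lang in languages.items() if lang["base"] <= font_chars]
    return {languages[code]["script"] for code in supported}


# Covers all of lang_a, plus a single stray Arabic letter.
font_chars = set("abcde") | {chr(0x0627)}

print(sorted(scripts_by_any_character(font_chars, TOY_LANGUAGES)))        # ['Arab', 'Latn']
print(sorted(scripts_by_supported_languages(font_chars, TOY_LANGUAGES)))  # ['Latn']
```

The first function never drops a glyph the font already has; the second only declares a script when at least one language in it is fully covered.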

@m4rc1e
Collaborator Author

m4rc1e commented Jun 10, 2024

> Compute the supported languages first, and then declare support for all of their scripts.

I like this a lot. gflanguages also includes population data. Perhaps instead of counting glyphs, it would be better to count how many people the font is able to cover.
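
As a sketch of what population weighting could look like: the records, the `population` field name, and the figures are made up for illustration, and any cut-off for "enough people covered" would still need deciding.

```python
# Hypothetical records; populations are invented numbers, not gflanguages data.
TOY_LANGUAGES = {
    "lang_a": {"script": "Latn", "base": set("abcde"), "population": 1_000_000},
    "lang_b": {"script": "Latn", "base": set("abcxy"), "population": 250_000},
    "lang_c": {"script": "Latn", "base": set("xyzde"), "population": 50_000},
}


def population_coverage(font_chars, languages, script):
    """Share of a script's total speaker population whose language
    has its whole base charset present in the font."""
    total = 0
    covered = 0
    for lang in languages.values():
        if lang["script"] != script:
            continue
        total += lang["population"]
        if lang["base"] <= font_chars:
            covered += lang["population"]
    return covered / total if total else 0.0


font_chars = set("abcde")  # fully supports lang_a only
coverage = population_coverage(font_chars, TOY_LANGUAGES, "Latn")
print(f"{coverage:.0%} of the toy Latn population is covered")
```

Here the font supports one language out of three, yet covers most of the toy population, which is the kind of signal a glyph count cannot give.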
