
“What happens if I remove ...?” #22

Open
justanotherfoundry opened this issue Jun 22, 2024 · 10 comments

@justanotherfoundry

Thanks for this really helpful tool! I have used it quite a lot over the last few days to check and refine my character set.

Many years ago, I wrote a script that does something similar but checks against the Unicode CLDR, more specifically, the way Font Book on macOS determines the supported languages (which seems to be based on the CLDR).

I just uploaded the code:
https://github.com/justanotherfoundry/font-production/tree/master/import%20CLDR
and
https://github.com/justanotherfoundry/freemix-glyphsapp/blob/master/Font%20Book%20Checker.py

As you can see, this is a very similar approach.

My script also determines characters that are not required in any of the supported languages. In other words, deleting these characters would not change the list of supported languages. This is a good way of finding useless characters in the character set as they are used only in languages that are not fully supported anyway. (I believe most of the fonts out there have a lot of these useless characters.)
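The idea boils down to set arithmetic. A minimal sketch with made-up data (these names are illustrative, not the script's actual API):

```python
# A character is "useless" if it is required only by languages the font
# doesn't fully support anyway, so removing it leaves the list of
# supported languages unchanged.
font_chars = {'a', 'b', 'c', 'ə'}

# Union of the charsets of all fully supported languages
required_by_supported = {'a', 'b', 'c'}

useless = font_chars - required_by_supported
print(sorted(useless))
```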

Before simply relying on the CLDR and removing all these characters, I’d like to check what TalkingLeaves and Hyperglot say about them (I don’t have much of an opinion (yet) on which of the two is more correct). So, what happens if I remove this or that character? Will the list of supported languages as per TalkingLeaves change? At the moment, checking this is very tedious: essentially, I need to go through the list of incomplete languages and try to spot the languages where all missing characters are among the ones I just deleted.

Would it be possible to have TalkingLeaves output the list as text in the Macro Panel? Then I could simply start TalkingLeaves, copy the output, delete some characters, start TalkingLeaves again, and do a text diff. If nothing changes then the deleted characters were indeed useless as per CLDR as well as Hyperglot.
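The diff step itself is easy to script once the two lists exist as text. A sketch using the standard library, with hypothetical before/after snapshots of the supported-languages list:

```python
import difflib

# Hypothetical snapshots of the supported-languages output,
# captured before and after deleting some characters
before = ['Bulgarian', 'Russian', 'Serbian']
after = ['Bulgarian', 'Russian']

# An empty diff would mean the deleted characters were indeed unneeded
diff = list(difflib.unified_diff(before, after, lineterm=''))
for line in diff:
    print(line)
```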

As a side note, it would be nice to have a palette that gives information on the currently selected glyph (or glyphs): which languages it is required for according to Hyperglot and according to the CLDR, plus the number of speakers, and whether these languages are complete or not. Plus, maybe a Wikipedia link. Then I could make up my mind whether to keep or remove the glyphs, one by one. Maybe I will write something like that at some point.

@justinpenner (Owner)

I love that palette idea! It would be interesting to see what languages require the selected character(s). A lot of Unicode characters have a Wikipedia article so that could work, too.

I don't know much about the CLDR yet, so I definitely need to dig into that and see how it might be useful for TalkingLeaves. The update I pushed yesterday adds a data.py module which lays some of the groundwork for integrating more data sources like Shaperglot and CLDR. It'll be some work to figure out the best ways to merge the data together and deal with differences where, for example, Hyperglot and Shaperglot have slightly different orthography definitions for the same language.

I think once I've begun integrating multiple data sources beyond just Hyperglot, then it would be useful to work on making TalkingLeaves more usable as an API, for users who want to write scripts. For now, you can already write scripts that import it as a module, with the big caveat that your scripts might break when TalkingLeaves is updated.

Here's an example of how you could print a list of chars in the font that aren't used by any languages that your font has completed:

```python
from TalkingLeaves.data import Data
from TalkingLeaves.utils import flatten

data = Data()

# We don't need this table, but calling it populates data.completeLangs
_ = data.langsAsTable('Latin', Glyphs.font, True, True)

completeNames = list(data.completeLangs.loc[:, 'name'])
completeLangs = data.langs[data.langs['name'].isin(completeNames)]
completeCharsets = list(completeLangs.loc[:, 'chars'])
completeChars = set(flatten(completeCharsets))
fontChars = set(g.string for g in Glyphs.font.glyphs)

# Chars not required by any complete language
print(sorted(fontChars - completeChars))
```

From there you may want to filter out punctuation and symbols, since Hyperglot's orthography definitions only cover letters and marks.
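That filtering can be done with Python's standard unicodedata module: a character is a letter or mark if its Unicode general category starts with "L" or "M". A small sketch (not part of TalkingLeaves):

```python
import unicodedata

def letters_and_marks(chars):
    """Keep only characters whose Unicode general category is Letter (L*) or Mark (M*)."""
    return {c for c in chars if unicodedata.category(c)[0] in ('L', 'M')}

# Punctuation, digits, and symbols are dropped;
# letters and combining marks remain
print(sorted(letters_and_marks({'-', '.', '0', '$', 'A', 'ж', '\u0301'})))
```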

@jenskutilek

> As a side note, it would be nice to have a palette that gives information on the currently selected glyph (or glyphs): which languages it is required for according to Hyperglot and according to the CLDR, plus the number of speakers, and whether these languages are complete or not. Plus, maybe a Wikipedia link. Then I could make up my mind whether to keep or remove the glyphs, one by one. Maybe I will write something like that at some point.

I wrote a plugin that shows such a palette for the current glyph, also based on Unicode CLDR: https://github.com/jenskutilek/UnicodeInfo-Glyphs

[screenshot of the Unicode Info palette]

Feel free to reuse/adapt parts if you need any :)

@justanotherfoundry (Author)

@justinpenner Thanks! That looks really promising. However, I am getting a beach ball when I run this code. Also, the regular TalkingLeaves now crashes. I will submit this as a new issue.

@justanotherfoundry (Author)

@jenskutilek Aha, I knew someone must have had this kind of idea. If I pursue it any further, I’ll surely build on your plugin. Thanks!

@justanotherfoundry (Author)

I played around with the “unneeded characters” script above. The list includes the Cyrillic Ѐ, which is, according to Hyperglot, required for Bulgarian, and Bulgarian is complete in my font. If I delete this glyph from the font, TalkingLeaves shows Bulgarian as incomplete with only this character missing. This means the list of “unneeded characters” reported by the script is not what I would expect it to be. I am looking for characters I can remove without reducing the list of complete languages.

@justinpenner (Owner)

Does it work if you change the first argument in this line to Cyrillic instead of Latin?

```python
_ = data.langsAsTable('Latin', Glyphs.font, True, True)
```

If you already tried that, then I'm not sure what's going wrong. It works correctly for me if I create a font and add Bulgarian to it via TalkingLeaves. The script reports no unneeded Cyrillic chars for me:

```
[' ', ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
```

@justanotherfoundry (Author)

> Does it work if you change the first argument in this line to Cyrillic instead of Latin?

It does! Seems like we’d have to loop over several scripts? Sorry, I don’t know enough about pandas so I cannot debug the code.

@justinpenner (Owner)

> Seems like we’d have to loop over several scripts? Sorry, I don’t know enough about pandas so I cannot debug the code.

Exactly, you can loop over multiple scripts like this, and you don't need to add any new pandas code:

```python
from TalkingLeaves.data import Data
from TalkingLeaves.utils import flatten

data = Data()

# We don't need these tables, but each call populates data.completeLangs
completeNames = []
for script in ['Latin', 'Cyrillic']:
    _ = data.langsAsTable(script, Glyphs.font, True, True)
    completeNames.extend(list(data.completeLangs.loc[:, 'name']))

completeLangs = data.langs[data.langs['name'].isin(completeNames)]
completeCharsets = list(completeLangs.loc[:, 'chars'])
completeChars = set(flatten(completeCharsets))
fontChars = set(g.string for g in Glyphs.font.glyphs)

# Unneeded chars
print(sorted(fontChars - completeChars))
```

By the way, I found pandas surprisingly easy to learn. It's very Pythonic, and it became easier to understand after I looked up a "cheat sheet" of common commands. I've only scratched the surface so far, but it didn't take long to learn enough of the basics to plug it into TalkingLeaves.
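For anyone following along, the pandas operations the scripts above rely on are just boolean-mask filtering and column selection. A toy example with made-up language data (a stand-in for data.langs, not TalkingLeaves' real table):

```python
import pandas as pd

# Toy stand-in for data.langs: one row per language
langs = pd.DataFrame({
    'name': ['Bulgarian', 'Russian', 'English'],
    'chars': [['а', 'б', 'в'], ['а', 'б', 'я'], ['a', 'b', 'c']],
})

# .isin builds a boolean mask; indexing with it keeps the matching rows
complete = langs[langs['name'].isin(['Bulgarian', 'English'])]

# .loc[:, 'name'] selects a whole column, as the scripts above do
print(list(complete.loc[:, 'name']))
```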

@justanotherfoundry (Author)

@jenskutilek I implemented what I described above, on the basis of your Unicode Info plug-in: https://github.com/justanotherfoundry/UnicodeInfo-Glyphs Did you get my e-mail?

@jenskutilek

@justanotherfoundry it took me a while to answer, sorry for that! I hope you got my reply by now.
