
Engram-es Q & U positions #47

Open
binarybottle opened this issue Nov 26, 2024 · 9 comments

Comments

@binarybottle (Owner)

The following was an email sent to me on 26 Nov 2024:

Hi,

I've been trying to optimize a Russian layout, and while searching for scientific papers I found your project. (I don't know why Google Scholar doesn't show it near the top; everything I found was irrelevant.) It's great that someone did serious research, published a paper, and shared a notebook, thanks for your work! I'll try running the code on my experimental layouts to see how it ranks them.

But there's a problem. I took a close look at your Spanish layout and tried to imagine typing in Spanish (I looked at the letters and moved the appropriate fingers, imagining typing something like "y para entender lo que el ha propuesto..."). I found that Q & U are placed very inconveniently: words like "que", "fue", "fuera", "puede", "aquel" require quite awkward moves between the index finger and middle finger on the left hand.

My gut feeling told me they must sit further outwards, because E/A/O often come after them, as in the words I mentioned. So I calculated a metric that I used in my own research: how many incoming and outgoing bigrams a letter has on the same hand. If a bigram starts with the letter in question, it's "outgoing"; if it ends with it, it's "incoming". I counted the bigram frequencies for your Spanish layout's left-hand side, and it turns out I was right: the letter U wants to be well to the left of E/A/O, and Q wants to be even further left.
You can run this code in your Jupyter notebook after the code that defines the bigram frequencies:

import pandas as pd

# bigrams and bigram_frequencies are defined earlier in the notebook
d = pd.DataFrame({'bigram': bigrams, 'freq': bigram_frequencies})
d['l1'] = d.bigram.str[0]  # first letter of each bigram
d['l2'] = d.bigram.str[1]  # second letter of each bigram

# left-hand letters in the engram-es layout; keep only same-hand bigrams
left = ['A', 'E', 'I', 'O', 'U', 'Z', 'H', 'P', 'F', 'X', 'Q', 'Y']
d2 = d[d.l1.isin(left) & d.l2.isin(left)]

# per letter: total frequency as first letter (outgoing) vs second letter (incoming)
t2 = (d2.groupby('l1').agg({'freq': 'sum'})
        .join(d2.groupby('l2').agg({'freq': 'sum'}), lsuffix='_out', rsuffix='_in')
        .reset_index())
t2['delta'] = t2.freq_in - t2.freq_out
t2.sort_values('delta')

Output:

l1 freq_out freq_in delta
I 32120017 10431392 -21688625
P 24677708 5852291 -18825417
Q 12474918 1430018 -11044900
H 11195057 660160 -10534897
U 24643760 19815896 -4827864
F 7648289 2845112 -4803177
Z 3565688 3871776 306088
Y 2505993 2944683 438690
X 1207193 1762383 555190
O 3760162 24776548 21016386
A 7776678 31177035 23400357
E 8925375 34933544 26008169

In my research, I used such Pandas queries quite a lot and optimized consciously, rather than using optimization algorithms. Another metric I had was "how much is the key connected with the keys in the same row", to see whether some keys could be moved elsewhere or should stay where they are.
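Dmitri's code for that second metric isn't included in the email. A minimal sketch of how such a same-row connectivity share could be computed from the DataFrame d built above is shown below; the row assignment here is purely hypothetical and would need to match the layout actually being analyzed.

# Hypothetical row assignment for the left-hand letters; replace with the
# actual rows of the layout under study.
row_of = {'A': 'home', 'E': 'home', 'I': 'home', 'O': 'home',
          'U': 'top', 'Q': 'top', 'P': 'top', 'F': 'top',
          'Z': 'bottom', 'H': 'bottom', 'X': 'bottom', 'Y': 'bottom'}

# Reuse d (columns: bigram, freq, l1, l2) from the snippet above.
d3 = d[d.l1.isin(row_of) & d.l2.isin(row_of)].copy()
d3['same_row'] = d3.l1.map(row_of) == d3.l2.map(row_of)

# For each letter: what share of its same-hand bigram frequency stays in its own row?
same_row_share = (d3.groupby('l1')
                    .apply(lambda g: g.loc[g.same_row, 'freq'].sum() / g.freq.sum())
                    .sort_values())
print(same_row_share)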

I wonder why Q & U ended up where they are? Is it because the top-row rf & pinky positions are penalized?

Anyway, many thanks for posting your code. I see I was going in the right direction.
Best regards,

Dmitri

@binarybottle (Owner, Author)

Dmitri --

Thank you for reaching out with suggestions to improve engram-es! The approach relies on certain assumptions about ergonomics and on a representative corpus from which the bigrams are derived. You write at a particularly good time, as I am revisiting this project from scratch and developing a data-driven approach with crowdsourced information and a new software pipeline. I will keep your suggestion in mind as the project progresses.

@culebron commented Nov 27, 2024

Hi,
I'm happy to help. I improved my Jupyter notebook, annotating it like you did; hopefully it's understandable. The code that computes the bigram score is very simple; search for the get_bigram_cost function.

https://github.com/culebron/rus-layout-opt/blob/master/Russian%20Optimization.ipynb
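The notebook itself isn't reproduced in this thread. Purely as a hedged sketch of the general idea (not necessarily how the linked get_bigram_cost actually works), a bigram cost of this kind could be the per-key press effort plus a transition penalty, weighted by frequency:

# Illustrative sketch only; the real get_bigram_cost in the linked notebook
# may take different inputs and use different weights.
def get_bigram_cost(bigram, freq, key_cost, transition_cost):
    """Cost of one bigram: press costs for both keys plus a penalty for the
    movement between them, weighted by how often the bigram occurs."""
    a, b = bigram[0], bigram[1]
    per_occurrence = (key_cost.get(a, 0.0) + key_cost.get(b, 0.0)
                      + transition_cost.get((a, b), 0.0))
    return per_occurrence * freq

# Example usage with made-up costs:
# get_bigram_cost('qu', 12474918, {'q': 3.0, 'u': 1.5}, {('q', 'u'): 2.0})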

@culebron

I've updated my code at the link above for better readability and configurability. Hope this helps.

@binarybottle (Owner, Author)

Thank you, @culebron! I am taking a different, more data-driven approach this time, and will reach out for help if it comes back to scoring bigrams with an algorithm or weighting scheme...

@binarybottle (Owner, Author)

A follow-up email from @culebron on 27 Nov:

  1. Are we actually abusing the scoring systems?

When I made several initial layouts and finally sat down and tested one by typing real text (I added it to my Ergodox and later to the OS on my laptop), I discovered a very inconvenient monogram (index finger on the inner column, upper row). It turned out I had no penalty for that, and the optimization had abused this lacuna to get better scores.

I thought, maybe rules cast in stone aren't the way to go? Because they promote aggressive optimizations and abuse of the rules. Maybe we should iterate with manual testing: discover inconveniences, modify the rules, and re-evaluate previous improvements?

On the other hand, it may lead to insanely complicated rules. This project (https://github.com/dariogoetz/keyboard_layout_optimizer) has 14 KINDS of rules! The configuration is enterprise-scale! The whole project is 8500 SLOC!!! Pure insanity and completely irreproducible. "Here, I got a great layout" -- "it's not scoring well on my system" -- "IDK, works for me."

Also, regarding manual testing: as we optimize the layout, issues probably become less obvious and test subjects won't be able to point a finger at anything in particular.

Hence...

  2. I am contemplating the idea of implementing hand mechanics. This would allow calculating effort and all awkward positions automatically, without the risk of forgetting rules. (Though the bugs would be subtler.) I wrote general-form formulae on paper, and the benefit I see is that we need only 9 costs: 6 for muscles and 3 for adjacent-finger coordination. I speculate that this system should be less prone to hyper-optimization and less driven by opinions.
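The formulae themselves aren't included in the email. Purely as a hedged illustration of what a 9-parameter model of this kind could look like (the grouping and names below are guesses, not culebron's actual formulation):

from dataclasses import dataclass

@dataclass
class HandMechanicsCosts:
    muscle: tuple         # 6 per-muscle effort coefficients (hypothetical)
    coordination: tuple   # 3 adjacent-finger coordination coefficients (hypothetical)

def key_effort(muscle_displacements, costs):
    """Effort of one key press as a weighted sum of muscle displacements."""
    return sum(c * abs(x) for c, x in zip(costs.muscle, muscle_displacements))

def bigram_effort(disp_a, disp_b, adjacent_pairs_used, costs):
    """Effort of a bigram: two presses plus coordination penalties for the
    adjacent-finger pairs (indices 0..2) that have to move together."""
    base = key_effort(disp_a, costs) + key_effort(disp_b, costs)
    return base + sum(costs.coordination[i] for i in adjacent_pairs_used)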

@binarybottle (Owner, Author)

From @culebron:

Check this out. I prefer to stick with arbitrary rules, but test the layouts by hand, and also visualize them, like this.

Standard Russian layout. Key colors (viridis scale) = costs on particular keys. Arrow thickness = frequency, arrow color = price per press (price * freq = cost).
[image: visualization of the standard Russian layout]

Same for my last optimized layout:
[image: visualization of the optimized layout]

@binarybottle (Owner, Author)

Cool, @culebron. What are you using to generate the visuals? I like these!

I've been generating images like this:
[image: keyboard-bigram-typing-times]

@culebron commented Dec 18, 2024

@binarybottle I just sketched a visualization in QGIS, then reproduced it in Matplotlib/Pyplot code. It was tough.
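The plotting code isn't shared in the thread; a minimal Matplotlib sketch of the same idea (keys colored on the viridis scale by cost, arrows whose width tracks bigram frequency) might look like the following, with the key positions and numbers entirely made up:

import matplotlib.pyplot as plt
from matplotlib import cm, colors
from matplotlib.patches import FancyArrowPatch, Rectangle

# Made-up key coordinates, per-key costs, and bigram frequencies; a real
# script would load the layout geometry and scoring results instead.
keys = {'A': (0, 0), 'B': (1, 0), 'C': (2, 0)}
key_cost = {'A': 1.0, 'B': 2.5, 'C': 4.0}
bigram_freq = {('A', 'B'): 10.0, ('B', 'C'): 3.0}

norm = colors.Normalize(min(key_cost.values()), max(key_cost.values()))
fig, ax = plt.subplots(figsize=(4, 2))

# Keys as squares colored by cost on the viridis scale.
for k, (x, y) in keys.items():
    ax.add_patch(Rectangle((x, y), 0.9, 0.9, color=cm.viridis(norm(key_cost[k]))))
    ax.text(x + 0.45, y + 0.45, k, ha='center', va='center', color='white')

# Bigram arrows; line width encodes frequency.
for (a, b), f in bigram_freq.items():
    (xa, ya), (xb, yb) = keys[a], keys[b]
    ax.add_patch(FancyArrowPatch((xa + 0.45, ya + 0.45), (xb + 0.45, yb + 0.45),
                                 arrowstyle='-|>', mutation_scale=12,
                                 linewidth=f / 2, color='gray'))

ax.set_xlim(-0.5, 3.5)
ax.set_ylim(-0.5, 1.5)
ax.set_aspect('equal')
ax.axis('off')
plt.show()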

@culebron commented Dec 18, 2024

I actually improved upon this, making comparison images where the color and size scales are the same across several layouts. Here are Sholes' №1 (QWERTY), Sholes' №2 (his last layout), Dvorak, and Colemak.

Sholes №2 scores a bit worse than №1.
[ edit: moved the English layouts code to a separate notebook ]

[image: comparison of QWERTY, Sholes №2, Dvorak, and Colemak]
