
Engram-es Q & U positions #47

Open
binarybottle opened this issue Nov 26, 2024 · 9 comments

Comments

@binarybottle (Owner)

The following was an email sent to me on 26 Nov 2024:

Hi,

I've been trying to optimize a Russian layout, and while searching for scientific papers I found your project. (I don't know why Google Scholar doesn't show it near the top; everything I found was irrelevant.) It's great that someone did serious research, published a paper, and shared a notebook, thanks for your work! I'll try running the code on my experimental layouts to see how it ranks them.

But there's a problem. I took a close look at your Spanish layout and tried to imagine typing in Spanish (I looked at the letters and moved the appropriate fingers, imagining typing something like "y para entender lo que el ha propuesto..."). I found that Q & U are placed very inconveniently: words like "que", "fue", "fuera", "puede", "aquel" require quite awkward moves between the index finger and middle finger on the left hand.

My gut feeling told me they must sit further outwards, because E/A/O often come after them, as in the words I mentioned. So I calculated a metric that I used in my own research: how many incoming and outgoing bigrams a letter has on the same hand. If a bigram starts with the letter in question, it's "outgoing"; if it ends with it, it's "incoming". I counted the bigram frequencies for your Spanish layout's left-hand side, and it turns out I was right: the letter U wants to be well to the left of E/A/O, and Q wants to be even further left.
You can run this code in your Jupyter notebook after the code that defines the bigram frequencies:

import pandas as pd

# bigrams and bigram_frequencies are defined earlier in the notebook
d = pd.DataFrame({'bigram': bigrams, 'freq': bigram_frequencies})
d['l1'] = d.bigram.str[0]  # first letter of each bigram
d['l2'] = d.bigram.str[1]  # second letter of each bigram

# left-hand letters in the engram-es layout; keep only same-hand bigrams
left = ['A', 'E', 'I', 'O', 'U', 'Z', 'H', 'P', 'F', 'X', 'Q', 'Y']
d2 = d[d.l1.isin(left) & d.l2.isin(left)]

# per letter: total frequency as first letter (outgoing) vs second letter (incoming)
t2 = (d2.groupby('l1').agg({'freq': 'sum'})
        .join(d2.groupby('l2').agg({'freq': 'sum'}), lsuffix='_out', rsuffix='_in')
        .reset_index())
t2['delta'] = t2.freq_in - t2.freq_out
t2.sort_values('delta')

Output:

l1 freq_out freq_in delta
I 32120017 10431392 -21688625
P 24677708 5852291 -18825417
Q 12474918 1430018 -11044900
H 11195057 660160 -10534897
U 24643760 19815896 -4827864
F 7648289 2845112 -4803177
Z 3565688 3871776 306088
Y 2505993 2944683 438690
X 1207193 1762383 555190
O 3760162 24776548 21016386
A 7776678 31177035 23400357
E 8925375 34933544 26008169

In my research, I used such Pandas queries quite a lot and optimized consciously, rather than using optimization algorithms. Another metric I had was "how much is the key connected with the keys in the same row", to see whether some keys could be moved elsewhere or should stay where they are.
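Dmitri's code for that second metric isn't included in the email. A minimal sketch of how such a same-row connectivity share could be computed from the DataFrame d built above is shown below; the row assignment here is purely hypothetical and would need to match the layout actually being analyzed.

# Hypothetical row assignment for the left-hand letters; replace with the
# actual rows of the layout under study.
row_of = {'A': 'home', 'E': 'home', 'I': 'home', 'O': 'home',
          'U': 'top', 'Q': 'top', 'P': 'top', 'F': 'top',
          'Z': 'bottom', 'H': 'bottom', 'X': 'bottom', 'Y': 'bottom'}

# Reuse d (columns: bigram, freq, l1, l2) from the snippet above.
d3 = d[d.l1.isin(row_of) & d.l2.isin(row_of)].copy()
d3['same_row'] = d3.l1.map(row_of) == d3.l2.map(row_of)

# For each letter: what share of its same-hand bigram frequency stays in its own row?
same_row_share = (d3.groupby('l1')
                    .apply(lambda g: g.loc[g.same_row, 'freq'].sum() / g.freq.sum())
                    .sort_values())
print(same_row_share)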

I wonder why Q & U ended up where they are? Is it because the top-row rf & pinky positions are penalized?

Anyway, many thanks for posting your code. I see I was going in the right direction.
Best regards,

Dmitri

@binarybottle (Owner, Author)

Dmitri --

Thank you for reaching out with suggestions to improve engram-es! The approach relies on certain assumptions about ergonomics and on a representative corpus from which the bigrams are derived. You write at a particularly good time, as I am revisiting this project from scratch and developing a data-driven approach with crowdsourced information and a new software pipeline. I will keep your suggestion in mind as the project progresses.

@culebron commented Nov 27, 2024

Hi,
I'm happy to help. I improved my Jupyter notebook, annotating it like you did; hopefully it's understandable. The code that computes the bigram score is very simple; search for the get_bigram_cost function.

https://github.com/culebron/rus-layout-opt/blob/master/Russian%20Optimization.ipynb
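The notebook itself isn't reproduced in this thread. Purely as a hedged sketch of the general idea (not necessarily how the linked get_bigram_cost actually works), a bigram cost of this kind could be the per-key press effort plus a transition penalty, weighted by frequency:

# Illustrative sketch only; the real get_bigram_cost in the linked notebook
# may take different inputs and use different weights.
def get_bigram_cost(bigram, freq, key_cost, transition_cost):
    """Cost of one bigram: press costs for both keys plus a penalty for the
    movement between them, weighted by how often the bigram occurs."""
    a, b = bigram[0], bigram[1]
    per_occurrence = (key_cost.get(a, 0.0) + key_cost.get(b, 0.0)
                      + transition_cost.get((a, b), 0.0))
    return per_occurrence * freq

# Example usage with made-up costs:
# get_bigram_cost('qu', 12474918, {'q': 3.0, 'u': 1.5}, {('q', 'u'): 2.0})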

@culebron

I've updated my code at the link above for better readability and configurability. Hope this helps.

@binarybottle (Owner, Author)

Thank you, @culebron! I am taking a different, more data-driven approach this time, and will reach out for help if it comes back to scoring bigrams with an algorithm or weighting scheme...

@binarybottle (Owner, Author)

A follow-up email from @culebron on 27 Nov:

  1. Are we actually abusing the scoring systems?

When I made several initial layouts and finally sat down and tested one by typing real text (I added it to my Ergodox and later to the OS on my laptop), I discovered a very inconvenient monogram (index finger on the inner column, upper row). It turned out I had no penalty for that, and the optimization had abused this lacuna to get better scores.

I thought, maybe rules cast in stone aren't the way to go? Because they promote aggressive optimizations and abuse of the rules. Maybe we should iterate with manual testing: discover inconveniences, modify the rules, and re-evaluate previous improvements?

On the other hand, it may lead to insanely complicated rules. This project (https://github.com/dariogoetz/keyboard_layout_optimizer) has 14 KINDS of rules! The configuration is enterprise-scale! The whole project is 8500 SLOC!!! Pure insanity and completely irreproducible. "Here, I got a great layout" -- "it's not scoring well on my system" -- "IDK, works for me."

Also, regarding manual testing: as we optimize the layout, issues probably become less obvious and test subjects won't be able to point a finger at anything in particular.

Hence...

  2. I am contemplating the idea of implementing hand mechanics. This would allow calculating effort and all awkward positions automatically, without the risk of forgetting rules. (Though the bugs would be subtler.) I wrote general-form formulae on paper, and the benefit I see is that we need only 9 costs: 6 for muscles and 3 for adjacent-finger coordination. I speculate that this system should be less prone to hyper-optimization and less driven by opinions.
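The formulae themselves aren't included in the email. Purely as a hedged illustration of what a 9-parameter model of this kind could look like (the grouping and names below are guesses, not culebron's actual formulation):

from dataclasses import dataclass

@dataclass
class HandMechanicsCosts:
    muscle: tuple         # 6 per-muscle effort coefficients (hypothetical)
    coordination: tuple   # 3 adjacent-finger coordination coefficients (hypothetical)

def key_effort(muscle_displacements, costs):
    """Effort of one key press as a weighted sum of muscle displacements."""
    return sum(c * abs(x) for c, x in zip(costs.muscle, muscle_displacements))

def bigram_effort(disp_a, disp_b, adjacent_pairs_used, costs):
    """Effort of a bigram: two presses plus coordination penalties for the
    adjacent-finger pairs (indices 0..2) that have to move together."""
    base = key_effort(disp_a, costs) + key_effort(disp_b, costs)
    return base + sum(costs.coordination[i] for i in adjacent_pairs_used)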

@binarybottle (Owner, Author)

From @culebron:

Check this out. I prefer to stick with arbitrary rules, but test the layouts by hand, and also visualize them, like this.

Standard Russian layout. Key colors (viridis scale) = costs on particular keys. Arrow thickness = frequency, arrow color = price per press (price * freq = cost).
[image: visualization of the standard Russian layout]

Same for my last optimized layout:
[image: visualization of the optimized layout]

@binarybottle (Owner, Author)

Cool, @culebron. What are you using to generate the visuals? I like these!

I've been generating images like this:
[image: keyboard-bigram-typing-times]

@culebron commented Dec 18, 2024

@binarybottle I just sketched a visualization in QGIS, then reproduced it in Matplotlib/Pyplot code. It was tough.
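The plotting code isn't shared in the thread; a minimal Matplotlib sketch of the same idea (keys colored on the viridis scale by cost, arrows whose width tracks bigram frequency) might look like the following, with the key positions and numbers entirely made up:

import matplotlib.pyplot as plt
from matplotlib import cm, colors
from matplotlib.patches import FancyArrowPatch, Rectangle

# Made-up key coordinates, per-key costs, and bigram frequencies; a real
# script would load the layout geometry and scoring results instead.
keys = {'A': (0, 0), 'B': (1, 0), 'C': (2, 0)}
key_cost = {'A': 1.0, 'B': 2.5, 'C': 4.0}
bigram_freq = {('A', 'B'): 10.0, ('B', 'C'): 3.0}

norm = colors.Normalize(min(key_cost.values()), max(key_cost.values()))
fig, ax = plt.subplots(figsize=(4, 2))

# Keys as squares colored by cost on the viridis scale.
for k, (x, y) in keys.items():
    ax.add_patch(Rectangle((x, y), 0.9, 0.9, color=cm.viridis(norm(key_cost[k]))))
    ax.text(x + 0.45, y + 0.45, k, ha='center', va='center', color='white')

# Bigram arrows; line width encodes frequency.
for (a, b), f in bigram_freq.items():
    (xa, ya), (xb, yb) = keys[a], keys[b]
    ax.add_patch(FancyArrowPatch((xa + 0.45, ya + 0.45), (xb + 0.45, yb + 0.45),
                                 arrowstyle='-|>', mutation_scale=12,
                                 linewidth=f / 2, color='gray'))

ax.set_xlim(-0.5, 3.5)
ax.set_ylim(-0.5, 1.5)
ax.set_aspect('equal')
ax.axis('off')
plt.show()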

@culebron commented Dec 18, 2024

I actually improved upon this, making comparison images where the color and size scales are the same across several layouts. Here are Sholes' №1 (QWERTY), Sholes' №2 (his last layout), Dvorak, and Colemak.

Sholes №2 scores a bit worse than №1.
[ edit: moved the English layouts code to a separate notebook ]

[image: comparison of QWERTY, Sholes №2, Dvorak, and Colemak]
