<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-us">
<generator uri="https://gohugo.io/" version="0.59.0-DEV">Hugo</generator><title type="html"><![CDATA[Albert De La Fuente's site]]></title>
<subtitle type="html"><![CDATA[The tagline.]]></subtitle>
<link href="https://albertdelafuente.com/" rel="alternate" type="text/html" title="HTML" />
<link href="https://albertdelafuente.com/index.xml" rel="alternate" type="application/rss+xml" title="RSS" />
<link href="https://albertdelafuente.com/atom.xml" rel="self" type="application/atom+xml" title="Atom" />
<link href="https://albertdelafuente.com/jf2feed.json" rel="alternate" type="application/jf2feed+json" title="jf2feed" />
<updated>2025-03-09T21:28:40-03:00</updated>
<author>
<name>Albert</name>
<email>[email protected]</email>
</author>
<id>https://albertdelafuente.com/</id>
<entry>
<title type="html"><![CDATA[My thoughts on privacy and security]]></title>
<link href="https://albertdelafuente.com/posts/privacy/?utm_source=atom_feed" rel="alternate" type="text/html" />
<link href="https://albertdelafuente.com/books/20220109212603-book_notes_permanent_record_by_edward_snowden/?utm_source=atom_feed" rel="related" type="text/html" title="Book notes: Permanent record by Edward Snowden" />
<id>https://albertdelafuente.com/posts/privacy/</id>
<author>
<name>Albert De La Fuente Vigliotti</name>
</author>
<published>2023-12-10T09:46:33-03:00</published>
<updated>2023-12-10T09:46:33-03:00</updated>
<content type="html"><![CDATA[
<p>A good friend of mine motivated me to write this post. I am not sure how I
am going to address this in a “short” way, since security and privacy are such a
vast topic. It can also be taken to several levels, so it is a matter of how much
you compromise in favor of being usable and practical, since the powers in
place make it difficult on purpose to have open alternatives that work.</p>
<p>I am going to do my best to divide this into different topics and <strong>keep it short and
non-technical</strong>. It is going to be challenging.</p>
<h2 id="desktop">Desktop</h2>
<p>GNU/Linux is definitely the way to go. Personally I trust neither Microsoft nor
Apple. In general I am an advocate of open source.</p>
<p>I have been using GNU/Linux since around 2000, and I am extremely
comfortable with it. I understand that it can be a challenge in areas where we still
don’t have alternatives (or high quality tools) to some specific software like
CAD. But at the same time Linux has better tools in many other respects.</p>
<p>There are different “distros” of GNU/Linux. A distro (or distribution) is the
mixture of the tools from GNU, the Linux kernel and a way to manage packages.
This is a very simplistic definition, but I don’t want to get too technical
here.</p>
<p>Distributions for beginners:</p>
<ul>
<li><a href="https://ubuntu.com/">Ubuntu</a></li>
<li><a href="https://www.opensuse.org/">OpenSuSE</a></li>
<li><a href="https://fedoraproject.org/">Fedora</a></li>
<li><a href="https://linuxmint.com/">Linux Mint</a> (never tried it)</li>
<li><a href="https://manjaro.org/">Manjaro</a> (never tried it)</li>
</ul>
<p>Distributions for more mature users:</p>
<ul>
<li><a href="https://www.debian.org/">Debian</a></li>
<li><a href="https://archlinux.org/">Arch Linux</a></li>
<li><a href="https://www.gentoo.org/">Gentoo</a> (never tried it)</li>
</ul>
<h2 id="mobile">Mobile</h2>
<p>Between iPhone and Android, I will always choose Android. The reason is that
Android has a hybrid open/closed source development model. It is possible to get
the vanilla version of Android without Google Play (actually without anything from
Google) and then install an alternative to Google Play like F-Droid or
Aurora Store.</p>
<p>I don’t completely trust Aurora Store because it is a front end to Google Play
itself, and software could have backdoors, but I still think it could be a better
alternative than having a phone that you paid for but Google owns.</p>
<p>Just like GNU/Linux, there are several distros. These can come “with gapps” or
“without gapps”. Gapps stands for “Google Apps”, in other words Google Play, Gmail,
Google Maps, etc. Personally, I prefer to go without gapps.</p>
<p>There are several custom ROMs like:</p>
<ul>
<li>Lineage</li>
<li>Resurrection Remix</li>
<li><a href="https://calyxos.org/features/">CalyxOS</a></li>
<li><a href="https://grapheneos.org/features">GrapheneOS</a></li>
</ul>
<p>The most security-focused are Graphene and Calyx. Again, privacy and security are a
deep rabbit hole, as deep as you can afford to go.</p>
<p>Personally I have been using Lineage and some custom mods for years. On every
single Android device that I have owned, I removed the original Android
and replaced it with a custom ROM without Google Apps (nothing from Google). I
have been doing this since around 2006. I am very curious to try Calyx though.
If I were starting from scratch, I would go for Calyx.</p>
<p>I have a Mediatek chipset on my phone and I hated it! That was definitely a
bad purchase for this purpose. Don’t get me wrong, the phone is great, but it
is not easy to flash a custom ROM on it. I would go for a Qualcomm chipset instead and
make sure that the bootloader can be changed. <a href="https://www.devicespecifications.com/en/model-cpu/47075c08">This Ulefone Armor 23 looks
promising</a>. Apparently the 24 will also come with a Qualcomm chipset.</p>
<h2 id="network">Network</h2>
<p>When in doubt, use a VPN or TOR circuits. In this way your traffic remains more
secure. You can get a router that has that capability already built in, so you
can route everything through that device.</p>
<p>Always use a firewall and disable the services you don’t use. Use strong passwords.</p>
<p>If you are looking for an appliance that can do all of that, you can use
DD-WRT, pfSense or similar if you want a DIY approach and are techie enough,
or go for a commercial solution like <a href="https://www.youtube.com/watch?v=pdS1n_F11Dk">the BraxRouter</a> (I am not affiliated
in any way, and I haven’t tested it personally, so be cautious and do your own
research - DYOR).</p>
<h2 id="authentication">Authentication</h2>
<p>I prefer to avoid SMS authentication; I don’t trust the telephony system.
The carriers have too much power and knowledge about their users.</p>
<p>I also avoid the Google prompts, the ones that appear on your phone when you
are trying to log in on your computer. The reason is
simple: they tie your identity on the computer to your phone, because you
have to “click” on your phone. All traffic will be related.</p>
<p>I don’t use Google Authenticator or Authy, or any Android app for that matter,
unless it is open source. Personally I use one on my computer, which brings extra
security and some inconvenience. If I am not at my computer, I cannot log in
anywhere, even with my phone. I decided to live with that inconvenience.</p>
<p>Use 2FA (Two-Factor Authentication) whenever possible, but with OTP (One-Time
Password) for that. I prefer free TOTP software. The caveat to this approach is
that you will have to handle your secrets yourself in a secure way, so you don’t
lose them and nobody else has access to them. Please don’t even think about
putting them into Dropbox!</p>
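<p>For the technically curious, here is a minimal sketch of how a TOTP code is derived, assuming the common defaults from RFC 6238 (HMAC-SHA1, 30 second steps). It is not the tool I use. The function name <code>totp</code> and the variable names are my own for illustration, and the secret is the public test secret from the RFC.</p>

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, digits: int = 6, step: int = 30) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = struct.pack(">Q", unix_time // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test secret published in RFC 6238, Appendix B (never use it for a real account).
rfc_secret = b"12345678901234567890"
code_at_59s = totp(rfc_secret, 59, digits=8)
print(code_at_59s)
```

<p>The code depends only on the secret and the clock, which is why anyone who gets a copy of the secret can generate valid codes, and why losing it locks you out.</p>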
<p>I am a fairly technical guy so I prefer to keep the secrets under my control
with encrypted backups rather than “trusting” any other party.</p>
<p>I am not going to disclose publicly here what I use for security reasons. But if
you know me personally and have questions ask me one on one and I will share
more about this.</p>
<p>If you want more information, do your own research. I can point you to <a href="https://www.youtube.com/watch?v=ChKpf5HjcSY">this video</a>
as a quick overview. I think the title is a bit misleading though; I trust 2FA
with TOTP where the secrets are encrypted and properly backed up.</p>
<h2 id="im-communications">IM communications</h2>
<p>Signal or no signal… that is the question…</p>
<p>I have been a huge fan of Telegram. At the beginning, you did not need to use
a phone number; it was just an account you created (and you could create as many
as you wanted) and that was it. No verification and no ties to your identity.
Sadly, now they “require” a phone number, which is by design IMO. So I no longer
trust Telegram. The same argument goes for Signal.</p>
<p>I am aware of Signal’s history, even from before it was called Signal, and of its original
developers, but again… I don’t trust supplying my phone number to any app for
any reason. There is no need for that, none.</p>
<p>I have always hated WhatsApp: security is crappy, it belongs to Meta, it has a huge user
base… So many reasons that I won’t even go there…</p>
<p>My preferred method for IM is XMPP, which is federated in nature, so there are “no central
points” (again, loosely said). I would prefer to pick a server that is not
crowded and that is in a country outside of <a href="https://en.wikipedia.org/wiki/Five_Eyes">the 5 eyes</a> <sup class="footnote-ref" id="fnref:fn-1"><a href="#fn:fn-1">1</a></sup>.</p>
<p>In terms of privacy, OMEMO would be preferred, or asymmetric cryptography with
GPG/PGP, or OTR (Off The Record) at the very least.</p>
<h2 id="email-communications">Email communications</h2>
<p>This is a topic that I struggled with for some time.</p>
<p>Bottom line is I don’t trust large providers, because they are an easy target. Out</p>
<p>I wrote a note, which is not publicly available, analyzing the email providers as of
April 2022. Before that I was using a really old Google Apps account that I
created around 2005. It was very convenient, I am not going to lie, but I never
felt good about it, so it was good that Google decided to kick me out by charging
me. If I have to pay, I prefer to pay for something friendlier towards
privacy than them.</p>
<p>Here is a table with a summary of some of the criteria that I used to analyze
those providers.</p>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Domains</th>
<th>IMAP</th>
<th>Location</th>
<th>Encryption at rest (EAR)</th>
<th>Crypto pay</th>
<th>App</th>
<th>Alias</th>
<th>Storage</th>
<th>Price/month</th>
<th>2FA</th>
<th>Cons</th>
</tr>
</thead>
<tbody>
<tr>
<td>CounterMail</td>
<td>15$</td>
<td>yes</td>
<td><strong>Sweden</strong></td>
<td>yes</td>
<td>BTC</td>
<td>no</td>
<td>inf</td>
<td>4G</td>
<td>~5</td>
<td>yes</td>
<td><strong>Sweden</strong></td>
</tr>
<tr>
<td>Mailbox.org</td>
<td>50?</td>
<td>yes</td>
<td><strong>Germany</strong></td>
<td>yes</td>
<td>no?</td>
<td>no</td>
<td><sup>25</sup>⁄<sub>50</sub></td>
<td>10G</td>
<td>3</td>
<td>yes</td>
<td><strong>meta exposed, tracking</strong></td>
</tr>
<tr>
<td>Runbox - Mini</td>
<td>5</td>
<td>yes</td>
<td><span class="underline">Norway</span></td>
<td><strong>no</strong></td>
<td>BTC</td>
<td>no</td>
<td>100</td>
<td>10G</td>
<td>2.91</td>
<td></td>
<td><strong>no EAR</strong></td>
</tr>
<tr>
<td>Protonmail - Plus/Pro</td>
<td>10/<strong>1</strong></td>
<td>bridge</td>
<td><span class="underline">Switzerland</span></td>
<td>yes</td>
<td></td>
<td>yes</td>
<td>5</td>
<td>5G/5G</td>
<td><sup>8</sup>⁄<sub>5</sub></td>
<td></td>
<td><strong>visibility, suspicious</strong></td>
</tr>
<tr>
<td>Mailfence - Pro/Entry</td>
<td><sup>5</sup>⁄<sub>1</sub></td>
<td>yes</td>
<td><strong>Belgium</strong></td>
<td><strong>no</strong></td>
<td>BTC/ <span class="underline">LTC</span></td>
<td>yes?</td>
<td><sup>50</sup>⁄<sub>10</sub></td>
<td>20G/5G</td>
<td>7.5/2.5</td>
<td></td>
<td><strong>no EAR</strong></td>
</tr>
<tr>
<td>Posteo+</td>
<td>0</td>
<td>yes</td>
<td><strong>Germany</strong></td>
<td>yes</td>
<td>no</td>
<td>?</td>
<td>?</td>
<td></td>
<td></td>
<td></td>
<td><strong>no domains</strong></td>
</tr>
<tr>
<td>Ctemplar - Knight</td>
<td>5</td>
<td><strong>no</strong></td>
<td><span class="underline">Iceland</span></td>
<td>yes</td>
<td><span class="underline">XMR</span> /BTC</td>
<td>yes?</td>
<td>30</td>
<td>10G</td>
<td><strong>12</strong></td>
<td>yes</td>
<td><strong>no IMAP</strong></td>
</tr>
<tr>
<td>Tutanota</td>
<td></td>
<td><strong>no</strong></td>
<td><strong>Germany</strong></td>
<td>yes</td>
<td>no</td>
<td>yes</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td><strong>own encryption, no IMAP</strong></td>
</tr>
<tr>
<td>Fastmail Standard</td>
<td>100</td>
<td></td>
<td><strong>Australia</strong></td>
<td></td>
<td>no?</td>
<td></td>
<td></td>
<td>30G</td>
<td>5</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>Personally I liked Ctemplar, but I wanted IMAP since I use Emacs/mu4e for email
management and the possibility of keeping a local copy of my emails if needed.
I also use my phone to read emails when needed. So I had to discard them.
Depending on your use you may not need IMAP.</p>
<p>Honestly I am still not comfortable with my email provider, but I am better off than
I was with Google.</p>
<p>In regards to usage, GPG is always preferred, but of course it depends on both
parties using it. Be mindful that RSA 2048 may no longer be secure against
governments, even if regular people probably cannot break it. So I would go for 4096 bit keys.</p>
<p>I am not going to disclose publicly here what I use for security reasons. But if
you know me personally and have questions ask me one on one and I will share
more about this. I know that if you are a technical person you can find this
yourself.</p>
<h2 id="conferences">Conferences</h2>
<p>I don’t trust Zoom. Do your own research here. One of the founders previously
worked at Cisco Systems. Again… do your own research.</p>
<p>I would go for Jitsi or something else that is open source. Of course a case could be made
that there is always a server as a middleman, and I agree. At this
point I don’t know of any better alternatives, so Jitsi is the way I would go, or
Signal’s video chat feature, even though I don’t trust Signal much either.</p>
<h2 id="browsing">Browsing</h2>
<p>I don’t use Google Chrome, much less Microsoft Edge. I prefer the old Firefox.</p>
<p>As part of my setup I keep neither cookies nor history when I close the browser. I
always prefer private mode when possible and I use a bunch of extensions to make
my unique fingerprint not so obvious.</p>
<p>I avoid using Google whenever possible. I don’t trust DuckDuckGo much either.
Search engines are the gatekeepers of the internet; sadly there is no integrity
in this business. I have read good things about Qwant.</p>
<p>I would also advise disabling JavaScript by default by using NoScript or
similar, as well as using other extensions to make you harder to track, even
though I don’t think there is much to do here. Try to always use VPNs or TOR
circuits. Be aware that having a high level of privacy/security could make your
browsing experience miserable!</p>
<h2 id="money">Money</h2>
<p>Well, this is going to be controversial. We don’t use money, we use fiat
currency which is controlled by the governments and manipulated via inflation
and taxes as they want. It is hard to get out of that. Cash is always preferred
but it is not convenient, so balancing things out is up to the reader.</p>
<p>Precious metals could be an alternative, but probably everything is going to get
digitized in the future, even precious metals, probably with certificates on the
blockchain as Colombia is already doing for real estate. So I don’t have good
news here.</p>
<p>Probably having some money in a private cryptocurrency like ZCash, Monero,
Verge or others could be a good idea. Be mindful of the fluctuations, the
risks and the tax compliance involved.</p>
<p>Also be mindful that you should aim for a peer to peer market and that could be
dangerous also, so you will have to “trust” the network somehow. If you use an
exchange it defeats the purpose of privacy.</p>
<p>If you go for a private crypto, it is highly likely that you can run a local
node. You can use a small device for that, like a Raspberry Pi or a refurbished
mini-desktop or notebook. You will have to set up the service and get a full copy
of the blockchain.</p>
<p>Honestly I would hedge for land, food security and water rather than save money.
But that is just my opinion.</p>
<h2 id="emergency-communications">Emergency communications</h2>
<p>This is going to be controversial also. I am still learning about radio but I
would definitely have a ham radio and a rapport team, otherwise the radio
alone is useless. Community is extremely important.</p>
<p>I prefer not to touch much on this subject. Do your own research. Maybe the
Ghost network could help you. Check out <a href="https://www.youtube.com/watch?v=1oaWRs2te68">this video on the Lilygo T-Deck device
with the Meshtastic software for Encrypted Comms</a>. And this video on <a href="https://www.youtube.com/watch?v=EAQI2ZSmxPU">Meshtastic
and LoRa devices</a> for general knowledge. I am fairly new to this, so do your
own research. Not having to use a phone would be preferred IMO, so it is not
tied to the IMEI number, MAC address, IP address and so forth.</p>
<h2 id="conclusion-and-closing-remarks">Conclusion and closing remarks</h2>
<p>Remember, favor VPNs or TOR circuits, handle your secrets yourself in a secure
and reliable way.</p>
<p>Don’t trust services where you need to supply personal data. Prefer services
that offer alternative payment methods in crypto also.</p>
<p>As I said in the beginning, privacy and security is a rabbit hole that can go
very deep, even for the technical guy. So I am trying to just give an overview.</p>
<p>If there is a category that I forgot about that you would like to see here, send
me a message.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:fn-1"><p>For more on these “eyes” I suggest reading “Permanent Record” by
Edward Snowden and/or “No Place to Hide” by Glenn Greenwald.</p>
<a class="footnote-return" href="#fnref:fn-1"><sup>[return]</sup></a></li>
</ol>
</div>
]]></content>
<category scheme="https://albertdelafuente.com/categories/privacy" term="privacy" label="privacy" />
<category scheme="https://albertdelafuente.com/tags/privacy" term="privacy" label="privacy" />
<category scheme="https://albertdelafuente.com/tags/security" term="security" label="security" />
</entry>
<entry>
<title type="html"><![CDATA[Analyzing the Biblia Hebraica Stuttgartensia with Text Fabric in Python]]></title>
<link href="https://albertdelafuente.com/posts/bhs/?utm_source=atom_feed" rel="alternate" type="text/html" />
<link href="https://albertdelafuente.com/posts/lhlg/?utm_source=atom_feed" rel="related" type="text/html" title="Learning hebrew like a geek" />
<link href="https://albertdelafuente.com/exobrain/20211206220218-hebrew_root_words_parent_roots_dictionary/?utm_source=atom_feed" rel="related" type="text/html" title="Hebrew root words - parent roots dictionary" />
<link href="https://albertdelafuente.com/posts/20220109211826-configuring_pacsrv_and_powerpill_on_arch_linux/?utm_source=atom_feed" rel="related" type="text/html" title="Configuring pacsrv and powerpill on Arch Linux" />
<link href="https://albertdelafuente.com/exobrain/20220128220728-curso_de_hebreo_de_emc_shalom_colombia/?utm_source=atom_feed" rel="related" type="text/html" title="Curso de hebreo de EMC Shalom Colombia" />
<link href="https://albertdelafuente.com/exobrain/20220129174154-how_to_configure_the_keyboard_layout_for_hebrew/?utm_source=atom_feed" rel="related" type="text/html" title="How to configure the keyboard layout for Hebrew (biblicalSIL, phonetic, etc)" />
<id>https://albertdelafuente.com/posts/bhs/</id>
<author>
<name>Albert De La Fuente</name>
</author>
<published>2023-08-26T23:33:15-03:00</published>
<updated>2023-08-26T23:33:15-03:00</updated>
<content type="html"><![CDATA[
<p>I am very curious about Hebrew, especially the way the language is hierarchically
designed. Put very simply, three-letter words are commonly known as triliteral
roots, and the rest of the words are variations of these three letters, sharing a
common root word and a related meaning, thus creating a “family” if
you will. I am not going to get into these details since they are outside the scope of
this post.</p>
<p>Long story short, my journey of trying to learn Hebrew started sometime between
2006 and 2008, and I haven’t had much success at it. Or let’s say that my
expectations don’t match reality due to a lack of consistent effort on my side.</p>
<p>I thought that I could try to start with the most common words in the
scriptures, and that is what motivated me <a href="/exobrain/20220601221125-how_to_parse_the_aleppo_codex_and_analyze_its_content_in_python/">to parse the Aleppo codex and analyze its
content in python</a>. I quickly learned that my approach was rather naive, given
the prefixes that cause variations in the words even though the meaning of
the word itself is the same.</p>
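<p>To illustrate why counting raw word forms falls short, here is a minimal sketch. The transliterated tokens and the hand-written <code>LEXEME_OF</code> mapping are made up for illustration; they are not taken from the Aleppo codex or from any dataset.</p>

```python
from collections import Counter

# Hypothetical transliterated tokens: the same word with different prefixes attached.
tokens = ["melek", "ha-melek", "le-melek", "u-melek", "dabar", "ha-dabar"]

# Hypothetical hand-made annotation mapping each surface form to its base word.
LEXEME_OF = {
    "melek": "melek",
    "ha-melek": "melek",
    "le-melek": "melek",
    "u-melek": "melek",
    "dabar": "dabar",
    "ha-dabar": "dabar",
}

# Naive approach: every distinct spelling is counted as a separate word.
surface_counts = Counter(tokens)

# Annotated approach: occurrences are grouped under the base word.
lexeme_counts = Counter(LEXEME_OF[token] for token in tokens)

print("distinct surface forms:", len(surface_counts))
print("counts per base word:", dict(lexeme_counts))
```

<p>The naive count sees six different “words” that each appear once, while the annotated count sees two words. A frequency list is only useful once every occurrence is tied to its base word, which is the kind of annotation a linguistic database provides.</p>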
<p>This blog post is a second attempt to tackle the same problem but with a more
sophisticated and accurate approach using Text Fabric.</p>
<p>A corpus of ancient texts and (linguistic) annotations represents a large body
of knowledge.</p>
<p>Text-Fabric is a Python package for processing and accessing a corpus of ancient
text and linguistic annotations. In this specific case I am using the Hebrew
Bible Database, containing the text of the Hebrew Bible augmented with
linguistic annotations compiled by the Eep Talstra Centre for Bible and
Computer from the VU University Amsterdam.</p>
<p>The text is based on the Biblia Hebraica Stuttgartensia edited by Karl Elliger
and Wilhelm Rudolph, Fifth Revised Edition, edited by Adrian Schenker, © 1977
and 1997 Deutsche Bibelgesellschaft, Stuttgart.</p>
<p>The text-fabric version has been prepared by Dirk Roorda (Data Archiving and
Networked Services), with thanks to Martijn Naaijer, Cody Kingham, and
Constantijn Sikkel.</p>
<p>It is amazing to see the work these researchers did compiling all this data and
making it public for free. I am very thankful for it.</p>
<p>I am using a literate programming approach with Doom Emacs and org-mode. This
blog post is a bit more technically oriented, so it is okay if you skip over
the code or if you don’t understand some of it.</p>
<h2 id="create-the-virtual-environment">Create the virtual environment</h2>
<p>This snippet will create a virtual environment and install some libs. Due to
some incompatibilities with the word cloud lib, I had to downgrade to Python
3.6.</p>
<div class="highlight"><pre class="chroma"><code class="language-shell" data-lang="shell"><span class="c1">#virtualenv ~/.workon-home/venv-textfabric</span>
virtualenv --python<span class="o">=</span>/usr/bin/python3.6 ~/.workon-home/venv-textfabric
<span class="nb">cd</span> ~/.workon-home/venv-textfabric
<span class="nb">source</span> ./bin/activate.fish
pip install text-fabric pandas requests</code></pre></div>
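<p>Once the environment is active, a quick sanity check can confirm that the packages are visible to the interpreter. This is only a sketch; the helper name <code>missing_modules</code> is my own, and it uses the standard library only.</p>

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `tf` is the import name used by text-fabric (as in `from tf.app import use`).
required = ["tf", "pandas", "requests"]
print("Python", sys.version.split()[0])
print("Missing modules:", missing_modules(required) or "none")
```

<p>If anything is listed as missing, the most likely cause is that the shell or Emacs session is not using the virtual environment created above.</p>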
<h2 id="activate-the-virtual-environment">Activate the virtual environment</h2>
<p>Use <code>C-c</code> here on Doom Emacs</p>
<div class="highlight"><pre class="chroma"><code class="language-emacs-lisp" data-lang="emacs-lisp"><span class="p">(</span><span class="nv">pyvenv-activate</span> <span class="s">"~/.workon-home/venv-textfabric"</span><span class="p">)</span></code></pre></div>
<h2 id="load-the-etcbc-bhsa-dataset">Load the ETCBC/bhsa dataset</h2>
<p>Use <code>run-python</code> and <code>ober-eval-block-in-repl</code> to evaluate each block (<code>C-c r</code>
at the time being). For the sake of a more practical approach to writing using
literate programming the output of the blocks will follow the code. The
documentation of the BHSA dataset can be found <a href="https://etcbc.github.io/bhsa/">here</a>.</p>
<div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">tf.app</span> <span class="kn">import</span> <span class="n">use</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">collections</span>
<span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">chain</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">use</span><span class="p">(</span><span class="s2">"ETCBC/bhsa"</span><span class="p">,</span> <span class="n">hoist</span><span class="o">=</span><span class="nb">globals</span><span class="p">())</span>
<span class="n">A</span><span class="o">.</span><span class="n">indent</span><span class="p">(</span><span class="n">reset</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">A</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"counting objects ..."</span><span class="p">)</span>
<span class="k">for</span> <span class="n">otype</span> <span class="ow">in</span> <span class="n">F</span><span class="o">.</span><span class="n">otype</span><span class="o">.</span><span class="n">all</span><span class="p">:</span>
    <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">A</span><span class="o">.</span><span class="n">indent</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">reset</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">F</span><span class="o">.</span><span class="n">otype</span><span class="o">.</span><span class="n">s</span><span class="p">(</span><span class="n">otype</span><span class="p">):</span>
        <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">A</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"{:>7} {}s"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">otype</span><span class="p">))</span>
<span class="n">A</span><span class="o">.</span><span class="n">indent</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">A</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"Done"</span><span class="p">)</span></code></pre></div><div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text"><IPython.core.display.HTML object>
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
This is Text-Fabric 9.5.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
122 features found and 0 ignored
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
0.00s counting objects ...
| 0.00s 39 books
| 0.00s 929 chapters
| 0.00s 9230 lexs
| 0.00s 23213 verses
| 0.00s 45179 half_verses
| 0.00s 63717 sentences
| 0.00s 64514 sentence_atoms
| 0.00s 88131 clauses
| 0.00s 90704 clause_atoms
| 0.01s 253203 phrases
| 0.01s 267532 phrase_atoms
| 0.01s 113850 subphrases
| 0.02s 426590 words
0.07s Done</code></pre></div>
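<p>The counts above can be turned into a few rough averages to get a feel for the size of the corpus. This is a minimal sketch that copies some of the numbers from the output by hand; the helper <code>ratio</code> is my own and the results are plain divisions, not linguistic claims.</p>

```python
# Counts copied by hand from the output above.
counts = {
    "book": 39,
    "chapter": 929,
    "verse": 23213,
    "sentence": 63717,
    "clause": 88131,
    "phrase": 253203,
    "word": 426590,
}


def ratio(inner, outer):
    """Average number of `inner` objects per `outer` object."""
    return counts[inner] / counts[outer]


pairs = [("chapter", "book"), ("verse", "chapter"), ("word", "verse"), ("word", "clause")]
for inner, outer in pairs:
    print(f"{ratio(inner, outer):7.2f} {inner}s per {outer}")
```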
<h2 id="analyzing-the-type-of-word-structures-available">Analyzing the type of word structures available</h2>
<p>Text-Fabric allows us to query the “types” of words (their part of speech, stored in the <code>sp</code> feature). We can do this by using:</p>
<div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="k">print</span><span class="p">(</span><span class="n">F</span><span class="o">.</span><span class="n">sp</span><span class="o">.</span><span class="n">freqList</span><span class="p">())</span></code></pre></div><div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">(('subs', 125583), ('verb', 75451), ('prep', 73298), ('conj', 62737), ('nmpr', 35607), ('art', 30387), ('adjv', 10141), ('nega', 6059), ('prps', 5035), ('advb', 4603), ('prde', 2678), ('intj', 1912), ('inrg', 1303), ('prin', 1026))</code></pre></div>
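<p>To make that list easier to read, here is a small sketch that converts the counts into percentages. The tuple is copied by hand from the output above; the English labels in <code>LABELS</code> are my reading of the BHSA documentation for <code>sp</code>, so double check them there.</p>

```python
# Frequency list copied by hand from the output above.
freq_list = (
    ("subs", 125583), ("verb", 75451), ("prep", 73298), ("conj", 62737),
    ("nmpr", 35607), ("art", 30387), ("adjv", 10141), ("nega", 6059),
    ("prps", 5035), ("advb", 4603), ("prde", 2678), ("intj", 1912),
    ("inrg", 1303), ("prin", 1026),
)

# Assumed English labels for the abbreviations (check the BHSA feature docs).
LABELS = {
    "subs": "noun", "verb": "verb", "prep": "preposition", "conj": "conjunction",
    "nmpr": "proper noun", "art": "article", "adjv": "adjective",
    "nega": "negative particle", "prps": "personal pronoun", "advb": "adverb",
    "prde": "demonstrative pronoun", "intj": "interjection",
    "inrg": "interrogative particle", "prin": "interrogative pronoun",
}

total = sum(count for _, count in freq_list)
shares = {tag: 100 * count / total for tag, count in freq_list}

for tag, share in shares.items():
    print(f"{tag:5} {share:5.1f}%  {LABELS[tag]}")
print("total:", total)
```

<p>Note that the total is 435820, which equals the 426590 words plus the 9230 lexs counted earlier. That suggests <code>sp</code> is stored on lexeme nodes as well as on word nodes, so these shares are not pure per-word percentages.</p>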
<h2 id="defining-some-help-functions">Defining some helper functions</h2>
<p>These functions make the data easier to process. The first one collects every
lexeme of a given type (part of speech) together with its frequency. The second
one turns the first rows of a DataFrame into a list of rows that Org mode can
display as a table. I am not a linguist, so I am learning as I go and I will try
to keep the explanations simple. The most common verb lexeme is
<code>אמר</code> (say), which appears 5307 times. We can see that Yah indeed wants to
communicate and instruct us.</p>
<p>A specific occurrence of that verb has other properties (node features), such
as its morphology: the verbal stem (qal, piel, nif, hif), the verbal tense
(perf, impf, wayq) and the gender (m, f), among other information. A lexeme
represents the word (a verb in this case) in the broad sense, i.e. regardless
of the verbal stem, tense, gender, etc.</p>
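<p>The difference between a lexeme and its occurrences can be illustrated with a toy example. The occurrences below are made up for illustration and are not taken from the corpus; the <code>Occurrence</code> class and its field names are assumptions, not Text-Fabric objects. Counting occurrences per lexeme gives the kind of number that <code>freq_lex</code> holds.</p>

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One word in the text: a lexeme plus occurrence-specific features."""
    lexeme: str  # shared by every occurrence of the word
    stem: str    # e.g. qal, piel, nif, hif
    tense: str   # e.g. perf, impf, wayq
    gender: str  # m or f


# Hypothetical occurrences, for illustration only.
occurrences = [
    Occurrence("אמר", "qal", "wayq", "m"),
    Occurrence("אמר", "qal", "perf", "f"),
    Occurrence("אמר", "nif", "impf", "m"),
    Occurrence("דבר", "piel", "wayq", "m"),
]

# The lexeme frequency ignores stem, tense and gender.
freq_lex = Counter(o.lexeme for o in occurrences)
print(freq_lex.most_common())
```

<p>The three occurrences of the first verb differ in stem, tense or gender, yet they all count towards the same lexeme.</p>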
<div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>

<span class="k">def</span> <span class="nf">get_lexeme_by_type</span><span class="p">(</span><span class="n">lex_type</span><span class="p">):</span>
    <span class="c1"># Collect every lexeme node whose part of speech (sp) equals lex_type.</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">A</span><span class="o">.</span><span class="n">indent</span><span class="p">(</span><span class="n">reset</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">F</span><span class="o">.</span><span class="n">otype</span><span class="o">.</span><span class="n">s</span><span class="p">(</span><span class="s2">"lex"</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">F</span><span class="o">.</span><span class="n">sp</span><span class="o">.</span><span class="n">v</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="o">!=</span> <span class="n">lex_type</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">row</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'w'</span><span class="p">:</span> <span class="n">w</span><span class="p">,</span>
               <span class="s1">'freq_lex'</span><span class="p">:</span> <span class="n">F</span><span class="o">.</span><span class="n">freq_lex</span><span class="o">.</span><span class="n">v</span><span class="p">(</span><span class="n">w</span><span class="p">),</span>
               <span class="c1">#'sp': F.sp.v(w),</span>
               <span class="s1">'lex_utf8'</span><span class="p">:</span> <span class="n">F</span><span class="o">.</span><span class="n">lex_utf8</span><span class="o">.</span><span class="n">v</span><span class="p">(</span><span class="n">w</span><span class="p">),</span>
               <span class="s1">'gloss'</span><span class="p">:</span> <span class="n">F</span><span class="o">.</span><span class="n">gloss</span><span class="o">.</span><span class="n">v</span><span class="p">(</span><span class="n">w</span><span class="p">),</span>
               <span class="c1">#'phono': F.phono.v(w),</span>
               <span class="c1">#'g_word_utf8': F.g_word_utf8.v(w),</span>
               <span class="c1">#'g_lex_utf8': F.g_lex_utf8.v(w),</span>
               <span class="c1">#'g_cons_utf8': F.g_cons_utf8.v(w),</span>
               <span class="c1">#'gn': F.gn.v(w),</span>
               <span class="c1">#'nu': F.nu.v(w),</span>
               <span class="c1">#'ps': F.ps.v(w),</span>
               <span class="c1">#'st': F.st.v(w),</span>
               <span class="c1">#'vs': F.vs.v(w),</span>
               <span class="c1">#'vt': F.vt.v(w),</span>
               <span class="c1">#'book': F.book.v(w),</span>
        <span class="p">}</span>
        <span class="n">rows</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">df</span>

<span class="k">def</span> <span class="nf">export_df_to_org_table</span><span class="p">(</span><span class="n">input_df</span><span class="p">,</span> <span class="n">rows_qty</span><span class="p">):</span>
    <span class="c1"># Clean the Hebrew column with the remove_diacritics helper.</span>
    <span class="n">input_df</span><span class="p">[</span><span class="s2">"lex_utf8"</span><span class="p">]</span> <span class="o">=</span> <span class="n">input_df</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">remove_diacritics</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="s2">"lex_utf8"</span><span class="p">]),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">output_df</span> <span class="o">=</span> <span class="n">input_df</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="n">rows_qty</span><span class="p">)</span>
    <span class="c1"># Header row, a None separator, then the data rows.</span>
    <span class="k">return</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="n">output_df</span><span class="p">)]</span> <span class="o">+</span> <span class="p">[</span><span class="bp">None</span><span class="p">]</span> <span class="o">+</span> <span class="n">output_df</span><span class="o">.</span><span class="n">values</span><span class="o">.</span><span class="n">tolist</span><span class="p">()</span></code></pre></div>
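<p>The <code>export_df_to_org_table</code> function relies on a <code>remove_diacritics</code> helper that is not shown in this section. As a minimal sketch, assuming its job is to strip the Hebrew vowel points and accents so that only the consonants remain, it could be written with the standard <code>unicodedata</code> module. This is an assumption about what the helper does, not necessarily the implementation used for the tables in this post.</p>

```python
import unicodedata


def remove_diacritics(text):
    """Hypothetical helper: drop combining marks such as vowel points."""
    # NFD splits precomposed characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers Hebrew points and accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# alef + qamats, mem + patah, resh: a pointed spelling used as sample input.
pointed = "\u05d0\u05b8\u05de\u05b7\u05e8"
plain = remove_diacritics(pointed)
print(plain)

# Same list shape that export_df_to_org_table returns:
# header row, a None separator, then the data rows.
org_table = [["lex_utf8", "gloss"]] + [None] + [[plain, "say"]]
print(org_table)
```

<p>The <code>None</code> entry between the header and the rows is what lets Org mode draw a horizontal rule under the header when the list is returned from a source block.</p>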
<h2 id="most-frequent-prepositions">Most frequent prepositions</h2>
<div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="n">df_preps</span> <span class="o">=</span> <span class="n">get_lexeme_by_type</span><span class="p">(</span><span class="s1">'prep'</span><span class="p">)</span>
<span class="n">df_cloud</span> <span class="o">=</span> <span class="n">df_preps</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="p">[</span><span class="s1">'freq_lex'</span><span class="p">],</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">print_df</span> <span class="o">=</span> <span class="n">export_df_to_org_table</span><span class="p">(</span><span class="n">df_cloud</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span></code></pre></div>
<table>
<thead>
<tr>
<th>w</th>
<th>freq_lex</th>
<th>lex_utf8</th>
<th>gloss</th>
</tr>
</thead>
<tbody>
<tr>
<td>1437629</td>
<td>20069</td>
<td>ל</td>
<td>to</td>
</tr>
<tr>
<td>1437602</td>
<td>15542</td>
<td>ב</td>
<td>in</td>
</tr>
<tr>
<td>1437606</td>
<td>10987</td>
<td>את</td>
<td>&lt;object marker&gt;</td>
</tr>
<tr>
<td>1437639</td>
<td>7562</td>
<td>מן</td>
<td>from</td>
</tr>
<tr>
<td>1437615</td>
<td>5766</td>
<td>על</td>
<td>upon</td>
</tr>
<tr>
<td>1437644</td>
<td>5517</td>
<td>אל</td>
<td>to</td>
</tr>
<tr>
<td>1437693</td>
<td>2902</td>
<td>כ</td>
<td>as</td>
</tr>
<tr>
<td>1437843</td>
<td>1263</td>
<td>עד</td>
<td>unto</td>
</tr>
<tr>
<td>1437805</td>
<td>1049</td>
<td>עם</td>
<td>with</td>
</tr>
<tr>
<td>1437865</td>
<td>878</td>
<td>את</td>
<td>together with</td>
</tr>
<tr>
<td>1443534</td>
<td>378</td>
<td>ל</td>
<td>to</td>
</tr>
<tr>
<td>1443536</td>
<td>345</td>
<td>די</td>
<td>&lt;relative&gt;</td>
</tr>
<tr>
<td>1438236</td>
<td>272</td>
<td>למען</td>
<td>because of</td>
</tr>
<tr>
<td>1445292</td>
<td>226</td>
<td>ב</td>
<td>in</td>
</tr>
<tr>
<td>1438491</td>
<td>142</td>
<td>כמו</td>
<td>like</td>
</tr>
<tr>
<td>1443543</td>
<td>119</td>
<td>מן</td>
<td>from</td>
</tr>
<tr>
<td>1445269</td>
<td>104</td>
<td>על</td>
<td>upon</td>
</tr>
<tr>
<td>1443531</td>
<td>63</td>
<td>כ</td>
<td>like</td>
</tr>
<tr>
<td>1445265</td>
<td>35</td>
<td>עד</td>
<td>until</td>
</tr>
<tr>
<td>1445281</td>
<td>22</td>
<td>עם</td>
<td>with</td>
</tr>
<tr>
<td>1438337</td>
<td>17</td>
<td>בלעדי</td>
<td>without</td>
</tr>
<tr>
<td>1443082</td>
<td>9</td>
<td>במו</td>
<td>in</td>
</tr>
<tr>
<td>1444752</td>
<td>4</td>
<td>למו</td>
<td>to</td>
</tr>
<tr>
<td>1445487</td>
<td>1</td>
<td>ית</td>
<td>&lt;nota accusativi&gt;</td>
</tr>
<tr>
<td>1445919</td>
<td>1</td>
<td>לות</td>
<td>with</td>
</tr>
</tbody>
</table>
<p>Fair enough, this is a good starting point to learn some words.</p>
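<p>The table can also be used as a sanity check. The sketch below copies the node, frequency and lexeme columns from the table above into plain Python (the variable names are illustrative). The frequencies add up to 73273, and adding one for each of the 25 lexeme nodes gives 73298, the number reported for <code>prep</code> in the frequency list earlier. It also lists the consonantal forms that appear under more than one node. These are separate lexemes that happen to share a spelling; they may be homographs or Hebrew and Aramaic entries, but that is an assumption that would need to be checked against the corpus.</p>

```python
from collections import defaultdict

# (node, freq_lex, lex_utf8) copied from the table above.
preps = [
    (1437629, 20069, "ל"), (1437602, 15542, "ב"), (1437606, 10987, "את"),
    (1437639, 7562, "מן"), (1437615, 5766, "על"), (1437644, 5517, "אל"),
    (1437693, 2902, "כ"), (1437843, 1263, "עד"), (1437805, 1049, "עם"),
    (1437865, 878, "את"), (1443534, 378, "ל"), (1443536, 345, "די"),
    (1438236, 272, "למען"), (1445292, 226, "ב"), (1438491, 142, "כמו"),
    (1443543, 119, "מן"), (1445269, 104, "על"), (1443531, 63, "כ"),
    (1445265, 35, "עד"), (1445281, 22, "עם"), (1438337, 17, "בלעדי"),
    (1443082, 9, "במו"), (1444752, 4, "למו"), (1445487, 1, "ית"),
    (1445919, 1, "לות"),
]

# Occurrences of all prepositions in the text.
total_occurrences = sum(freq for _, freq, _ in preps)
print(total_occurrences)               # 73273
# Adding one per lexeme node matches the 'prep' count of the frequency list.
print(total_occurrences + len(preps))  # 73298

# Group the nodes by spelling to find forms listed more than once.
by_form = defaultdict(list)
for node, freq, form in preps:
    by_form[form].append((node, freq))

repeated = {form: nodes for form, nodes in by_form.items() if len(nodes) > 1}
for form, nodes in repeated.items():
    print(form, nodes)
```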
<h2 id="most-frequent-names--people-and-places">Most frequent names (people and places)</h2>
<p>Now let's see the 30 most cited names in the scriptures.</p>
<div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="n">df_nmpr</span> <span class="o">=</span> <span class="n">get_lexeme_by_type</span><span class="p">(</span><span class="s1">'nmpr'</span><span class="p">)</span>
<span class="n">df_cloud</span> <span class="o">=</span> <span class="n">df_nmpr</span><span class="o">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="p">[</span><span class="s1">'freq_lex'</span><span class="p">],</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">print_df</span> <span class="o">=</span> <span class="n">export_df_to_org_table</span><span class="p">(</span><span class="n">df_cloud</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span></code></pre></div>
<table>
<thead>
<tr>
<th>w</th>
<th>freq_lex</th>
<th>lex_utf8</th>
<th>gloss</th>
</tr>
</thead>
<tbody>
<tr>
<td>1437714</td>
<td>6828</td>
<td>יהוה</td>
<td>YHWH</td>
</tr>
<tr>
<td>1438941</td>
<td>2506</td>
<td>ישראל</td>
<td>Israel</td>
</tr>
<tr>
<td>1441856</td>
<td>1075</td>
<td>דוד</td>
<td>David</td>
</tr>
<tr>
<td>1438822</td>
<td>819</td>
<td>יהודה</td>
<td>Judah</td>
</tr>
<tr>
<td>1439439</td>
<td>766</td>
<td>משה</td>
<td>Moses</td>
</tr>
<tr>
<td>1438103</td>
<td>681</td>
<td>מצרים</td>
<td>Egypt</td>
</tr>
<tr>
<td>1441150</td>
<td>643</td>
<td>ירושלם</td>
<td>Jerusalem</td>
</tr>
<tr>
<td>1438343</td>
<td>438</td>
<td>אדני</td>
<td>Lord</td>
</tr>
<tr>
<td>1439060</td>
<td>406</td>
<td>שאול</td>
<td>Saul</td>
</tr>
<tr>
<td>1438702</td>
<td>349</td>
<td>יעקב</td>
<td>Jacob</td>
</tr>
<tr>
<td>1439473</td>
<td>347</td>
<td>אהרן</td>
<td>Aaron</td>
</tr>
<tr>
<td>1442014</td>
<td>293</td>
<td>שלמה</td>
<td>Solomon</td>
</tr>
<tr>
<td>1438115</td>
<td>262</td>
<td>בבל</td>
<td>Babel</td>
</tr>
<tr>
<td>1439714</td>
<td>218</td>
<td>יהושע</td>
<td>Joshua</td>
</tr>
<tr>