Don't use uplow rule with two dot patterns
- If both dot patterns are the same, keep the uplow rule but give the
  dot pattern only once.
- When a cap sign is defined, use an uplow rule with only the lowercase
  dot pattern and define a comp6 rule for the uppercase pattern.
- When no cap sign is defined, replace the uplow rule with a
  lowercase rule and an uppercase rule.
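
The three cases in the commit message can be sketched as a small rewrite function. This is only an illustration, not the tooling used for this commit: the name `split_uplow` and the `cap_sign_defined` flag are hypothetical, the exact form of the emitted `comp6` line is an assumption (the contents of latinUppercaseComp6.uti are not shown here), and runs of whitespace in comments are collapsed to single spaces.

```python
import re

# One table character: a \xHHHH escape or a single literal character.
_CHAR = re.compile(r"\\x[0-9A-Fa-f]{4}|.")
# Optional direction prefixes that may precede the opcode.
_PREFIXES = {"nofor", "noback", "nocross"}


def split_uplow(line, cap_sign_defined):
    """Rewrite one uplow rule that has two dot patterns; return the new rule lines."""
    tokens = line.split()
    prefix = []
    while tokens and tokens[0] in _PREFIXES:
        prefix.append(tokens.pop(0))
    if len(tokens) < 3 or tokens[0] != "uplow" or "," not in tokens[2]:
        return [line]  # not a two-pattern uplow rule: leave it untouched
    chars, dots, comment = tokens[1], tokens[2], tokens[3:]
    pair = _CHAR.findall(chars)
    patterns = dots.split(",")
    if len(pair) != 2 or len(patterns) != 2:
        return [line]
    upper, lower = pair
    # In a two-pattern uplow rule the uppercase pattern comes first.
    upper_dots, lower_dots = patterns

    def rule(*fields):
        return " ".join([*prefix, *fields, *comment])

    if upper_dots == lower_dots:
        # Case 1: identical patterns, so state the pattern once.
        return [rule("uplow", chars, lower_dots)]
    if cap_sign_defined:
        # Case 2 (assumed output form): uplow keeps the lowercase pattern,
        # and a comp6 rule carries the uppercase pattern.
        return [rule("uplow", chars, lower_dots),
                " ".join([*prefix, "comp6", upper, upper_dots])]
    # Case 3: no cap sign, so emit separate uppercase and lowercase rules.
    return [rule("uppercase", upper, upper_dots),
            rule("lowercase", lower, lower_dots)]
```

Called with `cap_sign_defined=False` on `uplow \x00d3\x00f3 3467,346 o con acento`, the sketch yields the `uppercase`/`lowercase` pair that appears in the Es-Es-G0.utb hunk of this commit.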
bertfrees committed Dec 6, 2021
1 parent bb0ef81 commit 68d4f72
Showing 42 changed files with 1,862 additions and 899 deletions.
14 changes: 4 additions & 10 deletions NEWS
@@ -58,22 +58,16 @@ issues]].
 ** New, renamed or removed tables
 *** New
 - ru-brf.dis
-
-*** Renamed
-None
-
-*** Removed
-- ru-ru.dis
-
-** New, renamed or removed tables
-*** New
 - ja-kantenji.utb
+- latinUppercaseComp6.uti

 *** Renamed
 None

 *** Removed
-None
+- ru-ru.dis
+- cs-letterDef8Dots.uti
+- ru-chardefs.cti

 * Noteworthy changes in release 3.19.0 (2021-09-06)
 For this release Bert Frees has been hard at work to clean up the code
90 changes: 59 additions & 31 deletions tables/Es-Es-G0.utb
@@ -46,40 +46,68 @@ space \x00A0 0 # Espacio de no separación
# all except 0 are the same, so define 0 here to take higher precedence
# Also define ó (lowercase o acute) not to clash with the definition of 0 in original include.
digit 0 34678 cero
uplow \x00d3\x00f3 3467,346 o con acento
include digits6DotsPlusDot6.uti
uppercase \x00d3 3467 o con acento
lowercase \x00f3 346 o con acento

include digits6DotsPlusDot6.uti
include latinLetterDef8Dots.uti

uplow \x00c7\x00e7 1234678,123468 c cedilla
uplow \x00c1\x00e1 123567,12356 a con acento
uplow \x00c9\x00e9 23467,2346 e con acento
uplow \x00cd\x00ed 347,34 i con acento
uplow \x00da\x00fa 234567,23456 u con acento
uplow \x00c0\x00e0 1235678,123568 a grave
uplow \x00c8\x00e8 234678,23468 e grave
uplow \x00cc\x00ec 345,348 i grave
uplow \x00d2\x00f2 2458,3468 o grave
uplow \x00d9\x00f9 2345678,234568 u grave
uplow \x00c2\x00e2 178,18 a con circunflejo
uplow \x00ca\x00ea 1578,158 e con circunflejo
uplow \x00ce\x00ee 2478,248 i con circunflejo
uplow \x00d4\x00f4 13578,1358 o con circunflejo
uplow \x00db\x00fb 13678,1368 u con circunflejo
uplow \x00c4\x00e4 34578,3458 a con diéresis
uplow \x00cb\x00eb 124678,12468 e con diéresis
uplow \x00cf\x00ef 1245678,258 i con diéresis
uplow \x00d6\x00f6 24678,2468 o con diéresis
uplow \x00dc\x00fc 125678,12568 u con diéresis
uplow \x00dd\x00fd 1567,2348 ye con acento agudo
uplow \x009F\x00FF 367,67 ye con diéresis

uplow \x00c6\x00e6 38,1348 ae
uplow \x0152\x0153 1468,1238 oe
uplow \x008C\x009C 1468,1238 oe
uplow \x00C3\x00E3 3567,168 a con tilde
uplow \x00D5\x00F5 12458,4567 o con tilde
uplow \x00D1\x00F1 124567,124568 letra eñe
lowercase \x00e7 123468 c cedilla
lowercase \x00e1 12356 a con acento
lowercase \x00e9 2346 e con acento
lowercase \x00ed 34 i con acento
lowercase \x00fa 23456 u con acento
lowercase \x00e0 123568 a grave
lowercase \x00e8 23468 e grave
lowercase \x00ec 348 i grave
lowercase \x00f2 3468 o grave
lowercase \x00f9 234568 u grave
lowercase \x00e2 18 a con circunflejo
lowercase \x00ea 158 e con circunflejo
lowercase \x00ee 248 i con circunflejo
lowercase \x00f4 1358 o con circunflejo
lowercase \x00fb 1368 u con circunflejo
lowercase \x00e4 3458 a con diéresis
lowercase \x00eb 12468 e con diéresis
lowercase \x00ef 258 i con diéresis
lowercase \x00f6 2468 o con diéresis
lowercase \x00fc 12568 u con diéresis
lowercase \x00fd 2348 ye con acento agudo
lowercase \x00FF 67 ye con diéresis
lowercase \x00e6 1348 ae
lowercase \x0153 1238 oe
lowercase \x009C 1238 oe
lowercase \x00E3 168 a con tilde
lowercase \x00F5 4567 o con tilde
lowercase \x00F1 124568 letra eñe
uppercase \x00c7 1234678 c cedilla
uppercase \x00c1 123567 a con acento
uppercase \x00c9 23467 e con acento
uppercase \x00cd 347 i con acento
uppercase \x00da 234567 u con acento
uppercase \x00c0 1235678 a grave
uppercase \x00c8 234678 e grave
uppercase \x00cc 345 i grave
uppercase \x00d2 2458 o grave
uppercase \x00d9 2345678 u grave
uppercase \x00c2 178 a con circunflejo
uppercase \x00ca 1578 e con circunflejo
uppercase \x00ce 2478 i con circunflejo
uppercase \x00d4 13578 o con circunflejo
uppercase \x00db 13678 u con circunflejo
uppercase \x00c4 34578 a con diéresis
uppercase \x00cb 124678 e con diéresis
uppercase \x00cf 1245678 i con diéresis
uppercase \x00d6 24678 o con diéresis
uppercase \x00dc 125678 u con diéresis
uppercase \x00dd 1567 ye con acento agudo
uppercase \x009F 367 ye con diéresis
uppercase \x00c6 38 ae
uppercase \x0152 1468 oe
uppercase \x008C 1468 oe
uppercase \x00C3 3567 a con tilde
uppercase \x00D5 12458 o con tilde
uppercase \x00D1 124567 letra eñe

punctuation , 2 coma
punctuation ; 23 punto y coma
3 changes: 1 addition & 2 deletions tables/Makefile.am
@@ -47,7 +47,6 @@ table_files = \
 cs-chardefs.cti \
 cs-comp8.utb \
 cs-g1.ctb \
-cs-letterDef8Dots.uti \
 cs.tbl \
 cs-translation.cti \
 cy-cy-g1.utb \
@@ -234,6 +233,7 @@ table_files = \
 ks-in-g1.utb \
 latinLetterDef6Dots.uti \
 latinLetterDef8Dots.uti \
+latinUppercaseComp6.uti \
 litdigits6DotsPlusDot6.uti \
 litdigits6Dots.uti \
 loweredDigits6Dots.uti \
@@ -312,7 +312,6 @@ table_files = \
 ro.ctb \
 ro.tbl \
 ru-brf.dis \
-ru-chardefs.cti \
 ru-compbrl.ctb \
 ru.ctb \
 ru-letters.dis \
78 changes: 52 additions & 26 deletions tables/ar-ar-comp8.utb
@@ -163,32 +163,58 @@ digit \x0668 1258 # 8 (٨)
digit \x0669 248 # 9 (٩)

# English letters backward translation only
nofor uplow Aa 17,178
nofor uplow Bb 127,1278
nofor uplow Cc 147,1478
nofor uplow Dd 1457,14578
nofor uplow Ee 157,1578
nofor uplow Ff 1247,12478
nofor uplow Gg 12457,124578
nofor uplow Hh 1257,12578
nofor uplow Ii 247,2478
nofor uplow Jj 2457,24578
nofor uplow Kk 137,1378
nofor uplow Ll 1237,12378
nofor uplow Mm 1347,13478
nofor uplow Nn 13457,134578
nofor uplow Oo 1357,13578
nofor uplow Pp 12347,123478
nofor uplow Qq 123457,1234578
nofor uplow Rr 12357,123578
nofor uplow Ss 2347,23478
nofor uplow Tt 23457,234578
nofor uplow Uu 1367,13678
nofor uplow Vv 12367,123678
nofor uplow Ww 24567,245678
nofor uplow Xx 13467,134678
nofor uplow Yy 134567,1345678
nofor uplow Zz 13567,135678
nofor lowercase a 178
nofor lowercase b 1278
nofor lowercase c 1478
nofor lowercase d 14578
nofor lowercase e 1578
nofor lowercase f 12478
nofor lowercase g 124578
nofor lowercase h 12578
nofor lowercase i 2478
nofor lowercase j 24578
nofor lowercase k 1378
nofor lowercase l 12378
nofor lowercase m 13478
nofor lowercase n 134578
nofor lowercase o 13578
nofor lowercase p 123478
nofor lowercase q 1234578
nofor lowercase r 123578
nofor lowercase s 23478
nofor lowercase t 234578
nofor lowercase u 13678
nofor lowercase v 123678
nofor lowercase w 245678
nofor lowercase x 134678
nofor lowercase y 1345678
nofor lowercase z 135678
nofor uppercase A 17
nofor uppercase B 127
nofor uppercase C 147
nofor uppercase D 1457
nofor uppercase E 157
nofor uppercase F 1247
nofor uppercase G 12457
nofor uppercase H 1257
nofor uppercase I 247
nofor uppercase J 2457
nofor uppercase K 137
nofor uppercase L 1237
nofor uppercase M 1347
nofor uppercase N 13457
nofor uppercase O 1357
nofor uppercase P 12347
nofor uppercase Q 123457
nofor uppercase R 12357
nofor uppercase S 2347
nofor uppercase T 23457
nofor uppercase U 1367
nofor uppercase V 12367
nofor uppercase W 24567
nofor uppercase X 13467
nofor uppercase Y 134567
nofor uppercase Z 13567

#punctuation symbols
punctuation ، 57 # Arabic comma (\x060C)
27 changes: 18 additions & 9 deletions tables/ba.utb
@@ -37,15 +37,24 @@ include ru-litbrl.ctb
# alphabet, namely Ә, Ө, Ҡ, Ғ, Ҫ, Ҙ, Һ, Ү and Ң. Like in ru-chardefs.cti, the
# following definitions have dot 9 set to make them distinguishable
# from the Latin letters.
uplow \x04D8\x04D9 34579,3459 CYRILLIC LETTER æ Әә
uplow \x04E8\x04E9 12679,1269 CYRILLIC LETTER ø Өө
uplow \x04A0\x04A1 14679,1469 CYRILLIC LETTER q Ҡҡ
uplow \x0492\x0493 1245679,124569 CYRILLIC LETTER ɣ Ғғ
uplow \x04AA\x04AB 3479,349 CYRILLIC LETTER θ Ҫҫ
uplow \x0498\x0499 34679,3469 CYRILLIC LETTER ð Ҙҙ
uplow \x04BA\x04BB 123679,12369 CYRILLIC LETTER h Һһ
uplow \x04AE\x04AF 1345679,134569 CYRILLIC LETTER y Үү
uplow \x04A2\x04A3 145679,14569 CYRILLIC LETTER ŋ Ңң
uppercase \x04D8 34579 CYRILLIC LETTER æ Ә
uppercase \x04E8 12679 CYRILLIC LETTER ø Ө
uppercase \x04A0 14679 CYRILLIC LETTER q Ҡ
uppercase \x0492 1245679 CYRILLIC LETTER ɣ Ғ
uppercase \x04AA 3479 CYRILLIC LETTER θ Ҫ
uppercase \x0498 34679 CYRILLIC LETTER ð Ҙ
uppercase \x04BA 123679 CYRILLIC LETTER h Һ
uppercase \x04AE 1345679 CYRILLIC LETTER y Ү
uppercase \x04A2 145679 CYRILLIC LETTER ŋ Ң
lowercase \x04D9 3459 CYRILLIC LETTER æ ә
lowercase \x04E9 1269 CYRILLIC LETTER ø ө
lowercase \x04A1 1469 CYRILLIC LETTER q ҡ
lowercase \x0493 124569 CYRILLIC LETTER ɣ ғ
lowercase \x04AB 349 CYRILLIC LETTER θ ҫ
lowercase \x0499 3469 CYRILLIC LETTER ð ҙ
lowercase \x04BB 12369 CYRILLIC LETTER h һ
lowercase \x04AF 134569 CYRILLIC LETTER y ү
lowercase \x04A3 14569 CYRILLIC LETTER ŋ ң

# Extend classes defined in ru-litbrl.ctb
attribute uppercyrillic \x04D8\x04E8\x04A0\x0492\x04AA\x0498\x04BA\x04AE\x04A2
78 changes: 52 additions & 26 deletions tables/bg.ctb
@@ -158,32 +158,58 @@ uppercase \x046a 2467 CYRILLIC CAPITAL LETTER BIG YUS
lowercase \x046b 246 CYRILLIC SMALL LETTER BIG YUS

# Latin letters
uplow Aa 178,18
uplow Bb 1278,128
uplow Cc 1478,148
uplow Dd 14578,1458
uplow Ee 1578,158
uplow Ff 12478,1248
uplow Gg 124578,12458
uplow Hh 12578,1258
uplow Ii 2478,248
uplow Jj 24578,2458
uplow Kk 1378,138
uplow Ll 12378,1238
uplow Mm 13478,1348
uplow Nn 134578,13458
uplow Oo 13578,1358
uplow Pp 123478,12348
uplow Qq 1234578,123458
uplow Rr 123578,12358
uplow Ss 23478,2348
uplow Tt 234578,23458
uplow Uu 13678,1368
uplow Vv 123678,12368
uplow Ww 245678,24568
uplow Xx 134678,13468
uplow Yy 1345678,134568
uplow Zz 135678,13568
uppercase A 178
uppercase B 1278
uppercase C 1478
uppercase D 14578
uppercase E 1578
uppercase F 12478
uppercase G 124578
uppercase H 12578
uppercase I 2478
uppercase J 24578
uppercase K 1378
uppercase L 12378
uppercase M 13478
uppercase N 134578
uppercase O 13578
uppercase P 123478
uppercase Q 1234578
uppercase R 123578
uppercase S 23478
uppercase T 234578
uppercase U 13678
uppercase V 123678
uppercase W 245678
uppercase X 134678
uppercase Y 1345678
uppercase Z 135678
lowercase a 18
lowercase b 128
lowercase c 148
lowercase d 1458
lowercase e 158
lowercase f 1248
lowercase g 12458
lowercase h 1258
lowercase i 248
lowercase j 2458
lowercase k 138
lowercase l 1238
lowercase m 1348
lowercase n 13458
lowercase o 1358
lowercase p 12348
lowercase q 123458
lowercase r 12358
lowercase s 2348
lowercase t 23458
lowercase u 1368
lowercase v 12368
lowercase w 24568
lowercase x 13468
lowercase y 134568
lowercase z 13568

# Miscellaneous
noback sign \x25CF 35 BLACK CIRCLE
22 changes: 11 additions & 11 deletions tables/ca-chardefs.cti
@@ -42,16 +42,16 @@ include spaces.uti

include latinLetterDef6Dots.uti

uplow \x00C0\x00E0 12356,12356 Àà LATIN CAPITAL LETTER A WITH GRAVE - LATIN SMALL LETTER A WITH GRAVE
uplow \x00C7\x00E7 12346,12346 Çç LATIN CAPITAL LETTER C WITH CEDILLA - LATIN SMALL LETTER C WITH CEDILLA
uplow \x00C8\x00E8 2346,2346 Èè LATIN CAPITAL LETTER E WITH GRAVE - LATIN SMALL LETTER E WITH GRAVE
uplow \x00C9\x00E9 123456,123456 Éé LATIN CAPITAL LETTER E WITH ACUTE - LATIN SMALL LETTER E WITH ACUTE
uplow \x00CD\x00ED 34,34 Íí LATIN CAPITAL LETTER I WITH ACUTE - LATIN SMALL LETTER I WITH ACUTE
uplow \x00CF\x00EF 12456,12456 Ïï LATIN CAPITAL LETTER I WITH DIAERESIS - LATIN SMALL LETTER I WITH DIAERESIS
uplow \x00D2\x00F2 346,346 Òò LATIN CAPITAL LETTER O WITH GRAVE - LATIN SMALL LETTER O WITH GRAVE
uplow \x00D3\x00F3 246,246 Óó LATIN CAPITAL LETTER O WITH ACUTE - LATIN SMALL LETTER O WITH ACUTE
uplow \x00DA\x00FA 23456,23456 Úú LATIN CAPITAL LETTER U WITH ACUTE - LATIN SMALL LETTER U WITH ACUTE
uplow \x00DC\x00FC 1256,1256 Üü LATIN CAPITAL LETTER U WITH DIAERESIS - LATIN SMALL LETTER U WITH DIAERESIS
uplow \x00C0\x00E0 12356 Àà LATIN CAPITAL LETTER A WITH GRAVE - LATIN SMALL LETTER A WITH GRAVE
uplow \x00C7\x00E7 12346 Çç LATIN CAPITAL LETTER C WITH CEDILLA - LATIN SMALL LETTER C WITH CEDILLA
uplow \x00C8\x00E8 2346 Èè LATIN CAPITAL LETTER E WITH GRAVE - LATIN SMALL LETTER E WITH GRAVE
uplow \x00C9\x00E9 123456 Éé LATIN CAPITAL LETTER E WITH ACUTE - LATIN SMALL LETTER E WITH ACUTE
uplow \x00CD\x00ED 34 Íí LATIN CAPITAL LETTER I WITH ACUTE - LATIN SMALL LETTER I WITH ACUTE
uplow \x00CF\x00EF 12456 Ïï LATIN CAPITAL LETTER I WITH DIAERESIS - LATIN SMALL LETTER I WITH DIAERESIS
uplow \x00D2\x00F2 346 Òò LATIN CAPITAL LETTER O WITH GRAVE - LATIN SMALL LETTER O WITH GRAVE
uplow \x00D3\x00F3 246 Óó LATIN CAPITAL LETTER O WITH ACUTE - LATIN SMALL LETTER O WITH ACUTE
uplow \x00DA\x00FA 23456 Úú LATIN CAPITAL LETTER U WITH ACUTE - LATIN SMALL LETTER U WITH ACUTE
uplow \x00DC\x00FC 1256 Üü LATIN CAPITAL LETTER U WITH DIAERESIS - LATIN SMALL LETTER U WITH DIAERESIS

punctuation \x0021 256 ! EXCLAMATION MARK
punctuation \x0022 236 " QUOTATION MARK
@@ -160,7 +160,7 @@ math \x00BE 1456-25-145 ¾ VULGAR FRACTI
# Unicode 0100..017F Latin Extended-A
# ----------------------------------------------------------------------------------------------

uplow \x013F\x0140 123-5,123-5 Ŀŀ LATIN CAPITAL LETTER L WITH MIDDLE DOT - LATIN SMALL LETTER L WITH MIDDLE DOT
uplow \x013F\x0140 123-5 Ŀŀ LATIN CAPITAL LETTER L WITH MIDDLE DOT - LATIN SMALL LETTER L WITH MIDDLE DOT


# ----------------------------------------------------------------------------------------------