
Commit 97fb29e

Improve the support of non-ASCII characters
1 parent c70da3e commit 97fb29e

6 files changed (+257, -90 lines)


doc/index.md

+1

````diff
@@ -42,6 +42,7 @@ external_resources.md
 :caption: Reference documentation
 
 api/index.rst
+techref/index.md
 changes.md
 minversions.md
 ```
````

doc/techref/encodings.md

+108

@@ -0,0 +1,108 @@

# Supported Encodings and Non-ASCII Characters

GMT supports a number of encodings, and each encoding contains a set of ASCII and
non-ASCII characters. The tables below list three of the most common encodings and the
characters they support.

In PyGMT, you can use any of these ASCII and non-ASCII characters in arguments and text
strings. When using non-ASCII characters in PyGMT, the easiest way is to copy and paste
the character from the tables below.
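
Pasted characters can be hard to tell apart from ASCII lookalikes: the Symbol table lists − (MINUS SIGN) at `\055`, not the ASCII hyphen-minus. A quick check, sketched below with Python's standard `unicodedata` module, prints the code point and name of each character; writing a character as a `\u` escape in the string is equivalent to pasting it.

```python
import unicodedata

# Characters copied from the tables, each paired with the equivalent \u escape.
pasted = {"°": "\u00b0", "−": "\u2212", "α": "\u03b1"}

for char, escape in pasted.items():
    # A pasted character and its \u escape are the same one-character string.
    assert char == escape
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")

# The ASCII hyphen-minus is a different character from the minus sign above.
print(unicodedata.name("-"))  # HYPHEN-MINUS
```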

**Note**: The special character � (REPLACEMENT CHARACTER, U+FFFD) marks code points that
are not defined in the encoding.

## Adobe ISOLatin1+ Encoding

| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| **\03x** | � | • | … | ™ | — | – | fi | ž |
| **\04x** |   | ! | " | # | $ | % | & | ’ |
| **\05x** | ( | ) | * | + | , | - | . | / |
| **\06x** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| **\07x** | 8 | 9 | : | &#x003b; | < | = | > | ? |
| **\10x** | @ | A | B | C | D | E | F | G |
| **\11x** | H | I | J | K | L | M | N | O |
| **\12x** | P | Q | R | S | T | U | V | W |
| **\13x** | X | Y | Z | [ | \ | ] | ^ | _ |
| **\14x** | ‘ | a | b | c | d | e | f | g |
| **\15x** | h | i | j | k | l | m | n | o |
| **\16x** | p | q | r | s | t | u | v | w |
| **\17x** | x | y | z | { | \| | } | ~ | š |
| **\20x** | Œ | † | ‡ | Ł | ⁄ | ‹ | Š | › |
| **\21x** | œ | Ÿ | Ž | ł | ‰ | „ | “ | ” |
| **\22x** | ı | ` | ´ | ^ | ˜ | ¯ | ˘ | ˙ |
| **\23x** | ¨ | ‚ | ˚ | ¸ | ' | ˝ | ˛ | ˇ |
| **\24x** | � | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § |
| **\25x** | ¨ | © | ª | « | ¬ | ­ | ® | ¯ |
| **\26x** | ° | ± | ² | ³ | ´ | µ | ¶ | · |
| **\27x** | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| **\30x** | À | Á | Â | Ã | Ä | Å | Æ | Ç |
| **\31x** | È | É | Ê | Ë | Ì | Í | Î | Ï |
| **\32x** | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × |
| **\33x** | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| **\34x** | à | á | â | ã | ä | å | æ | ç |
| **\35x** | è | é | ê | ë | ì | í | î | ï |
| **\36x** | ð | ñ | ò | ó | ô | õ | ö | ÷ |
| **\37x** | ø | ù | ú | û | ü | ý | þ | ÿ |
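
To read a code from the tables, take the row label (the leading octal digits) followed by the column digit; numerically, the code is the row value times 8 plus the column. In the `\240` to `\377` range, ISOLatin1+ agrees with ISO-8859-1, so there the code equals the character's Unicode code point. A small sketch; `octal_cell` and `octal_escape` are hypothetical helpers, not PyGMT functions:

```python
def octal_cell(code: int) -> tuple[str, int]:
    """Return the (row label, column) of an 8-bit code in the tables above."""
    row, col = divmod(code, 8)
    # The row label is the code without its last octal digit, e.g. \35x.
    return f"\\{row:02o}x", col


def octal_escape(code: int) -> str:
    """Return the three-digit octal escape of a code, e.g. \\351."""
    return f"\\{code:03o}"


# In the \240-\377 range ISOLatin1+ matches ISO-8859-1 (Latin-1), so the
# code of a character there is simply its Unicode code point.
for char in "éü°":
    code = ord(char)
    row_label, col = octal_cell(code)
    print(f"{char}: {octal_escape(code)} (row {row_label}, column {col})")

# é: \351 (row \35x, column 1)
# ü: \374 (row \37x, column 4)
# °: \260 (row \26x, column 0)
```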

## Adobe Symbol Encoding

| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| **\04x** |   | ! | ∀ | # | ∃ | % | & | ∋ |
| **\05x** | ( | ) | ∗ | + | , | − | . | / |
| **\06x** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| **\07x** | 8 | 9 | : | &#x003b; | < | = | > | ? |
| **\10x** | ≅ | Α | Β | Χ | ∆ | Ε | Φ | Γ |
| **\11x** | Η | Ι | ϑ | Κ | Λ | Μ | Ν | Ο |
| **\12x** | Π | Θ | Ρ | Σ | Τ | Υ | ς | Ω |
| **\13x** | Ξ | Ψ | Ζ | [ | ∴ | ] | ⊥ | _ |
| **\14x** |  | α | β | χ | δ | ε | φ | γ |
| **\15x** | η | ι | ϕ | κ | λ | μ | ν | ο |
| **\16x** | π | θ | ρ | σ | τ | υ | ϖ | ω |
| **\17x** | ξ | ψ | ζ | { | \| | } | ∼ | � |
| **\24x** | € | ϒ | ′ | ≤ | ∕ | ∞ | ƒ | ♣ |
| **\25x** | ♦ | ♥ | ♠ | ↔ | ← | ↑ | → | ↓ |
| **\26x** | ° | ± | ″ | ≥ | × | ∝ | ∂ | • |
| **\27x** | ÷ | ≠ | ≡ | ≈ | … | ⏐ | ⎯ | ↵ |
| **\30x** | ℵ | ℑ | ℜ | ℘ | ⊗ | ⊕ | ∅ | ∩ |
| **\31x** | ∪ | ⊃ | ⊇ | ⊄ | ⊂ | ⊆ | ∈ | ∉ |
| **\32x** | ∠ | ∇ | ® | © | ™ | ∏ | √ | ⋅ |
| **\33x** | ¬ | ∧ | ∨ | ⇔ | ⇐ | ⇑ | ⇒ | ⇓ |
| **\34x** | ◊ | 〈 | ® | © | ™ | ∑ | ⎛ | ⎜ |
| **\35x** | ⎝ | ⎡ | ⎢ | ⎣ | ⎧ | ⎨ | ⎩ | ⎪ |
| **\36x** | � | 〉 | ∫ | ⌠ | ⎮ | ⌡ | ⎞ | ⎟ |
| **\37x** | ⎠ | ⎤ | ⎥ | ⎦ | ⎫ | ⎬ | ⎭ | � |

**Note**: The octal code `\140` represents the RADICAL EXTENDER character, which has no
standard Unicode code point; PyGMT maps it to the Private Use Area code point U+F8E5.
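
In the Symbol encoding, the Greek letters occupy the code positions of the Latin letters, so `\141` is `a` in ASCII and α in Symbol. The sketch below copies the lowercase run from the `\14x` to `\17x` rows (written as the same `\u` escapes used in `pygmt/encodings.py`) and inverts it to look up the octal escape of a Greek letter; `SYMBOL_GREEK`, `GREEK_TO_CODE` and `symbol_escape` are hypothetical names.

```python
# Lowercase Greek letters from the \14x-\17x rows of the Symbol table.
# \140 (RADICAL EXTENDER) is skipped, so the run covers \141 to \172.
GREEK = (
    "\u03b1\u03b2\u03c7\u03b4\u03b5\u03c6\u03b3"
    "\u03b7\u03b9\u03d5\u03ba\u03bb\u03bc\u03bd\u03bf"
    "\u03c0\u03b8\u03c1\u03c3\u03c4\u03c5\u03d6\u03c9"
    "\u03be\u03c8\u03b6"
)
CODES = range(0o141, 0o173)
assert len(GREEK) == len(CODES)  # 26 letters, one per code

SYMBOL_GREEK = dict(zip(CODES, GREEK))
# Invert the table: character -> code.
GREEK_TO_CODE = {char: code for code, char in SYMBOL_GREEK.items()}


def symbol_escape(char: str) -> str:
    """Return the octal escape of a lowercase Greek letter in Symbol."""
    return f"\\{GREEK_TO_CODE[char]:03o}"


for char in "\u03b1\u03c0\u03c9":  # alpha, pi, omega
    code = GREEK_TO_CODE[char]
    # Each Greek letter shares its code with an ASCII letter.
    print(char, symbol_escape(char), "ASCII:", chr(code))

# α \141 ASCII: a
# π \160 ASCII: p
# ω \167 ASCII: w
```

Inverting the complete Symbol table would need a rule for duplicates: ®, © and ™ each appear twice, at `\322` to `\324` and again at `\342` to `\344`.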

## Adobe ZapfDingbats Encoding

| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| **\04x** |   | ✁ | ✂ | ✃ | ✄ | ☎ | ✆ | ✇ |
| **\05x** | ✈ | ✉ | ☛ | ☞ | ✌ | ✍ | ✎ | ✏ |
| **\06x** | ✐ | ✑ | ✒ | ✓ | ✔ | ✕ | ✖ | ✗ |
| **\07x** | ✘ | ✙ | ✚ | ✛ | ✜ | ✝ | ✞ | ✟ |
| **\10x** | ✠ | ✡ | ✢ | ✣ | ✤ | ✥ | ✦ | ✧ |
| **\11x** | ★ | ✩ | ✪ | ✫ | ✬ | ✭ | ✮ | ✯ |
| **\12x** | ✰ | ✱ | ✲ | ✳ | ✴ | ✵ | ✶ | ✷ |
| **\13x** | ✸ | ✹ | ✺ | ✻ | ✼ | ✽ | ✾ | ✿ |
| **\14x** | ❀ | ❁ | ❂ | ❃ | ❄ | ❅ | ❆ | ❇ |
| **\15x** | ❈ | ❉ | ❊ | ❋ | ● | ❍ | ■ | ❏ |
| **\16x** | ❐ | ❑ | ❒ | ▲ | ▼ | ◆ | ❖ | ◗ |
| **\17x** | ❘ | ❙ | ❚ | ❛ | ❜ | ❝ | ❞ | � |
| **\20x** | ❨ | ❩ | ❪ | ❫ | ❬ | ❭ | ❮ | ❯ |
| **\21x** | ❰ | ❱ | ❲ | ❳ | ❴ | ❵ | � | � |
| **\24x** | � | ❡ | ❢ | ❣ | ❤ | ❥ | ❦ | ❧ |
| **\25x** | ♣ | ♦ | ♥ | ♠ | ① | ② | ③ | ④ |
| **\26x** | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ | ⑩ | ❶ | ❷ |
| **\27x** | ❸ | ❹ | ❺ | ❻ | ❼ | ❽ | ❾ | ❿ |
| **\30x** | ➀ | ➁ | ➂ | ➃ | ➄ | ➅ | ➆ | ➇ |
| **\31x** | ➈ | ➉ | ➊ | ➋ | ➌ | ➍ | ➎ | ➏ |
| **\32x** | ➐ | ➑ | ➒ | ➓ | ➔ | → | ↔ | ↕ |
| **\33x** | ➘ | ➙ | ➚ | ➛ | ➜ | ➝ | ➞ | ➟ |
| **\34x** | ➠ | ➡ | ➢ | ➣ | ➤ | ➥ | ➦ | ➧ |
| **\35x** | ➨ | ➩ | ➪ | ➫ | ➬ | ➭ | ➮ | ➯ |
| **\36x** | � | ➱ | ➲ | ➳ | ➴ | ➵ | ➶ | ➷ |
| **\37x** | ➸ | ➹ | ➺ | ➻ | ➼ | ➽ | ➾ | � |
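
Because undefined positions hold U+FFFD, filtering on that character recovers the usable codes. A sketch over the `\20x` and `\21x` rows of the ZapfDingbats table, copied as `\u` escapes; `defined_codes` is a hypothetical helper:

```python
REPLACEMENT = "\ufffd"

# The \20x and \21x rows of the ZapfDingbats table (codes \200 to \217).
ROWS = (
    "\u2768\u2769\u276a\u276b\u276c\u276d\u276e\u276f"
    "\u2770\u2771\u2772\u2773\u2774\u2775\ufffd\ufffd"
)
zapf = dict(zip(range(0o200, 0o220), ROWS))


def defined_codes(mapping: dict[int, str]) -> list[int]:
    """Return the sorted codes whose character is not U+FFFD."""
    return sorted(code for code, char in mapping.items() if char != REPLACEMENT)


codes = defined_codes(zapf)
# Print how many codes are defined, and the first and last of them.
print(len(codes), f"\\{codes[0]:03o}", f"\\{codes[-1]:03o}")  # 14 \200 \215
```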

doc/techref/index.md

+7

@@ -0,0 +1,7 @@

````markdown
# Technical Reference

```{toctree}
:maxdepth: 1

encodings.md
```
````

pygmt/encodings.py

+124

@@ -0,0 +1,124 @@

```python
"""
Adobe character encodings supported by GMT.

Currently, only Adobe Symbol, Adobe ZapfDingbats, and Adobe ISOLatin1+ encodings are
supported.

The corresponding Unicode characters in each Adobe character encoding are generated
from the mapping table and conversion script in the GMT-octal-codes
(https://github.com/seisman/GMT-octal-codes) repository. Refer to that repository for
details.

Some code points are undefined and are assigned the replacement character
(``\ufffd``).

References
----------

- GMT-octal-codes: https://github.com/seisman/GMT-octal-codes
- GMT official documentation: https://docs.generic-mapping-tools.org/dev/reference/octal-codes.html
- Adobe PostScript Language Reference: https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf
- Adobe Symbol: https://en.wikipedia.org/wiki/Symbol_(typeface)
- Zapf Dingbats: https://en.wikipedia.org/wiki/Zapf_Dingbats
- ISO-8859-1: https://en.wikipedia.org/wiki/ISO/IEC_8859-1
- ISOLatin1+: https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
- Adobe Glyph List: https://github.com/adobe-type-tools/agl-aglfn
"""

# Dictionary of character mappings for different encodings.
charset: dict = {}

# Adobe ISOLatin1+ charset.
# Most characters are the same in ISOLatin1+ and ISO-8859-1 encodings.
charset["ISOLatin1+"] = {
    i: chr(i) for i in [*range(0o040, 0o177), *range(0o240, 0o400)]
}
# Handle characters that are different in ISOLatin1+ and ISO-8859-1 encodings.
charset["ISOLatin1+"].update(
    {
        0o047: "\u2019",  # Change "Apostrophe" to "Right Single Quotation Mark"
        0o055: "\u2212",  # Change "Hyphen-minus" to "Minus Sign"
        0o140: "\u2018",  # Change "Grave Accent" to "Left Single Quotation Mark"
        0o177: "\u0161",  # Set to "Latin Small Letter S with Caron"
    }
)
# Add extended characters in ISOLatin1+.
charset["ISOLatin1+"].update(
    dict(
        zip(
            [*range(0o030, 0o040), *range(0o200, 0o240)],
            "\ufffd\u2022\u2026\u2122\u2014\u2013\ufb01\u017e"
            "\u0152\u2020\u2021\u0141\u2044\u2039\u0160\u203a"
            "\u0153\u0178\u017d\u0142\u2030\u201e\u201c\u201d"
            "\u0131\u0060\u00b4\u02c6\u02dc\u00af\u02d8\u02d9"
            "\u00a8\u201a\u02da\u00b8\u0027\u02dd\u02db\u02c7",
            strict=False,
        )
    )
)

# Adobe Symbol charset.
charset["Symbol"] = dict(
    zip(
        [*range(0o040, 0o200), *range(0o240, 0o400)],
        "\u0020\u0021\u2200\u0023\u2203\u0025\u0026\u220b"
        "\u0028\u0029\u2217\u002b\u002c\u2212\u002e\u002f"
        "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037"
        "\u0038\u0039\u003a\u003b\u003c\u003d\u003e\u003f"
        "\u2245\u0391\u0392\u03a7\u2206\u0395\u03a6\u0393"
        "\u0397\u0399\u03d1\u039a\u039b\u039c\u039d\u039f"
        "\u03a0\u0398\u03a1\u03a3\u03a4\u03a5\u03c2\u2126"
        "\u039e\u03a8\u0396\u005b\u2234\u005d\u22a5\u005f"
        "\uf8e5\u03b1\u03b2\u03c7\u03b4\u03b5\u03c6\u03b3"
        "\u03b7\u03b9\u03d5\u03ba\u03bb\u03bc\u03bd\u03bf"
        "\u03c0\u03b8\u03c1\u03c3\u03c4\u03c5\u03d6\u03c9"
        "\u03be\u03c8\u03b6\u007b\u007c\u007d\u223c\ufffd"
        "\u20ac\u03d2\u2032\u2264\u2215\u221e\u0192\u2663"
        "\u2666\u2665\u2660\u2194\u2190\u2191\u2192\u2193"
        "\u00b0\u00b1\u2033\u2265\u00d7\u221d\u2202\u2022"
        "\u00f7\u2260\u2261\u2248\u2026\u23d0\u23af\u21b5"
        "\u2135\u2111\u211c\u2118\u2297\u2295\u2205\u2229"
        "\u222a\u2283\u2287\u2284\u2282\u2286\u2208\u2209"
        "\u2220\u2207\u00ae\u00a9\u2122\u220f\u221a\u22c5"
        "\u00ac\u2227\u2228\u21d4\u21d0\u21d1\u21d2\u21d3"
        "\u25ca\u2329\u00ae\u00a9\u2122\u2211\u239b\u239c"
        "\u239d\u23a1\u23a2\u23a3\u23a7\u23a8\u23a9\u23aa"
        "\ufffd\u232a\u222b\u2320\u23ae\u2321\u239e\u239f"
        "\u23a0\u23a4\u23a5\u23a6\u23ab\u23ac\u23ad\ufffd",
        strict=False,
    )
)

# Adobe ZapfDingbats charset.
charset["ZapfDingbats"] = dict(
    zip(
        [*range(0o040, 0o220), *range(0o240, 0o400)],
        "\u0020\u2701\u2702\u2703\u2704\u260e\u2706\u2707"
        "\u2708\u2709\u261b\u261e\u270c\u270d\u270e\u270f"
        "\u2710\u2711\u2712\u2713\u2714\u2715\u2716\u2717"
        "\u2718\u2719\u271a\u271b\u271c\u271d\u271e\u271f"
        "\u2720\u2721\u2722\u2723\u2724\u2725\u2726\u2727"
        "\u2605\u2729\u272a\u272b\u272c\u272d\u272e\u272f"
        "\u2730\u2731\u2732\u2733\u2734\u2735\u2736\u2737"
        "\u2738\u2739\u273a\u273b\u273c\u273d\u273e\u273f"
        "\u2740\u2741\u2742\u2743\u2744\u2745\u2746\u2747"
        "\u2748\u2749\u274a\u274b\u25cf\u274d\u25a0\u274f"
        "\u2750\u2751\u2752\u25b2\u25bc\u25c6\u2756\u25d7"
        "\u2758\u2759\u275a\u275b\u275c\u275d\u275e\ufffd"
        "\u2768\u2769\u276a\u276b\u276c\u276d\u276e\u276f"
        "\u2770\u2771\u2772\u2773\u2774\u2775\ufffd\ufffd"
        "\ufffd\u2761\u2762\u2763\u2764\u2765\u2766\u2767"
        "\u2663\u2666\u2665\u2660\u2460\u2461\u2462\u2463"
        "\u2464\u2465\u2466\u2467\u2468\u2469\u2776\u2777"
        "\u2778\u2779\u277a\u277b\u277c\u277d\u277e\u277f"
        "\u2780\u2781\u2782\u2783\u2784\u2785\u2786\u2787"
        "\u2788\u2789\u278a\u278b\u278c\u278d\u278e\u278f"
        "\u2790\u2791\u2792\u2793\u2794\u2192\u2194\u2195"
        "\u2798\u2799\u279a\u279b\u279c\u279d\u279e\u279f"
        "\u27a0\u27a1\u27a2\u27a3\u27a4\u27a5\u27a6\u27a7"
        "\u27a8\u27a9\u27aa\u27ab\u27ac\u27ad\u27ae\u27af"
        "\ufffd\u27b1\u27b2\u27b3\u27b4\u27b5\u27b6\u27b7"
        "\u27b8\u27b9\u27ba\u27bb\u27bc\u27bd\u27be\ufffd",
        strict=False,
    )
)
```
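
This commit adds only the lookup tables; the code that consumes them is not part of the diff. As an illustration of how a `charset` table like these can be used, the hypothetical `to_octal` function below inverts a mapping and replaces each non-ASCII character with a GMT-style octal escape such as `\260` for the degree sign. To stay self-contained, it rebuilds only the base part of the ISOLatin1+ table (ISO-8859-1 plus the four overrides) instead of importing `pygmt.encodings`.

```python
# Base part of the ISOLatin1+ table, built the same way as in pygmt/encodings.py
# (the extended \03x and \20x-\23x rows are omitted for brevity).
latin1plus = {i: chr(i) for i in [*range(0o040, 0o177), *range(0o240, 0o400)]}
latin1plus.update({0o047: "\u2019", 0o055: "\u2212", 0o140: "\u2018", 0o177: "\u0161"})


def to_octal(text: str, charset: dict[int, str]) -> str:
    """Hypothetical helper: replace non-ASCII characters with octal escapes.

    ASCII characters are passed through unchanged; any other character must be
    present in ``charset``, otherwise a ValueError is raised.
    """
    # Invert the code -> character table, skipping undefined code points.
    # If a character occurs at several codes, the one iterated last wins.
    reverse = {char: code for code, char in charset.items() if char != "\ufffd"}
    out = []
    for char in text:
        if char.isascii():
            out.append(char)
        elif char in reverse:
            out.append(f"\\{reverse[char]:03o}")
        else:
            raise ValueError(f"Character {char!r} is not in the given charset.")
    return "".join(out)


print(to_octal("Température 25 °C", latin1plus))  # Temp\351rature 25 \260C
```

Raising on unknown characters instead of dropping them makes it obvious when a string mixes encodings, for example a Greek letter (which lives in Symbol) inside an ISOLatin1+ string.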
