These are experimental modules to handle various Unicode issues. This
software is under construction and has not even reached alpha state yet.

More information on Unicode can be found at http://www.unicode.org

Current modules are:

  Unicode::String   - represent strings of Unicode chars
  Unicode::CharName - look up character names

Some ideas to investigate for the Unicode modules are:
o Fast encoding/decoding to various 8-bit char sets. Mapping
  table objects perhaps?
o Fast conversion to other large char sets (East Asian). I don't
  know anything about this.
o Composition/decomposition support:
    $u->decomp; # decomposes as far as possible: "å" --> "a" + combining ring
    $u->comp;   # composes as far as possible:   "a" + combining ring --> "å"
  Need separate routines or a special argument to distinguish
  between compatibility decomposition and canonical decomposition.
  Canonical decomposition is a subset of compatibility decomposition.
o General Unicode string to number conversion (based on unidata
  number attributes)
o Case conversions (lc, uc, ucfirst); the last one should use title case
o Fast lookup of Unicode attributes (unidata lookup using XS):
    $u->isletter, $u->isupper, $u->islower, ...
  Why do we need them when Perl does not need them for normal text?
o There might be some support for the private use area (i.e. adding case
  conversion and char properties to chars within the area).
o Unicode tr-function, sprintf-function
o Unicode string comparison functions: cmp, le, eq, ...
o Unicode regular expressions: m// s/// split(//,..)
o Unicode filehandles (automatic conversion from UTF-7/UTF-8/8-bit
  char set when reading from or writing to filehandles)
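
The difference between canonical and compatibility decomposition noted
in the list above can be sketched with Unicode::Normalize, a core module
in later Perl releases (5.8 and up). This is only an illustration under
that assumption. It is not part of these modules, and hex_points is a
hypothetical helper introduced here.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC NFKD);   # assumed available (core since Perl 5.8)

# Hypothetical helper: render a string as space-separated hex code points.
sub hex_points {
    my ($str) = @_;
    return join ' ', map { sprintf('%04X', ord($_)) } split(//, $str);
}

my $a_ring = "\x{E5}";      # LATIN SMALL LETTER A WITH RING ABOVE
my $fi_lig = "\x{FB01}";    # LATIN SMALL LIGATURE FI (compatibility character)

# Canonical decomposition splits the letter into base + combining mark.
print hex_points(NFD($a_ring)), "\n";        # 0061 030A

# Composition puts the pair back together.
print hex_points(NFC("a\x{30A}")), "\n";     # 00E5

# The ligature has only a compatibility mapping: NFD leaves it alone,
# while NFKD expands it to "f" + "i".
print hex_points(NFD($fi_lig)), "\n";        # FB01
print hex_points(NFKD($fi_lig)), "\n";       # 0066 0069

# Compatibility decomposition also applies every canonical mapping.
print hex_points(NFKD($a_ring)), "\n";       # 0061 030A
```

A single routine with a flag argument, or two separate routines as here,
would both cover the distinction; which fits these modules better is
left open.
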
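
The following are examples of use of the current modules:

    use Unicode::String qw(latin1 utf8);

    # Build a Unicode string from UTF-8 input, then print it
    # in several encodings.
    $u = utf8("this is a string\n");
    print $u->ucs4;
    print $u->utf16;
    print $u->utf8;
    print $u->utf7;
    print $u->latin1;
    print $u->hex;      # code points in hexadecimal notation

    # Convert a Latin-1 string to UTF-8.
    print latin1("naïve\n")->utf8;

    use Unicode::CharName qw(uname);

    # Look up the name of a character from its code point.
    print uname(ord('$')), "\n";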
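
For comparison, similar conversions and a name lookup can be sketched
with the core Encode and charnames modules of later Perl releases (5.8
and up). This is an assumption about the reader's environment and not
part of these modules. The encoded bytes are printed as hex so that they
are visible on any terminal.

```perl
use strict;
use warnings;
use Encode qw(decode encode);   # assumed available (core since Perl 5.8)
use charnames ();               # provides charnames::viacode

# Start from Latin-1 bytes for the word "naive" with a diaeresis on the i.
my $chars = decode('ISO-8859-1', "na\xEFve");

# Re-encode the characters and show the resulting bytes in hex.
print unpack('H*', encode('UTF-8', $chars)), "\n";      # 6e61c3af7665
print unpack('H*', encode('UTF-16BE', $chars)), "\n";   # 006e006100ef00760065

# Look up a character name from its code point.
print charnames::viacode(ord('$')), "\n";               # DOLLAR SIGN
```
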

COPYRIGHT

© 1997 Gisle Aas. All rights reserved.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.