Prepare mktables for Unicode 15.1 and 16.0 #23133
base: blead
    if (defined (my $bmg = property_ref('Bidi_Mirroring_Glyph'))) {
        $bmg->set_to_output_map($EXTERNAL_MAP);
        $bmg->set_range_size_1(1);
    }

    property_ref('Numeric_Value')->set_to_output_map($OUTPUT_ADJUSTED);

    # These two properties have no short names and the file names for them
    # clash in DOS 8.3.  Work around this by creating shorter file names that
Where are we still limited by 8.3?
On IRC the other day, I asked if we were still limited, and the answer was yes.
For Unicode filenames, yes, but for ASCII filenames we aren't, AFAIK.
Add comments, and rewrap comment lines to fit 80 columns
Unicode 15.1 introduces this new property, which needs the same special handling as plain NFKC_Casefold does.
Unicode 15.1 introduces new line breaking rules for Indic languages, via a new property Indic_Conjunct_Break. mktables works in conjunction with regen/mk_invlists.pl to construct tables and DFAs for handling these. This commit prepares mktables to do its part for Unicode versions that have these new rules.
These files are changed in 15.1 to have @missing lines, whereas they didn't before. This leads to some warning messages, so turn off looking at them, as we do for a number of other files.
Unicode 15.1 changes the rules for line breaking with regard to quotation marks. This prepares for that.
Unicode 15.1 adds new line breaking rules that depend on the dotted circle. This creates a table for that so that mk_invlists.pl doesn't have to have exception code for handling it.
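As a rough illustration of the @missing handling mentioned above, here is a minimal Perl sketch of recognizing such lines in a UCD-style file and optionally skipping them. It is not mktables code: the sample lines are made up for illustration, and `$ignore_missings` is a hypothetical switch standing in for whatever per-file setting mktables actually uses.

```perl
use strict;
use warnings;

# Illustrative input in the general UCD data-file style; these lines are
# invented for the example and not copied from any file touched by this PR.
my @lines = (
    '# @missing: 0000..10FFFF; Some_Default',
    '0041..005A    ; Some_Value',
    '# an ordinary comment',
);

# Hypothetical switch, standing in for a per-file "ignore @missing" setting.
my $ignore_missings = 1;

my (@defaults, @data);
for my $line (@lines) {
    # An @missing line is a structured comment giving a default value
    # for a code point range.
    if ($line =~ /^#\s*\@missing:\s*([0-9A-F]+)\.\.([0-9A-F]+)\s*;\s*(.+?)\s*$/) {
        next if $ignore_missings;    # skip it entirely when told to
        push @defaults, { start => hex $1, end => hex $2, value => $3 };
        next;
    }
    next if $line =~ /^\s*(?:#|$)/;  # ordinary comments and blank lines
    push @data, $line;
}

printf "%d default(s), %d data line(s)\n", scalar @defaults, scalar @data;
```

With the switch set to 0, the same loop would record one default entry instead of discarding it.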
4894f2a to 1f07a91
The commit message for aa6faba has 2 misspellings: `infrastructue` lacks the second r, and in `incoroporated` the second o needs removal.
This is handled by ignoring it for now, and letting mktables know that the properties it contains are empty. This file, new in 16.0, gives extra information about Egyptian hieroglyphs newly encoded in 16.0. It is intended only for scholars of these ancient symbols. mktables normally handles new properties automatically, but this file is in a completely different format from previous ones, so mktables would have to be adapted to understand it. That might not be too hard, given that mktables has infrastructure to handle other outliers that have come along over the years from Unicode. But, by ignoring this file, we create empty tables which generate errors in other places in perl. These are real bugs that ought to be fixed, and will be before 16.0 is incorporated into blead. And how many Egyptologists are there in the world, much less how many use the latest Perl? So the perldelta will say that 16.0's support doesn't include these, which are mostly provisional anyway.
These new properties are automatically handled, but there is a problem. They have no short form names. Files are written for them based on their names, and those files are not distinguishable on a DOS 8.3 file system. The solution here is to manually override the automatically generated file names with distinguishable ones.
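The clash described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the property names are hypothetical placeholders, `to_8_3_base` is a simplified stand-in for 8.3 shortening (mktables' real algorithm may differ), and the `%override` table is a made-up analogue of the manual file-name overrides.

```perl
use strict;
use warnings;

# Hypothetical property names, chosen only to illustrate the clash; they
# are not taken from the PR.
my @names = ('Example_Long_Name_Start', 'Example_Long_Name_Continue');

# Simplified stand-in for 8.3 shortening: drop underscores and keep the
# first 8 characters.  This is an assumption, not mktables' algorithm.
sub to_8_3_base {
    my ($name) = @_;
    (my $base = $name) =~ s/_//g;
    return substr($base, 0, 8);
}

# Hypothetical manual overrides giving each property a distinct base name
# that still fits in 8 characters.
my %override = (
    'Example_Long_Name_Start'    => 'ExLNStrt',
    'Example_Long_Name_Continue' => 'ExLNCont',
);

# Prefer the manual override when one exists; otherwise fall back to the
# automatic shortening.
sub file_base {
    my ($name) = @_;
    return $override{$name} // to_8_3_base($name);
}

my @auto   = map { to_8_3_base($_) } @names;
my @manual = map { file_base($_) } @names;

print "automatic:  @auto\n";
print "overridden: @manual\n";
```

Both names shorten automatically to the same base because they share their first eight characters, while the overridden names stay distinct.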
1f07a91 to de01c61
This p.r. for Unicode mktables did not make it into the March 20 dev release. Does that mean we have to defer it to the 5.43 dev cycle?
The change isn't really user visible; it would only affect people who would want to patch in a more recent Unicode version.
@khwilliamson there's one unresolved conversation in this p.r. If you mark that resolved, then I think this is okay to merge.
There are more commits coming
perldelta not needed until the actual releases are incorporated.