
Prepare mktables for Unicode 15.1 and 16.0 #23133

Draft: wants to merge 8 commits into blead


khwilliamson (Contributor)

perldelta not needed until the actual releases are incorporated.

  • This set of changes does not require a perldelta entry.

if (defined (my $bmg = property_ref('Bidi_Mirroring_Glyph'))) {
$bmg->set_to_output_map($EXTERNAL_MAP);
$bmg->set_range_size_1(1);
}

property_ref('Numeric_Value')->set_to_output_map($OUTPUT_ADJUSTED);

# These two properties have no short names and the file names for them
# clash in DOS 8.3. Work around this by creating shorter file names that

Contributor

Where are we still limited by 8.3?

Contributor Author

On IRC the other day, I asked if we were still limited, and the answer was yes.

Contributor

For Unicode filenames, yes; but for ASCII filenames we aren't, AFAIK.

Add comments, and rewrap comment lines to fit 80 columns.

Unicode 15.1 introduces this new property, which needs the same special
handling as plain NFKC_Casefold does.

Unicode 15.1 introduces new grapheme cluster breaking rules for Indic
languages, via a new property, Indic_Conjunct_Break.  mktables works in
conjunction with regen/mk_invlists.pl to construct tables and DFAs for
handling these.  This commit prepares mktables to do its part for
Unicode versions that have these new rules.

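
For context, Unicode 15.1 uses Indic_Conjunct_Break in rule GB9c of UAX #29: no grapheme cluster break is allowed between two consonants joined by a linker (a virama), possibly with extend characters in between. A minimal Perl sketch of that rule follows. It assumes a hypothetical one-letter encoding of the property values and a hypothetical helper name, gb9c_no_break_before; mktables and regen/mk_invlists.pl instead generate tables and DFAs, so this only illustrates what those tables encode.

```perl
use strict;
use warnings;

# Hypothetical one-letter encoding of Indic_Conjunct_Break values:
#   C = Consonant, L = Linker, E = Extend, N = None
# Real code would look these up in tables generated from the UCD.

# Return 1 if rule GB9c forbids a grapheme cluster break before
# position $i in the string of InCB classes $classes, else 0.
sub gb9c_no_break_before {
    my ($classes, $i) = @_;
    return 0 if $i <= 0 || $i >= length $classes;
    return 0 unless substr($classes, $i, 1) eq 'C';

    # Consonant [Extend Linker]* Linker [Extend Linker]*  x  Consonant
    return substr($classes, 0, $i) =~ /C[EL]*L[EL]*\z/ ? 1 : 0;
}

print gb9c_no_break_before('CLC', 2), "\n";    # prints 1
print gb9c_no_break_before('CEC', 2), "\n";    # prints 0 (no linker)
print gb9c_no_break_before('CELEC', 4), "\n";  # prints 1
```
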
These files are changed in 15.1 to have @missing lines, whereas they
didn't before.  This leads to some warning messages, so turn off
looking at them, as we do for a number of other files.

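
UCD data files declare default values for unlisted code points in comment lines of the form "# @missing: 0000..10FFFF; Unknown". The sketch below recognizes such a line and honors a per-file opt-out. The skip list, its file names, and parse_missing are hypothetical, since the commit message does not name the affected files, and it assumes the opt-out applies to the @missing lines rather than to whole files.

```perl
use strict;
use warnings;

# Hypothetical per-file opt-out list; the real list of files is not
# given in the commit message.
my %skip_missing = map { $_ => 1 } qw(Example1.txt Example2.txt);

# Parse one line of a UCD data file.  For an "@missing" line that we
# honor, return (first code point, last code point, remaining fields);
# otherwise return the empty list.
sub parse_missing {
    my ($file, $line) = @_;
    return unless $line =~ / ^ \# \s* \@missing: \s*
                             ([0-9A-Fa-f]+) \.\. ([0-9A-Fa-f]+) \s* ; \s*
                             (.*?) \s* \z /x;
    return if $skip_missing{$file};    # opted out: ignore the default
    return (hex $1, hex $2, $3);
}

my @m = parse_missing('Scripts.txt', "# \@missing: 0000..10FFFF; Unknown");
printf "%04X..%04X => %s\n", @m;    # prints 0000..10FFFF => Unknown
```
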
Unicode 15.1 changes the rules for line breaking with regard to
quotation marks.  This prepares for that.

Unicode 15.1 adds new line breaking rules that depend on the dotted
circle.  This creates a table for that so that mk_invlists.pl doesn't
have to have exception code for handling it.
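
regen/mk_invlists.pl emits its character tables as inversion lists: sorted code points at which set membership flips. As an illustration, a one-code-point table for U+25CC DOTTED CIRCLE in that representation, with a binary-search membership test, might look like the sketch below. The names are hypothetical, and the format mk_invlists.pl actually generates differs.

```perl
use strict;
use warnings;

# An inversion list: a sorted list of code points at which set
# membership flips.  Even-indexed elements start ranges that are in the
# set; odd-indexed elements start ranges that are not.  This one holds
# only U+25CC DOTTED CIRCLE.
my @dotted_circle = (0x25CC, 0x25CD);

# Return 1 if code point $cp is in the set described by @$invlist.
sub invlist_contains {
    my ($invlist, $cp) = @_;

    # Binary search for how many elements are <= $cp.
    my ($lo, $hi) = (0, scalar @$invlist);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($invlist->[$mid] <= $cp) { $lo = $mid + 1 }
        else                         { $hi = $mid }
    }

    # An odd count means $cp falls inside an "in the set" range.
    return $lo % 2;
}

for my $cp (0x25CB, 0x25CC, 0x25CD) {
    printf "U+%04X: %s\n", $cp,
        invlist_contains(\@dotted_circle, $cp) ? 'in set' : 'not in set';
}
```
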

jkeenan (Contributor) left a comment

The commit message for aa6faba has two misspellings: "infrastructue" lacks the second "r", and in "incoroporated" the second "o" needs removal.

This file, new in 16.0, gives extra information about Egyptian
hieroglyphs newly encoded in 16.0.  It is intended only for scholars of
these ancient symbols.  This is handled by ignoring it for now, and
letting mktables know that the properties it contains are empty.

mktables normally handles new properties automatically, but this file is
in a completely different format than previous ones, so mktables would
have to be adapted to understand that.  That might not be too hard,
given that mktables has infrastructure to handle other outliers that have
come along over the years from Unicode.  But by ignoring this file, we
create empty tables that generate errors elsewhere in perl.
These are real bugs that ought to be fixed, and will be before 16.0 is
incorporated into blead.  And how many Egyptologists are there in the
world, much less how many use the latest Perl?

So the perldelta will say that 16.0's support doesn't include these,
which are mostly provisional anyway.
These new properties are automatically handled, but there is a problem.
They have no short names.  Files are written for them based on
their names, and those files are not distinguishable on a DOS 8.3 file
system.  The solution here is to manually override the automatically
generated file names with distinguishable ones.
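
To make the collision concrete, here is a sketch using hypothetical property names (the commit message does not list the real ones) and a deliberately naive naming scheme: lowercase, strip underscores, truncate to eight characters. That is not mktables's real algorithm; the point is only that two long names sharing a prefix collide in an 8.3 base name, and a hand-maintained override table resolves it.

```perl
use strict;
use warnings;

# Hypothetical long property names with no short aliases.
my @props = qw(Hypothetical_Alpha Hypothetical_Beta);

# Deliberately naive stand-in for name-based file naming (NOT what
# mktables does): lowercase, drop underscores, keep at most the 8
# characters a DOS 8.3 base name allows.
sub naive_83_base {
    my $name = lc shift;
    $name =~ tr/_//d;
    return substr($name, 0, 8);
}

# Hand-chosen overrides that are distinct within 8 characters.
my %override = (
    Hypothetical_Alpha => 'hypalpha',
    Hypothetical_Beta  => 'hypbeta',
);

# Map each property to a base file name, dying on any collision.
sub file_bases {
    my ($overrides, @names) = @_;
    my (%owner, %base_of);
    for my $prop (@names) {
        my $base = $overrides->{$prop} // naive_83_base($prop);
        die "8.3 collision: $prop and $owner{$base} both map to '$base'\n"
            if exists $owner{$base};
        $owner{$base}   = $prop;
        $base_of{$prop} = $base;
    }
    return \%base_of;
}

# Without overrides, both names become "hypothet" and collide.
print eval { file_bases({}, @props); 1 } ? "no collision\n" : $@;

# With overrides, every property gets its own base name.
my $bases = file_bases(\%override, @props);
print "$_ => $bases->{$_}\n" for @props;
```
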
jkeenan (Contributor) commented Apr 1, 2025

This p.r. for Unicode mktables did not make it into the March 20 dev release. Does that mean we have to defer it to the 5.43 dev cycle?

Leont (Contributor) commented Apr 1, 2025

> This p.r. for Unicode mktables did not make it into the March 20 dev release. Does that mean we have to defer it to the 5.43 dev cycle?

The change isn't really user visible, it would only affect people who would want to patch in a more recent Unicode version.

jkeenan (Contributor) left a comment

@khwilliamson there's one unresolved conversation in this p.r. If you mark that resolved, then I think this is okay to merge.

khwilliamson (Contributor Author)

There are more commits coming.

@khwilliamson khwilliamson marked this pull request as draft April 3, 2025 00:27
Labels: None yet
Projects: None yet

4 participants