Update to unicode 8.0 #8

Open: antifuchs wants to merge 7 commits into master

@antifuchs antifuchs commented Aug 20, 2016

What does this PR do?

This updates the data files and generated sources to Unicode 8.0, the latest released version of the Unicode data files. Characters like "U+0001F643 UPSIDE-DOWN FACE" can now be found by name and named.

Notes

  • This change replaces the (custom?) codepoint_name.csv with the unchanged UnicodeData.txt file - I think I have figured out how you generated codepoint_name.csv, but I feel that just using the upstream source might make it easier to keep up to date with newer releases.
  • I've based this on #7 (Update unicode_names for newer rusts), so that I could build and test it in newer environments. If you merge #7, I'll rebase this so it can be merged too.
  • Same comment re. the generator applies - I am not sure if I should have used any other command-line arguments when generating the updated source... but it does seem to do the right thing (:

Again, hope you find this useful - until you find the time to review & merge these PRs, I'll use this revision in chars (-:

Compiling the generator fails due to incompatible crates. This change
bumps the versions of those crates, and allows the generator to build.

Rust 1.10 (and probably some versions before that) complains that
non-functions can't have inline declarations. Prevent the generator
from emitting those.

This updates the generated Rust files to compile correctly on more
recent Rust versions (nightly and 1.8+).

* Use a less-specific regex crate version (0.1)

* Update TokenTree type ref to point to the current location

* Rename LitChar to ast::LitKind::Char

This compiles on rustc nightly, 1225e122f 2016-07-30 now.

Rust 1.5 deprecated the range_inclusive library in favor of the
... syntax, which needs a feature. Update the test feature set to
include that & use it.

This feature also seems to be unknown in the latest Rust nightly, so
remove it as well.

@autohuonw
Collaborator

Thanks for the pull request, and welcome! You should hear from @huonw (or someone else) soon.

* Replace codepoint_name.csv with Unicode 8.0's UnicodeData.txt from
  ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt (retrieved
  2016-08-20).

* Un-ignore data/UnicodeData.txt, since it's the primary source of truth
  now.

* Update tests and the generator for the slight change in format:
  code points are now hexadecimal instead of decimal.

* Re-run the generator.
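
To illustrate the format change mentioned above, here is a minimal sketch of reading one UnicodeData.txt record with the code point in hexadecimal. The function name `parse_unicode_data_line` is hypothetical and this is not the generator's actual code; it only assumes the semicolon-separated layout of UnicodeData.txt, where the first field is the hex code point and the second is the character name.

```rust
/// Hypothetical helper (not taken from the generator): parse one
/// UnicodeData.txt record into (character, name).
///
/// Returns None if the line has fewer than two fields, if the first
/// field is not valid hexadecimal, or if the value is not a Unicode
/// scalar value (for example, surrogate code points).
fn parse_unicode_data_line(line: &str) -> Option<(char, &str)> {
    let mut fields = line.split(';');
    let code = fields.next()?; // field 0: code point in hex, e.g. "1F643"
    let name = fields.next()?; // field 1: character name
    // The old CSV used decimal numbers; the upstream file uses base 16.
    let value = u32::from_str_radix(code, 16).ok()?;
    let ch = char::from_u32(value)?;
    Some((ch, name))
}
```

Calling it on a record such as `1F643;UPSIDE-DOWN FACE;So;0;ON;;;;;N;;;;;` yields the character U+1F643 together with its name. Range markers in the file (names written in angle brackets) would still need separate handling, which this sketch leaves out.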
@antifuchs antifuchs force-pushed the update-to-unicode-8.0 branch from f4e81f0 to ef5e067 Compare August 20, 2016 19:07
@progval

progval commented Jun 6, 2018

ping?

4 participants