Add old tables #28

Open
jayvdb opened this issue Oct 2, 2019 · 3 comments

Comments

@jayvdb
Contributor

jayvdb commented Oct 2, 2019

It would be useful to be able to refer to tables for the previous versions of Unicode.

jquast/wcwidth#23 is attempting to do that.

It would also be very helpful to facilitate Python-based analysis of the changes in Unicode data.

It seems the build infrastructure of unicodedata2 is perfect for that.

In order to avoid forcing all users to install all data, perhaps a separate PyPI package name could be used for the 'all unicodedata versions' edition of this.

@jquast

jquast commented Mar 1, 2020

Sorry for being so late to the conversation, but I am interested in joining forces :)

And I do have time carved out for this in the coming weeks. As far as wcwidth goes, the PR is meant to clean up the build infrastructure, so its needs are very similar to those of unicodedata2's build infrastructure. I'll be sure to sit down and study unicodedata2 before I go any further with the jquast/wcwidth#23 changes.

@jquast

jquast commented Mar 24, 2020

If there were something both of our packages could use, it would be "well-structured Unicode data": the TXT files well parsed and annotated, with the copyrights, dates, and comments if possible, perhaps as just some JSON or TOML data files.

If a CLI utility existed that helped navigate, fetch, and parse the archive of Unicode text files and spit out data blobs, it could be listed in requirements-dev.txt for our projects and used for our respective code generation. This CLI app would be based, roughly, on the class UnicodeData from the unicodedata/2.py files.
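To make the "parse the text files and spit out data blobs" idea concrete, here is a minimal sketch. It assumes input in the semicolon-separated UnicodeData.txt layout and keeps only the first five fields. The names `FIELDS`, `parse_unicode_data`, and `to_json` are hypothetical and not part of unicodedata2 or wcwidth; a real tool would also fetch the files and carry over copyright and date headers.

```python
import json

# Names chosen here for the first five columns of a UnicodeData.txt record.
FIELDS = ("code", "name", "category", "combining", "bidi")

# Two records in UnicodeData.txt layout, embedded so the sketch needs no network.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
"""


def parse_unicode_data(text):
    """Parse UnicodeData.txt-style text into a list of dicts (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(";")
        record = dict(zip(FIELDS, parts))  # zip stops after the five named fields
        record["code"] = int(record["code"], 16)
        record["combining"] = int(record["combining"])
        records.append(record)
    return records


def to_json(records, version):
    """Wrap the parsed records with the version label they came from."""
    return json.dumps({"unicode_version": version, "records": records}, indent=2)


if __name__ == "__main__":
    # The version label is supplied by the caller; it is not read from the data.
    print(to_json(parse_unicode_data(SAMPLE), "example-version"))
```

A JSON blob per version like this could then be consumed by each project's own code generator.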

@jayvdb: analysis of changes by version through unicodedata2 would require an excessive number of API calls into the resulting C module. We would have to design and maintain a new multi-version API for it, and then organize the return values into structured data to compare. Phew! I think the CLI utility I propose would be better for any difference analysis: the data structures it outputs could be compared immediately, without any further transformation.
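As a rough illustration of comparing structured data directly, the sketch below diffs two mappings of code point to properties. The function name `diff_versions` and the data shape are assumptions, and the sample values are toy data, not taken from any Unicode release.

```python
def diff_versions(old, new):
    """Compare two {codepoint: properties} mappings (hypothetical data shape)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {
        cp: (old[cp], new[cp])
        for cp in sorted(old.keys() & new.keys())
        if old[cp] != new[cp]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Toy values for illustration only; these are not real Unicode properties.
OLD = {1: {"width": 1}, 2: {"width": 1}, 3: {"width": 2}}
NEW = {1: {"width": 1}, 2: {"width": 2}, 4: {"width": 1}}

if __name__ == "__main__":
    print(diff_versions(OLD, NEW))
```

Because both inputs are plain dictionaries, no calls into a compiled module are needed for the comparison.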

@jayvdb
Contributor Author

jayvdb commented Mar 24, 2020

@jquast, what if unicodedata2 had a "set unicode version" function that switched the tables between versions?

The caller would then extract the info they needed from one version, and then switch and repeat with the other version?
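The switch-and-repeat workflow could look roughly like the pure-Python mock below. Everything here is hypothetical: unicodedata2 has no `set_unicode_version` function today, the class `VersionedTables` is invented for illustration, and the two tables hold toy data under the made-up labels "old" and "new".

```python
class VersionedTables:
    """Mock of a module whose lookup tables can be switched between versions."""

    def __init__(self, tables):
        self._tables = tables  # {version label: {codepoint: category}}
        self._active = None

    def set_unicode_version(self, version):
        # Hypothetical API: select which table later lookups will use.
        if version not in self._tables:
            raise ValueError(f"unknown version: {version!r}")
        self._active = version

    def category(self, ch):
        if self._active is None:
            raise RuntimeError("call set_unicode_version() first")
        # Code points missing from the active table are reported as unassigned.
        return self._tables[self._active].get(ord(ch), "Cn")


def extract(tables, version, chars):
    """Switch to one version, then pull out the info the caller needs."""
    tables.set_unicode_version(version)
    return {ch: tables.category(ch) for ch in chars}


# Toy tables: "B" is deliberately absent from the "old" table.
TABLES = VersionedTables({
    "old": {0x41: "Lu"},
    "new": {0x41: "Lu", 0x42: "Lu"},
})

if __name__ == "__main__":
    before = extract(TABLES, "old", "AB")
    after = extract(TABLES, "new", "AB")
    print({ch: (before[ch], after[ch]) for ch in "AB" if before[ch] != after[ch]})
```

One design consequence of this shape is that the active version is global state, so the caller has to extract everything it needs from one version before switching to the next.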
