Add old tables #28

jayvdb · 2019-10-02T04:58:57Z

It would be useful to be able to refer to tables for the previous versions of Unicode.

jquast/wcwidth#23 is attempting to do that.

It would also be very helpful for facilitating Python-based analysis of the changes in Unicode data.

It seems the build infrastructure of unicodedata2 is perfect for that.

In order to avoid forcing all users to install all data, perhaps a separate PyPI package name could be used for the 'all unicodedata versions' edition of this.

jquast · 2020-03-01T02:05:17Z

Sorry for being so late to the conversation, but I am interested in joining forces :)

And I do have time carved out for this in the coming weeks. As far as wcwidth goes, the PR is to clean up the build infrastructure, so its needs are very similar to those of unicodedata2's build infrastructure. I'll be sure to sit down and study unicodedata2 before I go any further with the jquast/wcwidth#23 changes.

jquast · 2020-03-24T05:59:21Z

If there were something both of our packages could use, it would be "well-structured Unicode data": the TXT files well parsed and annotated, with the copyrights, dates, and comments if possible, maybe just as some JSON or TOML data files.

If a CLI utility existed that helped navigate, fetch, and parse the Unicode text file archive and spit out data blobs, that tool could be listed in requirements-dev.txt for our projects, and we could each use it for our respective code generation. This CLI app would be based on the class UnicodeData, roughly, from the unicodedata/2.py files.
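As a rough sketch of the parse-and-emit step only (no fetching, no CLI wiring): the records below follow the semicolon-separated UnicodeData.txt layout, while the names `parse_unicode_data`, `to_blob`, and `FIELDS`, the choice of which fields to keep, and the JSON shape are all assumptions made for illustration, not anything taken from unicodedata2 or wcwidth.

```python
import json

# Positions of a few fields in a UnicodeData.txt record.
# Which fields to keep is an arbitrary choice for this sketch.
FIELDS = {"name": 1, "category": 2, "combining": 3, "bidi": 4}

# Two sample records in UnicodeData.txt layout, inlined so the sketch
# runs without downloading anything.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def parse_unicode_data(text):
    """Map each code point (hex string) to a dict of selected fields."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(";")
        table[parts[0]] = {key: parts[idx] for key, idx in FIELDS.items()}
    return table


def to_blob(version, table):
    """Serialize one version's table as a JSON data blob (hypothetical shape)."""
    payload = {"unicode_version": version, "characters": table}
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    # "12.1.0" is only an example label for the blob.
    print(to_blob("12.1.0", parse_unicode_data(SAMPLE)))
```

A real tool would also need to handle range records, the other data files, and the copyright headers mentioned above; none of that is attempted here.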

@jayvdb: analysis of changes by version through unicodedata2 would require an excess of API calls into the resulting C module, for which we would have to manage a new API for a "multi-verse", and then we would have to organize those return values into structured data to compare. Phew! I think the CLI utility I propose would be better for any difference analysis: the data structures it outputs could be compared immediately, without any further transformation.
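To illustrate why already-structured data makes the comparison cheap, here is a minimal sketch that diffs two per-version tables held as plain dicts. The function name `diff_tables` and the table shape are assumptions, and the `OLD` and `NEW` values are made-up placeholders, not real Unicode data.

```python
def diff_tables(old, new):
    """Compare two {code point: properties} tables from different versions."""
    old_keys, new_keys = set(old), set(new)
    return {
        # Code points present only in the newer table.
        "added": sorted(new_keys - old_keys),
        # Code points present only in the older table.
        "removed": sorted(old_keys - new_keys),
        # Code points in both tables whose properties differ.
        "changed": {
            cp: (old[cp], new[cp])
            for cp in sorted(old_keys & new_keys)
            if old[cp] != new[cp]
        },
    }


# Placeholder tables: keys and category values are invented for illustration.
OLD = {"0001": {"category": "Xx"}, "0002": {"category": "Yy"}}
NEW = {
    "0001": {"category": "Xx"},
    "0002": {"category": "Zz"},
    "0003": {"category": "Xx"},
}

if __name__ == "__main__":
    print(diff_tables(OLD, NEW))
```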

jayvdb · 2020-03-24T08:23:08Z

@jquast, what if unicodedata2 had a "set unicode version" function that switched the tables between versions?

The caller would then extract the info they needed from one version, and then switch and repeat with the other version?
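The switch-and-repeat workflow could look roughly like the sketch below. Everything in it is hypothetical: unicodedata2 does not provide `set_unicode_version`, and the `VersionedTables` class, the `snapshot` helper, the version labels, and the table contents are placeholders invented to show the calling pattern.

```python
class VersionedTables:
    """Hypothetical stand-in for a module that can switch table versions."""

    def __init__(self, tables):
        self._tables = tables  # {version label: {code point: category}}
        self._active = None

    def set_unicode_version(self, version):
        """Select which version's tables later lookups will use."""
        if version not in self._tables:
            raise ValueError(f"unknown Unicode version: {version!r}")
        self._active = version

    def category(self, codepoint):
        """Look up a code point in the currently selected tables."""
        if self._active is None:
            raise RuntimeError("call set_unicode_version() first")
        return self._tables[self._active].get(codepoint)


def snapshot(tables, version, codepoints):
    """Switch to one version, then extract the values the caller needs."""
    tables.set_unicode_version(version)
    return {cp: tables.category(cp) for cp in codepoints}


# Placeholder data: version labels and categories are invented.
TABLES = VersionedTables({"X.0": {1: "Aa"}, "Y.0": {1: "Aa", 2: "Bb"}})

if __name__ == "__main__":
    before = snapshot(TABLES, "X.0", [1, 2])
    after = snapshot(TABLES, "Y.0", [1, 2])
    # Code points whose value differs between the two versions.
    print({cp for cp in before if before[cp] != after[cp]})
```

Note that the caller still has to copy values out before switching, which is the extra bookkeeping the structured-data approach avoids.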

jayvdb mentioned this issue Oct 2, 2019

Support all Unicode Versions jquast/wcwidth#23

Merged
