Create human readable classification diff #34

Open
xrotwang opened this issue May 13, 2020 · 0 comments

[...] maybe it would be nice to have
a browsable list of clf changes between versions. It's cheap
since it can be automatically generated. E.g., if V1 and V2
are the older and later versions respectively:

  1. First generate the list of language-level inventory changes,
    which is (a) lgs(V1) \ lgs(V2)

"The following languages were removed from the language inventory"

Lg; Family(V1, lg); Macro_Area; Comment

Where comment is one of three possibilities:
(i) if moved to bookkeeping: "Spurious see link to lg in V2"
(ii) if promoted to subfamily: "Rendered as subfamily see link to lg in V2"
(iii) if demoted to dialect: "Rendered as dialect see link to lg in V2"

and (b) lgs(V2) \ lgs(V1)

"The following languages were added to the language inventory"

Lg; Family(V2, lg); Macro_Area; Comment

Where comment is one of three possibilities:
(i) if non-existent (at any level) in V1: "Added see link to lg in V2"
(ii) if demoted from subfamily: "Previously rendered as subfamily see link to lg in V2"
(iii) if promoted from dialect: "Previously rendered as dialect see link to lg in V2"
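
A minimal sketch of step 1, assuming each version is available as a plain dict from glottocode to a status string ("language", "subfamily", "dialect" or "bookkeeping"). That representation, the function names and the glottocodes in the usage note are hypothetical, chosen only for illustration. Cases not covered by the three listed possibilities get an empty comment here.

```python
# Comment templates taken from the lists above; the key is the status of the
# code in the *other* version (None means the code does not exist there).
REMOVED_COMMENTS = {
    "bookkeeping": "Spurious",
    "subfamily": "Rendered as subfamily",
    "dialect": "Rendered as dialect",
}
ADDED_COMMENTS = {
    None: "Added",
    "subfamily": "Previously rendered as subfamily",
    "dialect": "Previously rendered as dialect",
}


def lgs(version):
    """Return the set of codes that have language status in a version."""
    return {code for code, status in version.items() if status == "language"}


def inventory_changes(v1, v2):
    """Return (removed, added): dicts mapping glottocode -> comment.

    removed covers lgs(V1) \\ lgs(V2), added covers lgs(V2) \\ lgs(V1).
    """
    removed = {
        lg: REMOVED_COMMENTS.get(v2.get(lg), "")
        for lg in sorted(lgs(v1) - lgs(v2))
    }
    added = {
        lg: ADDED_COMMENTS.get(v1.get(lg), "")
        for lg in sorted(lgs(v2) - lgs(v1))
    }
    return removed, added
```

For instance, with made-up codes, a V1 of `{"bbbb1234": "language", "cccc1234": "dialect"}` and a V2 of `{"bbbb1234": "dialect", "cccc1234": "language"}` would list bbbb1234 as removed ("Rendered as dialect") and cccc1234 as added ("Previously rendered as dialect"). The Family and Macro_Area columns and the link to the lg in V2 are left out, since they depend on how the data is stored.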

  2. For classification rearrangements, for each lg in lgs(V1) intersection
    lgs(V2) consider their parent paths p1 and p2 in V1 and V2 respectively.
    E.g., the parent path for Yaroame [yaro1235] is (yano1268, nina1239, yano1266).
    For each lg where p1 != p2, group on the tuple (p1, p2) and show

"The following languages were moved"

Lg; Family(V1/V2, lg); From-To(p1, p2); Macro_Area; Reference

Where Family(V1/V2, lg) is the family of the lg, or the string
family1/family2 if the families are not the same; Reference is just a
link to the clf reference (in V2); and From-To(p1, p2) can be computed
as follows.
Align p1 and p2 by Levenshtein distance to get an aligned sequence
a_1 ... a_n where each a_i = (x, y) is a pair of path elements
from p1, p2 or None. (To break ties among alignments with the
same Levenshtein distance, prefer the one with the minimal number of
substitutions.)
From the a_i sequence, form the sequence whose i-th element is

"..." if x == y
"x" if y == None
"y" if x == None
"x->y" otherwise

From-To(p1, p2) is then the comma-separated concatenation of this
latter sequence, but with any run of consecutive "..." entries replaced
by just one "..."
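
The alignment and rendering described above could be sketched as follows. This is one possible reading, not a reference implementation: it runs the usual edit-distance dynamic programme with the cost kept as a pair (distance, substitutions), so that comparing pairs lexicographically applies the tie-break. The function names `align` and `from_to` are hypothetical, and ties that remain after the tie-break are resolved arbitrarily.

```python
def align(p1, p2):
    """Align two parent paths; return a list of (x, y) pairs, None marks a gap."""
    n, m = len(p1), len(p2)
    # cost[i][j] = (edit distance, substitutions) for p1[:i] versus p2[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i and j:  # match or substitution
                diff = int(p1[i - 1] != p2[j - 1])
                d, s = cost[i - 1][j - 1]
                options.append(((d + diff, s + diff), (i - 1, j - 1)))
            if i:  # element of p1 with no counterpart in p2
                d, s = cost[i - 1][j]
                options.append(((d + 1, s), (i - 1, j)))
            if j:  # element of p2 with no counterpart in p1
                d, s = cost[i][j - 1]
                options.append(((d + 1, s), (i, j - 1)))
            cost[i][j], back[i][j] = min(options)
    # Walk the back-pointers from the end to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        x = p1[i - 1] if pi < i else None
        y = p2[j - 1] if pj < j else None
        pairs.append((x, y))
        i, j = pi, pj
    return pairs[::-1]


def from_to(p1, p2):
    """Render the From-To string for two parent paths."""
    parts = []
    for x, y in align(p1, p2):
        if x == y:
            part = "..."
        elif y is None:
            part = x
        elif x is None:
            part = y
        else:
            part = f"{x}->{y}"
        # Collapse runs of consecutive "..." into a single one.
        if part == "..." and parts and parts[-1] == "...":
            continue
        parts.append(part)
    return ", ".join(parts)
```

Keeping the substitution count as the second component of the cost works because adding the same step cost to two candidates never changes their lexicographic order, so the usual optimal-substructure argument still holds.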

E.g., the paths

(yano1268, nina1239, yano1266)
(yano1268, nina1239, yano1266, aaaa1234)

would get From-To: ..., aaaa1234

(yano1268, nina1239, yano1266)
(yano1268, yano1266, aaaa1234)

would get From-To: ..., nina1239, ..., aaaa1234

(yano1268, nina1239, yano1266)
(yano1268, nina1239, aaaa1234)

would get From-To: ..., yano1266->aaaa1234
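
Going back to the grouping in step 2, here is a minimal sketch assuming the parent paths of each version are available as a dict from glottocode to a sequence of ancestor glottocodes. The dict layout and the name `moved_groups` are hypothetical; only the Yaroame path is taken from the example above.

```python
from collections import defaultdict


def moved_groups(paths_v1, paths_v2):
    """Group languages present in both versions by their (p1, p2) path pair.

    Languages whose parent path is unchanged (p1 == p2) are skipped.
    """
    groups = defaultdict(list)
    # Only languages in lgs(V1) intersection lgs(V2) are considered.
    for lg in sorted(paths_v1.keys() & paths_v2.keys()):
        p1, p2 = tuple(paths_v1[lg]), tuple(paths_v2[lg])
        if p1 != p2:
            groups[(p1, p2)].append(lg)
    return dict(groups)
```

Each key of the result then corresponds to one block of rows in the "The following languages were moved" table, with From-To(p1, p2) computed once per key rather than once per language.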
