
Extract French Wiktionary etymology list #346

Merged: 2 commits, Sep 25, 2023


xxyzz (Collaborator) commented Sep 25, 2023

This pull request extracts the French Wiktionary etymology list and splits the list text into the JSON list "etymology_texts" (the English JSON file uses a str type instead). Examples can be found in the test cases.

Pre-expanding them nests the list item nodes inside HTML nodes, which
is unnecessary.
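
A minimal sketch of the list-to-JSON-list idea described above, assuming the etymology section is available as raw wikitext lines. `etymology_texts_from_wikitext` is a hypothetical helper, not wiktextract's actual code; the PR itself works on parsed list item nodes rather than raw text.

```python
# Hypothetical sketch, not wiktextract's implementation: it reads raw
# wikitext lines instead of parsed list item nodes.
from __future__ import annotations


def etymology_texts_from_wikitext(section_text: str) -> list[str]:
    """Return one string per list ("*") or indent (":") line.

    Markers are stripped at any nesting depth, so "**" and "*:" lines
    also become separate entries. Lines without a marker are ignored,
    as are marker-only lines with no text.
    """
    texts = []
    for line in section_text.splitlines():
        if not line.startswith(("*", ":")):
            continue
        text = line.lstrip("*:").strip()
        if text:
            texts.append(text)
    return texts


section = "\n".join([
    ": (Nom commun) Du latin canis.",
    ": (Verbe) Dérivé de chien.",
])

# French-style output: a JSON list with one string per list item,
# where the English JSON would hold a single str.
fr_texts = etymology_texts_from_wikitext(section)
```

Keeping one string per item preserves the boundary between the POS-specific etymologies, which a single concatenated str would lose.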

kristian-clausal (Collaborator) commented Sep 25, 2023

At some point it would be good to go through all the differences that have accumulated between the en and fr/zh versions of the data 'scheme', so that we can consolidate them (probably by moving en closer to fr/zh: en is older and doesn't meet all the needs of the newer extractors, and the newer format is most likely backwards compatible, except where it isn't).

xxyzz (Collaborator, Author) commented Sep 25, 2023

This PR also adds a new "pos_title" field to the French JSON, used for matching etymology text to each POS. fr/zh use "translation" in the "examples" list (it is "english" in the English JSON), and the Chinese JSON splits Simplified/Traditional character example sentences into a list (also a str in the English JSON). Those are the differences I can remember for now.

Unlike English Wiktionary, French Wiktionary writes the etymology data
for all POS types inside the same section, and each POS's data uses
a list ("*") or an indent (":").
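
Because all POS types share one etymology section, each item has to be routed back to the right POS entry, which is what "pos_title" is for. The sketch below assumes, hypothetically, that each item begins with its POS title in parentheses, such as "(Nom commun) …", after markup removal; the real French Wiktionary formatting may differ, and `group_by_pos_title` and `attach_etymology` are illustrative names, not wiktextract functions.

```python
from __future__ import annotations

import re

# Hypothetical label format: each etymology item starts with the POS
# title in parentheses, e.g. "(Nom commun) Du latin canis."
POS_LABEL = re.compile(r"^\((?P<pos>[^)]+)\)\s*(?P<text>.*)$")


def group_by_pos_title(etymology_texts: list[str]) -> dict:
    """Map each POS title to its items; unlabeled items go under None."""
    groups: dict = {}
    for item in etymology_texts:
        match = POS_LABEL.match(item)
        if match:
            groups.setdefault(match.group("pos"), []).append(match.group("text"))
        else:
            groups.setdefault(None, []).append(item)
    return groups


def attach_etymology(entries: list[dict], etymology_texts: list[str]) -> None:
    """Give each entry the items whose label equals its "pos_title".

    Entries without a labeled match fall back to the unlabeled items.
    """
    groups = group_by_pos_title(etymology_texts)
    for entry in entries:
        matched = groups.get(entry.get("pos_title"), groups.get(None, []))
        entry["etymology_texts"] = list(matched)  # copy, so entries don't share a list


entries = [
    {"pos_title": "Nom commun"},
    {"pos_title": "Verbe"},
    {"pos_title": "Adjectif"},
]
attach_etymology(entries, [
    "(Nom commun) Du latin canis.",
    "(Verbe) Dérivé de chien.",
    "Mot ancien.",
])
```

Falling back to unlabeled items is one possible design choice: an etymology written without a POS label plausibly applies to every POS in the section.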
xxyzz merged commit be3fd6f into tatuylonen:master on Sep 25, 2023
3 checks passed
xxyzz deleted the fr branch on September 25, 2023 at 10:57