Add linkage_subtitles.json for German Wiktionary #376

Merged (2 commits) on Oct 21, 2023

empiriker
Contributor

After taking a closer look at the French linkage.py, I realized that I should probably follow the conventions there.

The PR:

  • Adds linkage_subtitles.json for German Wiktionary
  • Renames "semantic_relations" to "linkages"
  • Adds support for the different --capture_X options
  • Sorts the French linkage_subtitles.json alphabetically.
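The alphabetical sorting in the last item can be sketched roughly as follows. This is only an illustration: the mapping below is a hypothetical excerpt, and the real linkage_subtitles.json may be structured differently.

```python
import json

# Hypothetical excerpt (an assumption, not the actual file contents):
# section subtitle -> linkage key used in the extracted data.
raw = '{"Synonyme": "synonyms", "Gegenwörter": "antonyms", "Oberbegriffe": "hypernyms"}'


def sort_json_object(text: str) -> str:
    """Return the JSON object re-serialized with its keys in sorted order."""
    data = json.loads(text)
    # ensure_ascii=False keeps characters such as "ö" readable in the file.
    return json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)


sorted_text = sort_json_object(raw)
print(sorted_text)
```

Note that sort_keys orders keys by code point, so accented or non-ASCII initial letters sort after plain ASCII ones.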

@xxyzz It seems to me that linkage.py already captures most of the semantic relations (linkages). Is there something I missed that still requires implementation?

This work is a contribution to the EWOK project, which receives funding from LABEX ASLAN (ANR–10–LABX–0081) at the Université de Lyon, as part of the "Investissements d'Avenir" program initiated and overseen by the Agence Nationale de la Recherche (ANR) in France.
@xxyzz
Collaborator

xxyzz commented Oct 20, 2023

I don't remember writing that file... I might have missed some cases; I only handled two templates. Maybe you'll find more linkage templates.

The French extractor has more room for improvement. For example, the inflection sections are not processed. I only found a few pages that have a "déclinaison" or "conjugaison" section, and all of them are huge tables. I have already written a complex function to parse small inflection tables, and I'm afraid it will grow out of control if I add more lines to it.

@empiriker
Contributor Author

The French linkage.py seems to be pretty broad and is pretty much what I intended to cover. I noted some problems where there are multiple links in one list item, e.g. * [[laçage]] ou [[lacement]] in the page lacet, but generally it does a good job.
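To illustrate the multiple-links case, here is a minimal sketch that pulls every link target out of one list item. It works on raw wikitext with a regular expression and is not how linkage.py is implemented; the names `WIKILINK_RE` and `extract_link_targets` are made up for this example.

```python
import re

# Matches [[target]], [[target|label]] and [[target#section|label]].
# Only the target part is captured.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_link_targets(list_item: str) -> list[str]:
    """Return the targets of all wikilinks found in one wikitext list item."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(list_item)]


print(extract_link_targets("* [[laçage]] ou [[lacement]]"))
# prints ['laçage', 'lacement']
```

Each target could then become its own linkage entry, instead of the whole list item being treated as a single word.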

The project that allows me to contribute to this repo is mostly interested in increasing the coverage of wiktextract for other editions while doing "well enough" in some fields (mainly glosses, examples, pronunciation, translations and linkages). So I'll have to prioritize these. I might then look to improve other bits out of personal interest. 🤞

@xxyzz xxyzz merged commit b2ab827 into tatuylonen:master Oct 21, 2023
5 checks passed