Extract semantic relations from German Wiktionary. #375
This adds support for extracting several kinds of semantic relations from the German Wiktionary. It covers the most prevalent way in which these sections are structured.
It does not handle the following cases:
- In {{Synonyme}}\n:[1] [[Kokosnusspalme]], ''wissenschaftlich:'' [[Cocos nucifera]] from Kokospalme, the tag wissenschaftlich will be ignored.
- In {{Redewendungen}}:[1] ''[[aller Anfang ist schwer]]. from aller, the relation will not be captured, since italics (even with links inside) generally seem to be used to modify a relation, not to create one.

I looked into all of these cases, but there just doesn't seem to be a general rule that allows clearly separating the semantic relations from the rest.
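To illustrate the behavior described above, here is a minimal sketch, not the code in this PR. It assumes a relation line is plain wikitext, drops everything inside a closed pair of '' italic markers, and collects the remaining [[...]] link targets. The names `ITALIC_RE`, `LINK_RE`, and `extract_relations` are hypothetical, and the second example line assumes the italics are properly closed.

```python
import re

# Hypothetical helpers for illustration only; not taken from this PR.
# Matches a closed italic span such as ''wissenschaftlich:''.
ITALIC_RE = re.compile(r"''.*?''")
# Matches [[target]], [[target|label]] and [[target#anchor]]; group 1 is the target.
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def extract_relations(line: str) -> list[str]:
    """Return the link targets on one relation line, skipping italic spans."""
    # Italics are treated as modifiers (tags), so they are removed first.
    without_italics = ITALIC_RE.sub("", line)
    return [m.group(1).strip() for m in LINK_RE.finditer(without_italics)]


# The tag in italics is dropped, both linked terms are kept.
synonyms = extract_relations(
    ":[1] [[Kokosnusspalme]], ''wissenschaftlich:'' [[Cocos nucifera]]"
)

# A relation written entirely in (closed) italics yields nothing.
idioms = extract_relations(":[1] ''[[aller Anfang ist schwer]]''")
```

With this approach the first line yields both synonyms without the tag, while the fully italicized idiom is lost, which matches the two limitations listed above.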
For now, I think it's a great start.
FYI