Add "original_title" to data if applicable
Some words have titles that MediaWiki (the software behind Wikimedia's wikis) cannot handle directly, such as "C#". These words get special articles whose URLs contain "Unsupported titles/" followed by a URL-friendly string instead of the word itself. This makes it hard to search for the original articles or to generate URLs that point back to Wiktionary.

If the "word" field differs from the actual article title, add a new field "original_title" containing that title. In our case, the article title is taken from the first line of the raw debug page that wiktextract extracts, which starts with "TITLE: ".

Debug messages come in two flavors. Titles starting with "Unsupported titles/" must be handled separately by adding them to unsupported_title.py. Any other word that differs from its original title is suspicious and gets a generic debug message.
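The check described above can be sketched roughly as follows. This is a minimal illustration and not the actual wiktextract code. The function names `read_original_title` and `add_original_title`, the plain `debug` list used for messages, and the exact message wording are all hypothetical. The sketch assumes the raw debug page is available as a string whose first line may start with "TITLE: ".

```python
from typing import Any, Optional

# Hypothetical constants mirroring the prefixes described in the commit message.
TITLE_PREFIX = "TITLE: "
UNSUPPORTED_PREFIX = "Unsupported titles/"


def read_original_title(raw_page: str) -> Optional[str]:
    """Return the article title from the first line of a raw debug page, or None."""
    first_line = raw_page.split("\n", 1)[0]
    if first_line.startswith(TITLE_PREFIX):
        return first_line[len(TITLE_PREFIX):].strip()
    return None


def add_original_title(
    data: dict[str, Any], raw_page: str, debug: list[str]
) -> dict[str, Any]:
    """Add "original_title" to data when "word" differs from the article title.

    Hypothetical sketch: messages are appended to a plain list instead of
    going through wiktextract's own debug machinery.
    """
    title = read_original_title(raw_page)
    word = data.get("word")
    if title is None or word == title:
        return data  # nothing to record
    data["original_title"] = title
    if title.startswith(UNSUPPORTED_PREFIX):
        # First flavor: these titles need a mapping in unsupported_title.py.
        debug.append(
            f"Unsupported title {title!r} for word {word!r}; "
            "add it to unsupported_title.py"
        )
    else:
        # Second flavor: any other mismatch is suspicious.
        debug.append(f"Word {word!r} differs from original title {title!r}")
    return data
```

For example, calling `add_original_title({"word": "C#"}, "TITLE: Unsupported titles/Example\n...", msgs)` would set `original_title` to the made-up title `"Unsupported titles/Example"` and append one "unsupported title" message to `msgs`. A page whose title matches its word is returned unchanged.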