Currently, the ZIM "Language" metadata can only be filled automatically with a single language. Zimit checks the language of the welcome page and sets that, even if the other pages use other languages.
It would be better to check all the pages, gather the list of languages, and then set the "Language" metadata accordingly at the end.
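To make the proposal concrete, here is a rough sketch of what "gather the list of languages" could look like. It is not Zimit or warc2zim code: `HtmlLangParser`, `gather_languages` and the `min_share` threshold are hypothetical names, and it assumes each page declares its language in the `lang` attribute of its `<html>` tag. Converting the resulting tags to whatever codes the ZIM metadata expects is left out.

```python
from collections import Counter
from html.parser import HTMLParser


class HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            value = dict(attrs).get("lang")
            if value:
                # Keep only the primary subtag: "fr-FR" -> "fr"
                self.lang = value.strip().lower().split("-")[0]


def gather_languages(pages, min_share=0.05):
    """Return the declared languages, most frequent first.

    `pages` is an iterable of HTML strings. Languages declared by fewer
    than `min_share` of the pages that declare one are dropped, so a
    handful of stray pages does not end up in the metadata.
    """
    counts = Counter()
    for html in pages:
        parser = HtmlLangParser()
        parser.feed(html)
        if parser.lang:
            counts[parser.lang] += 1
    total = sum(counts.values())
    return [lang for lang, n in counts.most_common() if n / total >= min_share]
```

The threshold is only one possible way to keep rarely used languages out of the result; pages without a `lang` attribute are simply ignored.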
I'm not sure about this. I think what you propose will decrease quality while we already have quality issues with zimit.
The goal of this metadata is to inform users about the main languages in use in the ZIM so they can filter it in or out. It's not a technical one like the Counter, which exhaustively lists all content types.
I'm afraid we'll often end up with several languages that are meaningless to the ZIM, while the process is time-consuming (parsing all HTML entries) and only reports the languages of HTML entries, not those of, say, PDF files.
It should be set manually because that gives the best result. Even a person unfamiliar with the website can visit it and find out what the main languages are in under 30 seconds.
Now we have a shortcut that uses the main page's language because that's the most frequent use case.
I propose we make the language param mandatory and add special handling for a homepage value, which would use the homepage's language. We could even set homepage as the default value in youzim.it's form.
Independently of this, warc2zim should allow specifying multiple languages, which it doesn't at the moment.
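A minimal sketch of how such a parameter could be resolved, assuming a comma-separated value for multiple languages. The function name `resolve_languages`, the literal `homepage` and the comma-separated format are illustrative assumptions, not the current Zimit or warc2zim behaviour.

```python
def resolve_languages(lang_param, homepage_language=None):
    """Turn the mandatory language param into a list of language codes.

    `lang_param` is either the special value "homepage" or a
    comma-separated list of codes. `homepage_language` is whatever was
    detected on the homepage (None if nothing was detected).
    """
    if not lang_param or not lang_param.strip():
        raise ValueError("the language param is mandatory")

    if lang_param.strip().lower() == "homepage":
        if not homepage_language:
            raise ValueError("no language could be detected on the homepage")
        return [homepage_language]

    # Explicit value: accept one or several codes, ignoring empty items
    codes = [code.strip() for code in lang_param.split(",") if code.strip()]
    if not codes:
        raise ValueError("the language param contains no language code")
    return codes
```

For example, `resolve_languages("homepage", "eng")` would give `["eng"]`, while `resolve_languages("eng, fra")` would give `["eng", "fra"]`.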
Follow comments on #186