Avoid Capitalisation of beginning of humanisation #87
Comments
Do you have an example of a message for which this is currently going wrong?
E.g. in French, forced capitalization is wrong for almost every part of a formatted date (including but not limited to month names, weekday names, and season names).
Forced capitalization is also wrong for ordinals (like "premier", "deuxième", ...), for all prepositions (like "de", "à", "vers", ...) and adverbs (like "environ", "approximativement", ...), and for all nouns and adjectives referring to languages/cultures, and terms derived from them (like "chrétien(ne)", "bouddhiste", "musulman(e)", "islamique").
Yeah ok. So what is an example of those rules being violated by the current implementation? Are any of the translations in here wrong? https://github.com/ProfessionalWiki/EDTF/blob/master/tests/Functional/FrenchHumanizationTest.php
As you see, there's no common format for datetime ranges (or open intervals): the prepositions and articles (and their contractions, either preposition+article or an article with an apostrophe before a term starting with a vowel or a mute, unaspirated 'h') depend on the precision (alternatives can also use appended adverbs like "environ"), and all strings above should have a leading lowercase letter.
How to handle contractions of articles with apostrophes depends not just on the precision but also on specific values (like "avril" here). There's a general rule in French for most vowels (a, e, i, o, u, y, possibly with accents), but there are complexities for terms starting with 'y' or 'i' followed by another vowel, and for terms starting with 'h': you need to look the word up in a French dictionary to know whether the 'h' is aspirated or not. Borrowed foreign terms may use a non-mute 'h', but in most cases it is mute in French, and there's no rule to tell which; it's a matter of usage. However, for translating dates, such a dictionary lookup in French would not be very complex, as there are not a lot of terms. These phenomena also apply in Italian, Spanish and many other languages.
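To make the elision rule described above concrete, here is a minimal Python sketch. It is not code from the EDTF library: the function names, the vowel set and the tiny aspirated-'h' list are all assumptions made for illustration, and the 'y'/'i'-plus-vowel cases mentioned above are deliberately left unhandled. It also shows capitalisation as an opt-in step for the caller rather than something forced on every string.

```python
# Illustrative sketch only; names and word lists are hypothetical.
VOWELS = "aeiouyàâäéèêëîïôöùûü"

# Words whose initial 'h' is aspirated (no elision). A real implementation
# would need a fuller dictionary lookup; date vocabulary is small.
ASPIRATED_H = {"haut", "hasard", "huit"}


def starts_with_vowel_sound(word):
    """Rough test: vowel-initial, or 'h'-initial and not listed as aspirated."""
    w = word.lower()
    if not w:
        return False
    if w[0] == "h":
        return w not in ASPIRATED_H
    return w[0] in VOWELS


def with_preposition_de(term):
    """Prefix 'de', elided to d' before a vowel sound. Case is left untouched."""
    if starts_with_vowel_sound(term):
        return "d'" + term
    return "de " + term


def capitalise_first(text):
    """Opt-in capitalisation, for callers that place the text at a sentence start."""
    return text[:1].upper() + text[1:]


print(with_preposition_de("avril"))   # d'avril
print(with_preposition_de("mars"))    # de mars
print(with_preposition_de("hiver"))   # d'hiver
```

Keeping the humanised string lowercase by default and letting the caller apply `capitalise_first` when needed matches the request in this issue.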
We should note that EDTF deviates from CLDR only because "raw" values use additional delimiters that are still not specified in ISO 8601 (".." for ranges/intervals, "%", "~" and "?" for uncertainty, and "," for lists of values; it also had "()" for sub-enumerations on some elements, but you deprecated them by adding "left/right" semantics for uncertainty qualifiers). But CLDR has full support for translation of ranges/intervals and for variable precision of individual dates. EDTF also defined some "magic" values for "pseudo-months" representing seasons, quarters of years, quadrimesters, and half-years. CLDR uses another format for these (e.g. "H1" and "H2" for half-years, "Q1" to "Q4" for quarters; it also adds "W1" to "W53" for weeks in the year, inherited from ISO 8601, which EDTF has still not specified).
Such extensions could be added to CLDR (by filing a request with them), and possibly integrated in its "root" locale, or in a separate special locale (like "POSIX", or "C" in legacy standard i18n libraries for C/C++) if there's a need to deviate (for example, CLDR uses en-dashes rather than ".." for its root locale, and some locales for actual languages may use en-dashes with or without surrounding non-breaking spaces, depending on the number of date elements that they link together in the range). CLDR, however, has standardized translated items without forcing the leading capital.
However, the translations made for EDTF are not directly related at all to the compact EDTF syntax, whose purpose is only to represent "raw" values in a locale-neutral format. This means that EDTF libraries should not depend on translation at all; that is handled separately in CLDR. EDTF is just a particular locale that is parsable into Datetime objects, or extended Datetime objects (for optional certainty/approximation qualifiers), possibly as part of collection objects (lists).
Wikidata itself defines its own "qualifiers" to represent certainty/approximations, and does not need to use lists: instead it represents each given date or interval as a separate item, so for Wikidata the ISO 8601 standard is sufficient and it does not need EDTF at all. Wikidata also supports dates relative to eras, and different calendar systems (not just the modern Gregorian calendar, which is insufficient, including for many historic dates or official modern uses). Datetime elements should also support the ISO 8601 specification for timezone indicators and for week numbers (CLDR contains much data about them and their translation).
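As a rough illustration of the "pseudo-month" values discussed above, the following Python sketch splits a `YYYY-NN` string and maps sub-year codes to locale-neutral labels. The code ranges (21 to 24 for seasons, 33 to 36 for quarters, 37 to 39 for quadrimesters, 40 and 41 for half-years) follow the EDTF specification as I understand it; the label strings and the function name are assumptions for this example, and hemisphere-specific season codes (25 to 32) are omitted. Translation of the labels would remain a separate, locale-dependent step.

```python
# Illustrative sketch only; label strings and function name are hypothetical.
SUB_YEAR_LABELS = {
    21: "spring", 22: "summer", 23: "autumn", 24: "winter",
    33: "Q1", 34: "Q2", 35: "Q3", 36: "Q4",
    37: "quadrimester 1", 38: "quadrimester 2", 39: "quadrimester 3",
    40: "H1", 41: "H2",
}


def describe_sub_year(value):
    """Return (year, label) for 'YYYY-NN'; label is None for ordinary months."""
    year_text, _, code_text = value.partition("-")
    return int(year_text), SUB_YEAR_LABELS.get(int(code_text))


print(describe_sub_year("2001-33"))  # (2001, 'Q1')
print(describe_sub_year("2001-21"))  # (2001, 'spring')
```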
Interesting suggestion by @verdy-p that will increase the usability of this repo:
Originally posted by @verdy-p in #77 (comment)