-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfixes: locales and OMP series numbers #116
base: main
Are you sure you want to change the base?
Conversation
@defstat, could you review this? Thanks! |
I fixed an other bug. If you tried to cite a chapter and there is no chapter author, then you will get an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any objections on that changes
Should I do anything here or was this only fallen into oblivion? |
|
||
protected function mapLocale(string $locale): string | ||
{ | ||
$locales = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me nervous that we're inferring the region from the language code. For example, fa
could be fa-IR
or fa-AF
(or possibly others). Hard-coding this mapping also means it'll need to be updated. Is there another way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Only languge codes you can find here (https://github.com/citation-style-language/locales ) are supported by CSL. Without a mapping we get always default en_US. So we need a mapping. We could realize the mapping also in the plugin settings. Then users are free to choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It just so happens that we're having an internal discussion about this at the moment -- more soon.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what I'd suggest, e.g. to get the name of the CSL locale from an IETF language tag:
/**
* Find the best match for a CSL locale, given the desired IETF language tag (and a fallback default).
* @param $locale An IETF language tag.
* @param $default A locale code to use for a default. This should already be sanitized.
* @return string A language code that's available in the CSL library.
*/
function getCSLLocale(string $locale, string $default = 'en-US') : string
{
$prefix = 'plugins/generic/citationStyleLanguage/lib/vendor/citation-style-language/locales/locales-';
$suffix = '.xml';
$preferences = [
'es' => 'es-ES',
];
// Determine the language and region we're looking for from $locale
$localeParts = locale_parse($locale);
$language = $localeParts['language'];
$region = $localeParts['region'] ?? null;
$localeAndRegion = $language . ($region ? "-{$region}" : '');
// Get a list of available options from the filesystem.
$availableLocaleFiles = glob("{$prefix}*{$suffix}");
// 1. Look for an exact match and return it.
if (in_array("{$prefix}{$locale}{$suffix}", $availableLocaleFiles)) return $locale;
// 2. Look in the preference list for a preferred fallback.
if ($preference = $preferences[$localeAndRegion] ?? false) return $preference;
// 3. Find the first match by language.
foreach ($availableLocaleFiles as $filename) {
if (strpos($filename, "{$prefix}{$language}-") === 0) {
return substr($filename, strlen($prefix), -strlen($suffix));
}
}
// 4. Use the supplied default.
return "{$prefix}{$default}{$suffix}";
}
- It's safe to use user-supplied data for the
$locale
parameter - It uses a single
glob()
filesystem scan to find available locales, but this is acceptably fast, I think - You can provide
$preferences
inside the function (so that e.g.es-ES
is preferred fores
, rather thanes-CA
), but it's not necessary to maintain a list of all locales; providingzh
will result successfully matchzh-CN
, for example. - It will handle more exotic locale variants e.g.
uz@latin
I think this will be easier/better to maintain than an exhaustive list of locales, which will get forgotten. And it's about the same length :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that looks very well. What do you think, should we make the preferences configurable in the settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm pretty sure that we can choose a set of preferences that will work well for 99% of users, and save ourselves the pain of making it configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can start with only considering those that have multiple variants in CSL. The other ones would be found by that logic Alec suggested above. So, for now, those would be:
de -> de-DE
en -> en-US
es -> es-ES
fr -> fr-FR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a note that instead using locale_parse
we might need to use:
$language = \Locale::getPrimaryLanguage($locale);
$region = \Locale::getRegion($locale) ?? null;
And that $prefix = $this->getPluginPath() . '/lib/vendor/citation-style-language/locales/locales-';
Bugs
CSL needs to choose the right language xml file a locale in format "xy-ZW". So it only uses a fallback "en-US" since we change the locales. In this solution I first look, if "xy-XY" exists and else I'm looking in an array with entries build "xy" => "xy-ZW". If tried to map all possible languages form https://github.com/citation-style-language/locales. Only for zh-CN and zh-TW I had no idea, how to decide which one is to use.
Series number in OMP was mapped to "volume", but should be mapped to "collection-number". See here: https://docs.citationstyles.org/en/stable/specification.html#appendix-iv-variables