Well-formed vs valid #935
A few notes:
Note that we have text about option resolution in the spec which does indeed drop bad options on the floor. But for options whose interpretation is inside the function handler, the dropping-on-the-floor part is up to the function itself. This is why there is a Are there specific changes you want in the spec? I'd advise a careful look at u-namespace.md and registry.md as well as option resolution in formatting.md. |
I was struck by the fact that we are requiring valid for some identifiers (e.g., time zones), but only well-formed for currencies. Those feel like very similar cases, so if well-formed is right for currencies, that term should also be right for time zones (or the inverse).
But a straightforward reading of registry.md means that we don't allow that in many cases: whenever we say well-formed (like currencies) or valid, like:
But that means I can't use implementation-defined identifiers like "$California Time"
No, you're correct about this. We should require well-formedness for acceptance but permit checking for validity. And we should fix values to permit implementation-specific gorp (mainly for platform-specific values that aren't the sanctioned identifiers).
Makes perfect sense.
Mark (@macchiati) wrote:
I think the SHOULD in this paragraph should be a MAY, for obvious reasons.
This was discussed in the 2024-11-18 call. We resolved to use valid in most cases, but with careful phrasing in the boilerplate. I believe this is now addressed?
I elaborated a bit. I would like to discuss further, after 46.1.
I see your elaboration. One callout: "implementation" has to be used carefully. In most cases in our spec it refers to the MessageFormat framework/executable/host environment itself, e.g. in ICU4J the actual At the function set level, there is a different layer of "implementation", specifically what we call the function handler. This is what a lot of the normative language in the current registry.md is about. In general, the function handler is some code that maps option values to local API-specific representations. So for "digit size options", it parses the option value. If it's a positive integer, great. Otherwise it's not valid. We definitely want to impose standards on options and their values, to ensure interoperability. But the MF2-level implementation has no role in this (once the message is syntactically correct). Instead, the specific function handler, such as for |
I agree that there are important distinctions to be made, and in any final text we should make it clear. What I'm specifically talking about are the implementations of the standard functions defined in the registry.md. Whatever we do, it should be clear what kinds of results we can expect to have, and what kinds of errors we can expect to see raised (which might be different for ill-formed vs well-formed+invalid vs well-formed+valid+unsupported vs well-formed+valid+supported). Some of that could apply to implementation-defined functions, but I didn't want to talk about that in this issue.
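To make the four outcomes named above concrete, here is a minimal Python sketch using currency codes. Everything in it is an assumption for illustration: the names `Status`, `classify_currency`, `KNOWN_CODES`, and `SUPPORTED_CODES` are hypothetical, "well-formed" is taken to mean three ASCII letters, and the tiny code sets stand in for real currency data.

```python
import re
from enum import Enum


class Status(Enum):
    ILL_FORMED = "ill-formed"
    INVALID = "well-formed+invalid"
    UNSUPPORTED = "well-formed+valid+unsupported"
    SUPPORTED = "well-formed+valid+supported"


# Assumption: a well-formed currency code is exactly three ASCII letters.
WELL_FORMED = re.compile(r"[A-Za-z]{3}")

# Assumption: toy data standing in for a real currency registry.
KNOWN_CODES = {"CAD", "GBP", "USD", "JPY"}   # codes the source defines
SUPPORTED_CODES = {"CAD", "USD"}             # codes this handler has data for


def classify_currency(value: str) -> Status:
    """Place an option value into one of the four categories."""
    if not WELL_FORMED.fullmatch(value):
        return Status.ILL_FORMED
    code = value.upper()
    if code not in KNOWN_CODES:
        return Status.INVALID
    if code not in SUPPORTED_CODES:
        return Status.UNSUPPORTED
    return Status.SUPPORTED
```

A linter could report the first two categories without any locale data beyond the code list, while the last two depend on what a particular handler supports.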
I would still like to discuss this further. For the standard functions, we do not do interoperability any favors by not differentiating between the following. As the user, I would want a linter or precompiler to know that:
For reference, the listed values are:
So I think we should make the validity/well-formedness distinction for all the standard functions, and recommend it for non-standard functions.
@macchiati asked for Agenda+ on this item via email.
Added text 2024-11-24
I think we need to be careful about our usage of the terms 'well-formed' and 'valid'. The following is not fully fleshed out; it is more of a discussion of the issue and some ideas for the future.
We often reference other sources for identifiers, and want them to be interpreted according to that source. Sources that change over time should (and typically do) distinguish between well-formed and valid. For example, 'ge:manic' is not a well-formed locale identifier, and 'de-Flub' is not a valid locale identifier. However, 'de-Flub' could (conceivably) become valid in the future, if a script is given the code 'Flub'. Good sources also never remove identifiers, or make material changes in the meaning, but may deprecate them: those are still treated as valid.
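The distinction in the paragraph above can be sketched in Python as follows. This is a simplification under stated assumptions: the pattern `LOCALE_SHAPE` covers only language, optional script, and optional region (the real locale identifier grammar is much larger), and `KNOWN_LANGUAGES`, `KNOWN_SCRIPTS`, and `KNOWN_REGIONS` are hypothetical toy sets standing in for a versioned source.

```python
import re

# Assumption: simplified shape only (language, optional script, optional region).
LOCALE_SHAPE = re.compile(
    r"(?P<lang>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)

# Assumption: toy subtag sets; a real source grows over time.
KNOWN_LANGUAGES = {"de", "en", "fr"}
KNOWN_SCRIPTS = {"Latn", "Cyrl"}
KNOWN_REGIONS = {"DE", "US", "CH"}


def is_well_formed(tag: str) -> bool:
    """Syntax only: does the tag have the right shape?"""
    return LOCALE_SHAPE.fullmatch(tag) is not None


def is_valid(tag: str) -> bool:
    """Well-formed, and every subtag is known to the current data."""
    match = LOCALE_SHAPE.fullmatch(tag)
    if match is None:
        return False
    lang, script, region = match.group("lang", "script", "region")
    if lang.lower() not in KNOWN_LANGUAGES:
        return False
    if script is not None and script.title() not in KNOWN_SCRIPTS:
        return False
    if region is not None and region.upper() not in KNOWN_REGIONS:
        return False
    return True
```

With this data, 'ge:manic' fails `is_well_formed`, while 'de-Flub' passes it but fails `is_valid`. Adding "Flub" to `KNOWN_SCRIPTS` makes 'de-Flub' valid without touching the syntax check, which is the sense in which validity can change over time while well-formedness does not.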
When we reference such sources in message format, such as with option values, we have a few goals.
This is also true for our own enums. We have in registry.md:
We also have BNF:
The implication is that a conformant implementation can interpret any of:
{$x :currency compactDisplay=short}
{$x :currency compactDisplay=medium}
{$x :currency compactDisplay=μικρός}
{$x :currency compactDisplay=|🐭|}
{$x :currency compactDisplay=$myDisplay}
It can also interpret:
{$x :currency currency=CAD}
{$x :currency currency=MyCurrency}
{$x :currency currency=δολάριοΚαναδά}
{$x :currency currency=|¥|}
{$x :currency currency=|🐭|}
{$x :currency currency=$myCurrency}
It could also interpret compactDisplay=short by formatting a long form, and compactDisplay=long by formatting a short form. Or a value of CAD as being GBP, etc.
This level of freedom seems counterproductive for interoperability.
So I propose a general rule, something like the following, for option values that are defined by reference to an external source:
locale=|ge:manic|
locale=|dab|
locale=|def|; it may also ignore all deprecated language identifiers, and thus ignore locale=|daf|.
Ignore means that the expression is interpreted as if the option were not there. (I won't talk here about what signals to the caller are associated with that.)
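One way to picture "ignore" in this sense is a resolution step that drops rejected options before the function handler sees them. The Python sketch below is only an illustration: `resolve_options` and the per-option check table are hypothetical names, and it returns the list of ignored option names without taking a position on what signal the caller receives.

```python
from typing import Callable, Dict, List, Tuple


def resolve_options(
    options: Dict[str, str],
    is_acceptable: Dict[str, Callable[[str], bool]],
) -> Tuple[Dict[str, str], List[str]]:
    """Return the options that survive, plus the names that were ignored.

    An ignored option is simply absent from the result, so the handler
    behaves as if the message never specified it.
    """
    kept: Dict[str, str] = {}
    ignored: List[str] = []
    for name, value in options.items():
        check = is_acceptable.get(name)
        # Options without a registered check are passed through unchanged.
        if check is None or check(value):
            kept[name] = value
        else:
            ignored.append(name)
    return kept, ignored
```

For example, with a hypothetical check that rejects any locale value containing ':', the options `{"locale": "ge:manic", "style": "short"}` resolve to `{"style": "short"}`, with `locale` reported as ignored.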
I think we could apply that to our standard enum option values, such as the following in https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md#options-1, so that |@!$| could be recognized as ill-formed.
That is, perhaps we can have a rule in the registry for our functions, something like: the default well-formedness criteria for standard function option values match the constraints on function option identifiers in README.md. Thus |$abc| would be ill-formed for useGrouping. Any function option that had different criteria for well-formedness of its values would simply have an explicit well-formedness statement.
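A rough Python sketch of that rule, under loud assumptions: `is_name_like` is only an approximation of the identifier constraints (a letter or underscore, then letters, digits, '-', '.', or '_'), not the actual grammar, and the override shown for `minimumFractionDigits` is a hypothetical example of an explicit well-formedness statement.

```python
from typing import Callable, Dict


def is_name_like(value: str) -> bool:
    """Approximate default: shaped like an identifier (assumption, not the real grammar)."""
    if not value:
        return False
    first, rest = value[0], value[1:]
    if not (first.isalpha() or first == "_"):
        return False
    return all(ch.isalnum() or ch in "-._" for ch in rest)


# Hypothetical per-option overrides for options whose values are not name-like.
WELL_FORMED_OVERRIDES: Dict[str, Callable[[str], bool]] = {
    "minimumFractionDigits": lambda v: v.isascii() and v.isdigit(),
}


def is_well_formed_value(option: str, value: str) -> bool:
    """Use the option's explicit criteria if it has any, else the default."""
    check = WELL_FORMED_OVERRIDES.get(option, is_name_like)
    return check(value)
```

Under this sketch `useGrouping=always` is well-formed, while `|$abc|` and `|@!$|` are not. A numeric option needs the override because a bare digit string is not name-like under the default.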