move props bmg bpb EqUIdeo from misc to string #383
<code point> makes sense for any property that is fundamentally used for a transformation that leaves unmentioned code points of the string alone (like lowercase). This property isn't used for that kind of transformation, so I think <none> is better.
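To make the two defaults concrete, here is a minimal Python sketch of how a `@missing` default might be resolved for a code point that has no explicit entry. The `resolve` helper, the constants, and the tiny tables are hypothetical illustrations, not the UCD tools' code. With `<code point>`, an unlisted code point maps to itself (as with lowercasing, where unmentioned characters pass through unchanged). With `<none>`, it has no value at all.

```python
from typing import Dict, Optional

# Hypothetical constants mirroring the @missing values written in UCD data files.
CODE_POINT = "<code point>"
NO_VALUE = "<none>"


def resolve(cp: int, explicit: Dict[int, str], missing: str) -> Optional[str]:
    """Return the value for cp, falling back to the @missing default.

    explicit holds only the code points listed in the data file.
    """
    if cp in explicit:
        return explicit[cp]
    if missing == CODE_POINT:
        # Identity default: an unlisted code point maps to itself.
        return chr(cp)
    if missing == NO_VALUE:
        # No value is defined: report absence, not a string.
        return None
    raise ValueError(f"unsupported @missing value: {missing!r}")


# Tiny illustrative subsets, not real UCD data.
lower = {0x0041: "a"}   # U+0041 LATIN CAPITAL LETTER A -> "a"
paired = {0x0028: ")"}  # U+0028 LEFT PARENTHESIS -> U+0029

print(resolve(0x0031, lower, CODE_POINT))  # prints 1 (unlisted, unchanged)
print(resolve(0x0031, paired, NO_VALUE))   # prints None (not a bracket)
```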
Agree with Mark.
Ok. I will see how I can make that happen.
Every character property is defined for every code point, with an explicit value or with a "missing" value.
Technically, you are correct, but that's not what I'm after. A value of <none> means that there's no string defined for that code point. So, to ask the question again: do we promise somewhere that string-valued properties have a string value for all code points? If we do, then we should either change that promise or make this a "Miscellaneous" property again, where we can spell out precisely that, for this property, some code points are mapped to a string while others are mapped to nothing (<none>). I would further argue that defining a missing value of <none> or "n/a" differs only mathematically, not practically, from a property that is simply undefined for some ranges of code points. Alternatively, we could clarify more strongly how mappings differ from other string-valued properties, including the guarantee that mappings have a string for each code point (introducing the empty string if needed).
Force-pushed from 16aabc5 to 1d40037.
I had similar questions recently in ICU4X (unicode-org/icu4x#2833) about the property value type for To the question of whether
IIUC, the
All, could I please get someone to approve this PR?
I think we should not proceed hastily without solving the deeper issue. We are not doing API design here, so it matters whether we conceive of <none> as a way to say "undefined" or "empty string". From memory, what I think I know about these two properties is that the values are truly "undefined", "not there", or "not defined": there is no mapping. If an API gave you the empty string, you would not use it literally, only to detect that a mapping isn't possible. In other words, for all other values you would map the code point to the string value as part of processing it, but where the value is undefined you would not map to the empty string; you would do something else. That is the tell that <none> in this context is not the empty string. So we still need to find out whether string-valued properties make an implied or explicit promise that <none> is the same as the empty string, which in this case would be wrong. I'm suggesting that in TR23 and chapter 3 we may need a definition for properties that are undefined for some inputs.
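The difference between "undefined" and "empty string" shows up as soon as a caller uses the values. In this hypothetical Python sketch, `apply_mapping` is a generic routine that treats a String property as a character-by-character mapping (this is not how the bidi algorithm itself uses Bidi_Paired_Bracket). When the lookup reports a missing mapping as `None`, the character is kept. When it reports `""`, using the value literally silently deletes every non-bracket character. The `PAIRED` table holds only a few illustrative pairs, not the full data.

```python
from typing import Callable, Optional

# Illustrative subset of Bidi_Paired_Bracket values, not the full UCD data.
PAIRED = {"(": ")", ")": "(", "[": "]", "]": "["}


def bpb_none(ch: str) -> Optional[str]:
    """'No mapping' reported as None (the <none> reading)."""
    return PAIRED.get(ch)


def bpb_empty(ch: str) -> Optional[str]:
    """'No mapping' reported as "" (the empty-string reading)."""
    return PAIRED.get(ch, "")


def apply_mapping(text: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Hypothetical generic routine that treats a String property as a mapping.

    None means "undefined", so the character is kept; any string value,
    including "", is used literally as the replacement.
    """
    out = []
    for ch in text:
        value = lookup(ch)
        out.append(ch if value is None else value)
    return "".join(out)


print(apply_mapping("f(x)", bpb_none))   # prints f)x(
print(apply_mapping("f(x)", bpb_empty))  # prints )(
```

Under the empty-string reading, every caller would have to special-case `""` as a sentinel instead of using it as a value, which is the behavior the comment describes.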
For the purposes of annotating the property as a string in the data files and supporting tools, this PR LGTM.
It SGTM to have follow-up work in TR23 (and touch-ups to UAX44?) accordingly, which I assume has to happen outside of this PR.
Overthinking here. No one is confused about Bidi_Paired_Bracket mapping to The point of this AI & PR is that these three properties map characters to characters, and therefore we can call them String properties and don't have to use the "Miscellaneous" escape hatch.
Force-pushed from 1d40037 to f75e07b.
I just made one more fix in BidiBrackets.txt, which unnecessarily stated what types its properties are. Rather than keeping those statements in sync, I removed them, leaving it up to PropertyAliases.txt and UAX 44 to define the types of properties, as usual.
For
except:
When writing the action item (L2/22-124 item UCD17), we seem to have mixed up Bidi_Paired_Bracket (Miscellaneous, should be String) with Bidi_Paired_Bracket_Type (is Enumerated). I changed Bidi_Paired_Bracket to String as well.
See
# Field 1: Bidi_Paired_Bracket property value, a code point value or <none>
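As a sketch of what that field-1 comment implies for consumers, the following Python parses one data line in the semicolon-separated BidiBrackets.txt layout (code point; Bidi_Paired_Bracket; Bidi_Paired_Bracket_Type) and turns `<none>` into `None` rather than an empty string. The `parse_line` function and the `BracketEntry` type are hypothetical, not part of the UCD tools. Because `@missing` lines start with `#`, this sketch simply skips them as comments.

```python
from typing import NamedTuple, Optional


class BracketEntry(NamedTuple):
    code_point: int
    paired: Optional[int]  # None when field 1 is "<none>"
    bracket_type: str      # Bidi_Paired_Bracket_Type short value, e.g. "o" or "c"


def parse_line(line: str) -> Optional[BracketEntry]:
    """Parse one data line of a BidiBrackets.txt-style file.

    Returns None for blank lines and comment-only lines (including @missing).
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    cp = int(fields[0], 16)
    # Keep "no mapping" distinct from any string value.
    paired = None if fields[1] == "<none>" else int(fields[1], 16)
    return BracketEntry(cp, paired, fields[2])


entry = parse_line("0028; 0029; o # LEFT PARENTHESIS")
print(entry)  # prints BracketEntry(code_point=40, paired=41, bracket_type='o')
```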
Old parts of the tools code hardcode the property type and @missing value for some properties. I changed that code to achieve the desired output, including keeping the Bidi_Paired_Bracket @missing value as <none> (as documented in the data file), rather than having it switch to <code point>. (See PR discussion.)
See https://www.unicode.org/reports/tr44/