-
Notifications
You must be signed in to change notification settings - Fork 214
Support data for Bidi properties with combined data structure #3026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
components/properties/src/props.rs
Outdated
#[allow(clippy::exhaustive_structs)] // newtype | ||
#[repr(transparent)] | ||
#[zerovec::make_ule(BidiMirroringGlyphULE)] | ||
pub struct BidiMirroringGlyph(pub u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: We should make this a 3-byte type so that the data provider also stores it as 3 bytes.
[u8; 3]
is one way to accomplish this. Alternatively, a cleaner approach would be to add UnvalidatedChar
in the zerovec crate with the correct semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to use the high 3 bits to fit more related properties in this data structure, so just "unvalidated char" wouldn't the the whole semantic. We might reuse the U24
struct that the normalizer had before.
Quoting self from Slack:
I do think it makes sense to bundle at least Bidi_Mirroring_Glyph, Bidi_Mirrored, Bidi_Paired_Bracket , and Bidi_Paired_Bracket_Type as a 24-bit value in one trie. I'm less sure of whether it makes sense to extend the trie value to 32 bits and bake Bidi_Class in the additional 8 bits.
/// | ||
|
||
|
||
/// 20..0 Code point return value for Bidi_Mirroring_Glyph value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these docs should go on the ULE type
Also, this module should probably be a submodule of provider
and marked unstable the way we usually do provider code, but not doc(hidden)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also these ranges seem backwards?
// | ||
// TODO: is the above statement correct and okay? | ||
// TODO: or do we have a way to validate the enum discriminant against the enum for | ||
// newtype idiom enums? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: we must validate that all unused bytes are 0 (or some other fixed value)
Co-authored-by: Manish Goregaokar <[email protected]>
provider/testdata/src/lib.rs
Outdated
@@ -92,7 +92,7 @@ pub mod versions { | |||
/// # Examples | |||
/// | |||
/// ``` | |||
/// assert_eq!("icu4x/2023-02-24/72.x", icu_testdata::versions::icu_tag()); | |||
/// assert_eq!("icu4x/2023-03-22a/72.x", icu_testdata::versions::icu_tag()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd prefer for the tag change to be its own standalone PR (I've given similar feedback to others in the past)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered: in order for ci-job-testdata
to pass, the tag needs to be updated because it refers to the data from the latest tag when running cargo make testdata
. So if we want the latest version tag to be changed in a separate PR, then merging this PR will require overriding the fact that the relevant CI check will be failing (and that the follow on CI check full-datagen
won't run as a result, too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can include the tag update in this PR but also pull it out, merge the other PR, and then update this PR to the latest main, which should be a small change. But I'll leave it up to you if you want to do this. It's a preference of mine but not super strong since this is only an additive change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…ired_Bracket_Type in MirroredPairedBracketData
CheckedBidiPairedBracketType::Open => BidiPairingProperties::Open(mirroring_glyph), | ||
CheckedBidiPairedBracketType::Close => BidiPairingProperties::Close(mirroring_glyph), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the place where you consume the CheckedBidiPairedBracketType
, right? I don't understand the advantage of using the closed enum instead of using the open enum throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use the open enum throughout: it has states that cannot be represented in the ULE type because we don't have space.
If we want to use the open enum, MirroredPairedBracketData
needs to have private fields, with a custom validating serde impl (and a custom ctor for databake).
|
||
use icu_provider::prelude::*; | ||
|
||
/// A wrapper around certain Bidi properties data. Can be obtained via [`load_bidi_auxiliary_properties_unstable()`] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (nb): Now that we have all these nice wrapper types I kinda feel like the load methods should have gone on those types instead of being free functions.
Might be a thing to do for 2.0? What do y'all think?
components/properties/src/props.rs
Outdated
#[cfg_attr(feature = "datagen", databake(path = icu_properties))] | ||
#[allow(clippy::exhaustive_structs)] // newtype | ||
#[repr(transparent)] | ||
#[zerovec::make_ule(BidiPairedBracketTypeULE)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's necessary to be able to make CPTs for datagen. That said, we could cfg_attr(feature = "datagen)
this, or just use CPT<u8>
in datagen.
components/properties/src/props.rs
Outdated
pub struct BidiPairedBracketType(pub u8); | ||
|
||
#[allow(non_upper_case_globals)] | ||
impl BidiPairedBracketType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: hmm, so I was under the impression we already had this type when I suggested making a closed and open.
If this is a new type, perhaps this can just be a closed enum for now (and we have a single type). @sffc thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is being discussed below
components/properties/src/error.rs
Outdated
@@ -31,6 +31,9 @@ pub enum PropertiesError { | |||
/// An unknown or unexpected property name was used for an API dealing with properties specified as strings at runtime | |||
#[displaydoc("Unexpected or unknown property name")] | |||
UnexpectedPropertyName, | |||
/// A `u32` serialized value of `MirroredPairedBracketData` did not encode either a valid Bidi_Mirroring_Glyph or a valid Bidi_Paired_Bracket_Type | |||
#[displaydoc("Invalid MirroredPairedBracketData serialized in int: {0}")] | |||
MirroredPairedBracketDataFromU32(u32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I think slightly more idiomatic in Rust would be for this to be its own little error struct named MirroredPairedBracketDataTryFromError
Fixes #2833