We have three examples that have come up that involve us needing to store the same functional data in different forms tailored to individual components and client needs:
Likely subtags: full set or only those needed for fallback
In all three of these cases, there is a component A that needs small data and a component B that needs bigger data; we want component A to use its small data if it is by itself, but if component B is present in the bundle, component A should use component B's data.
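To make the desired behavior concrete, here is a minimal sketch of the lookup order described above. Everything in it is hypothetical: the `Bundle` type, the key names, and `load_for_component_a` are stand-ins for illustration, not ICU4X APIs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a data bundle: maps a key name to a payload.
struct Bundle {
    keys: HashMap<&'static str, Vec<u32>>,
}

impl Bundle {
    fn new() -> Self {
        Bundle { keys: HashMap::new() }
    }

    /// Adds a key to the bundle (builder style).
    fn with_key(mut self, key: &'static str, payload: Vec<u32>) -> Self {
        self.keys.insert(key, payload);
        self
    }
}

/// Hypothetical key names for component A's small data and component B's bigger data.
const SMALL_KEY: &str = "component_a/small";
const BIG_KEY: &str = "component_b/big";

/// Component A prefers B's bigger data when it is in the bundle,
/// and falls back to its own small data otherwise.
fn load_for_component_a(bundle: &Bundle) -> Option<(&'static str, &[u32])> {
    for key in [BIG_KEY, SMALL_KEY] {
        if let Some(payload) = bundle.keys.get(key) {
            return Some((key, payload.as_slice()));
        }
    }
    None
}
```

With only `SMALL_KEY` in the bundle, component A gets its small data; once `BIG_KEY` is also present, the same call returns B's data, so the small key would not need to ship at all.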
The best solution is to engineer the data structs to be fully orthogonal: bigger components load the data needed for the smaller components, plus some other "supplement" key. This is what @hsivonen has done in Collator/Normalizer. However, this is not always feasible if (1) the data cannot be easily split into smaller keys or (2) doing this split significantly reduces runtime performance.
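As a rough sketch of what "fully orthogonal" means here, the bigger component can borrow the same base payload as the smaller one and add a supplement on top, so nothing is stored twice. The struct and method names below are hypothetical and do not reflect the actual Collator/Normalizer data structs.

```rust
/// Hypothetical payload shared by both components (the "base" key).
struct BaseData {
    entries: Vec<u16>,
}

/// Hypothetical payload that only the bigger component needs (the "supplement" key).
struct SupplementData {
    entries: Vec<u16>,
}

/// The smaller component loads only the base key.
struct ComponentA<'a> {
    base: &'a BaseData,
}

/// The bigger component loads the same base key plus the supplement.
struct ComponentB<'a> {
    base: &'a BaseData,
    supplement: &'a SupplementData,
}

impl<'a> ComponentA<'a> {
    fn new(base: &'a BaseData) -> Self {
        ComponentA { base }
    }

    fn lookup(&self, index: usize) -> Option<u16> {
        self.base.entries.get(index).copied()
    }
}

impl<'a> ComponentB<'a> {
    fn new(base: &'a BaseData, supplement: &'a SupplementData) -> Self {
        ComponentB { base, supplement }
    }

    /// Looks in the base data first, then continues into the supplement.
    fn lookup(&self, index: usize) -> Option<u16> {
        let base_len = self.base.entries.len();
        if index < base_len {
            self.base.entries.get(index).copied()
        } else {
            self.supplement.entries.get(index - base_len).copied()
        }
    }
}
```

The extra branch in `ComponentB::lookup` is a small illustration of concern (2): splitting the data can add work on the hot path.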
For segmentation, I've proposed in #2905 that we do some magic inside datagen. However, this is not a foolproof solution since it requires datagen flags to be kept in sync with the ground truth in code.
For properties, there's discussion in #2833 about how to store the bidi-related properties for two distinct users, unicode_bidi and Harfbuzz.
FWIW I think the properties needed by unicode_bidi and harfbuzz are actually disjoint. unicode_bidi needs Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type and Bidi_Class, whereas harfbuzz needs Bidi_Mirrored and Bidi_Mirroring_Glyph. If harfbuzz wishes to run the bidi algorithm it also needs the others, of course.
There's a potential optimization that can be done to merge them (#2833 (comment)) but I'm not convinced it's a good idea.
Overall I think this is where datagen config comes in, where we can tell datagen what subset we want to support.
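A minimal sketch of that idea, assuming a hypothetical config that lists which clients a bundle must support. The `Client` enum and both functions are made up for illustration and are not real datagen options; the property lists are the ones named above.

```rust
use std::collections::BTreeSet;

/// Hypothetical list of clients a data bundle may need to support.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Client {
    UnicodeBidi,
    Harfbuzz,
}

/// Properties each client needs, as listed in the comment above.
fn properties_for(client: Client) -> &'static [&'static str] {
    match client {
        Client::UnicodeBidi => &[
            "Bidi_Class",
            "Bidi_Paired_Bracket",
            "Bidi_Paired_Bracket_Type",
        ],
        Client::Harfbuzz => &["Bidi_Mirrored", "Bidi_Mirroring_Glyph"],
    }
}

/// Union of the properties to generate for the configured clients.
fn properties_to_generate(clients: &[Client]) -> BTreeSet<&'static str> {
    clients
        .iter()
        .flat_map(|&client| properties_for(client).iter().copied())
        .collect()
}
```

Because the two property sets are disjoint, configuring both clients yields five properties with no duplicated data, and configuring only one yields just that client's subset.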
Consensus: by default, do not accept overlapping data. Try to avoid it when possible. There may be exceptions, which can be approved by the SC on a case-by-case basis.
For likely subtags, see #2903.
Let's discuss this general problem space and establish some recommendations.
@Manishearth @robertbastian @markusicu