docs/data-pipeline.md
Lines changed: 15 additions & 1 deletion
@@ -12,7 +12,7 @@ The following terms are used throughout this document.
 - **Format Version:** A version of the file format, abstracted away from the schema version and data version. For example, Protobuf 2 and Protobuf 3 are format versions.
 - **Format:** How the data is stored on disk or provided from a service. Examples of data formats: JSON, YAML, Memory-Mapped, Protobuf. The format is internal to the data provider.
 - **Hunk:** A small piece of locale data relating to a specific task. For example, the name of January might be a hunk, and a list of all month names could also be considered a hunk. A data piece is expected to reflect a specific type.
-- **LangID:** A tuple of language, script, and region. LangID is a request variable. Subtags should be handled by the ICU4X code rather than the data provider.
+- **LangID:** A tuple of language, script, and region. LangID is a request variable. Unicode Locale Extensions should be handled by the ICU4X code rather than the data provider.
 - **Key:** An identifier corresponding to a specific hunk.
 - **Mapping:** A mechanism that a data provider should follow to read from the schema and serve a hunk that may have a different type.
 - **Request Variables:** Metadata that is sent along with a key when requesting data from a data provider.
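
The changed LangID entry says that Unicode Locale Extensions are handled before a request reaches the data provider. The following is a minimal Rust sketch of one way that split could look. The `LangId` struct and the `lang_id_from_tag` function are hypothetical names introduced only for illustration; they are not the ICU4X API, and the parsing is deliberately simplified (variants and other subtags are skipped rather than validated).

```rust
// Hypothetical type for illustration only; not the actual ICU4X API.
#[derive(Debug, PartialEq, Eq)]
pub struct LangId {
    pub language: String,
    pub script: Option<String>,
    pub region: Option<String>,
}

/// Builds a `LangId` from a BCP 47-style tag. Parsing stops at the first
/// one-character subtag (such as `u`), so extension data never becomes
/// part of the request variable sent to the data provider.
pub fn lang_id_from_tag(tag: &str) -> Option<LangId> {
    let mut subtags = tag.split(|c: char| c == '-' || c == '_');
    let language = subtags.next().filter(|s| !s.is_empty())?.to_ascii_lowercase();
    let mut script = None;
    let mut region = None;
    for subtag in subtags {
        let is_alpha = subtag.chars().all(|c| c.is_ascii_alphabetic());
        let is_digit = subtag.chars().all(|c| c.is_ascii_digit());
        match subtag.len() {
            // A singleton subtag introduces an extension: stop here.
            1 => break,
            // Four letters before any region: treat as a script (title case).
            4 if is_alpha && script.is_none() && region.is_none() => {
                let mut s = subtag.to_ascii_lowercase();
                s[..1].make_ascii_uppercase();
                script = Some(s);
            }
            // Two letters or three digits: treat as a region.
            2 if is_alpha && region.is_none() => {
                region = Some(subtag.to_ascii_uppercase());
            }
            3 if is_digit && region.is_none() => {
                region = Some(subtag.to_string());
            }
            // Variants and anything else are ignored in this sketch.
            _ => {}
        }
    }
    Some(LangId { language, script, region })
}
```

Under these assumptions, a tag such as `en-u-nu-latn` yields a `LangId` with only the language set; the numbering-system keyword would be interpreted elsewhere in the library code.
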
@@ -64,10 +64,20 @@ A key is an integer from an enumeration. Each key has a corresponding type, whi
 | CURR_LOCAL_SYM_V1 | 0x3002 | string | The symbol for that currency |
 
+*Note:* Above, `i8` and `i32` signify an 8-bit or 32-bit signed integer. The exact types might differ based on the host language.
+
+*Note:* The keys above are for illustrative purposes only. The actual data hunks will likely be on the larger side, such as "all number symbols for this locale and numbering system".
+
 *Open Question:* How do you map from an enum/integer to a type in a type-safe way in Rust? In C++/Java, this would entail some sort of cast, which I imagine is possible in Rust but might require an unsafe block. Main issue: [#8](https://github.com/unicode-org/omnicu/issues/8)
 
 *Open Question:* Due to ongoing developments in [wrapper-layer.md](wrapper-layer.md), the above list of example keys may be more fine-grained than we will need in the final product. It may be better to have more coarse-grained hunks, like "all decimal format symbols" instead of "grouping separator" and "decimal separator". Main issue: [#26](https://github.com/unicode-org/omnicu/issues/26)
 
+### Data Key Struct Definitions
+
+The actual data keys and the structs to which they correspond should be defined in a central location in the repository: [components/data-provider/src](https://github.com/unicode-org/icu4x/tree/master/components/data-provider/src). Follow conventions of existing data provider struct definitions when adding a new one.
+
+There should generally be a 1-to-1 relationship between components (number formatter, plural rules, date format) and modules in the data provider crate. However, this is not strictly enforced; use your best judgement.
+
 ### Request Variables
 
 Requests made to data providers consist of a key and additional *request variables*. The variables are:
@@ -151,6 +161,8 @@ Along with the hunk, the data provider sends multiple *response variables*. The
 
 The supported LangID is expected to be the most specific LangID that had any data whatsoever, even if it is just an alias. For example, if en_GB is present but empty, and the data is actually loaded from en_001, the Supported LangID should still be en_GB. In other words, the fallback mechanism is considered an internal detail.
 
+If the data provider is unable to return data based on a certain request, it may return an error in a form corresponding to the host language's convention.
+
 ### Data Version
 
 The data version is expected to be a well-defined, namespaced identifier for the origin of the data. For example, when represented as a string, the following might be data versions:
@@ -160,3 +172,5 @@ The data version is expected to be a well-defined, namespaced identifier for the
 - `FOO_1_1` → Version 1.1 of data from a hypothetical data source named Foo
 
 The first data version subtag, or namespace, defines the syntax for the remainder of the identifier. For example, the `CLDR` namespace might accept two or three subtags: major version (`37`), minor version (`alpha1`), and optional patch version (`goog2020a`).
+
+*Note:* The syntax for the data version is undefined at this time. What is shown above is merely a strawman example.
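
Since the syntax is only a strawman, the following Rust sketch is equally tentative. It assumes the underscore-separated form shown in the examples and shows how the first subtag (the namespace) could decide what the remaining subtags must look like. `DataVersion`, `DataVersionError`, and `parse_data_version` are hypothetical names, and the rule that `CLDR` takes two or three subtags is taken from the example text, not from any finalized specification.

```rust
// Hypothetical representation; the real data version syntax is undefined.
#[derive(Debug, PartialEq, Eq)]
pub struct DataVersion {
    pub namespace: String,
    pub subtags: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DataVersionError {
    /// The identifier had no namespace.
    Empty,
    /// The namespace does not accept this number of subtags.
    WrongSubtagCount { namespace: String, found: usize },
}

/// Splits a strawman identifier such as `CLDR_37_alpha1_goog2020a` into a
/// namespace and its remaining subtags, then applies the per-namespace rule.
pub fn parse_data_version(s: &str) -> Result<DataVersion, DataVersionError> {
    let mut parts = s.split('_');
    let namespace = match parts.next() {
        Some(ns) if !ns.is_empty() => ns.to_string(),
        _ => return Err(DataVersionError::Empty),
    };
    let subtags: Vec<String> = parts.map(String::from).collect();

    // The namespace defines the syntax for the remainder of the identifier.
    let allowed = match namespace.as_str() {
        // Assumed from the example: major, minor, and an optional patch.
        "CLDR" => 2..=3,
        // Unknown namespaces are accepted without validation in this sketch.
        _ => 0..=usize::MAX,
    };
    if !allowed.contains(&subtags.len()) {
        return Err(DataVersionError::WrongSubtagCount {
            namespace,
            found: subtags.len(),
        });
    }
    Ok(DataVersion { namespace, subtags })
}
```

With these assumptions, `CLDR_37_alpha1_goog2020a` parses into the namespace `CLDR` and three subtags, while `CLDR_37` is rejected because the assumed `CLDR` rule requires at least a major and a minor version.
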