Commit 9de1568

Merge pull request #145 from sffc/issue10
Fixes for the data provider doc
2 parents adeae53 + 41d3485 commit 9de1568

1 file changed: 15 additions, 1 deletion

docs/data-pipeline.md

Lines changed: 15 additions & 1 deletion
@@ -12,7 +12,7 @@ The following terms are used throughout this document.
 - **Format Version:** A version of the file format, abstracted away from the schema version and data version. For example, Protobuf 2 and Protobuf 3 are format versions.
 - **Format:** How the data is stored on disk or provided from a service. Examples of data formats: JSON, YAML, Memory-Mapped, Protobuf. The format is internal to the data provider.
 - **Hunk:** A small piece of locale data relating to a specific task. For example, the name of January might be a hunk, and a list of all month names could also be considered a hunk. A data piece is expected to reflect a specific type.
-- **LangID:** A tuple of language, script, and region. LangID is a request variable. Subtags should be handled by the ICU4X code rather than the data provider.
+- **LangID:** A tuple of language, script, and region. LangID is a request variable. Unicode Locale Extensions should be handled by the ICU4X code rather than the data provider.
 - **Key:** An identifier corresponding to a specific hunk.
 - **Mapping:** A mechanism that a data provider should follow to read from the schema and serve a hunk that may have a different type.
 - **Request Variables:** Metadata that is sent along with a key when requesting data from a data provider.
@@ -64,10 +64,20 @@ A key is an integer from an enumeration. Each key has a corresponding type, whi
 | CURR_LOCAL_CODE_V1 | 0x3001 | string | The locale's currency code |
 | CURR_LOCAL_SYM_V1 | 0x3002 | string | The symbol for that currency |
 
+*Note:* Above, `i8` and `i32` signify an 8-bit or 32-bit signed integer. The exact types might differ based on the host language.
+
+*Note:* The keys above are for illustrative purposes only. The actual data hunks will likely be on the larger side, such as "all number symbols for this locale and numbering system".
+
 *Open Question:* How do you map from an enum/integer to a type in a type-safe way in Rust? In C++/Java, this would entail some sort of cast, which I imagine is possible in Rust but might require an unsafe block. Main issue: [#8](https://github.com/unicode-org/omnicu/issues/8)
 
 *Open Question:* Due to ongoing developments in [wrapper-layer.md](wrapper-layer.md), the above list of example keys may be more fine-grained than we will need in the final product. It may be better to have more coarse-grained hunks, like "all decimal format symbols" instead of "grouping separator" and "decimal separator". Main issue: [#26](https://github.com/unicode-org/omnicu/issues/26)
 
+### Data Key Struct Definitions
+
+The actual data keys and the structs to which they correspond should be defined in a central location in the repository: [components/data-provider/src](https://github.com/unicode-org/icu4x/tree/master/components/data-provider/src). Follow conventions of existing data provider struct definitions when adding a new one.
+
+There should generally be a 1-to-1 relationship between components (number formatter, plural rules, date format) and modules in the data provider crate. However, this is not strictly enforced; use your best judgement.
+
 ### Request Variables
 
 Requests made to data providers consist of a key and additional *request variables*. The variables are:
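
One possible angle on the open question in this hunk (mapping an integer key to a typed hunk without an unsafe cast) is a checked conversion plus a tagged payload enum. The sketch below is not the design settled in issue #8 and is not taken from the ICU4X sources. The names `Key`, `Hunk`, `from_u16`, `as_str`, and `load`, and the values `"USD"` and `"$"`, are all hypothetical; only the two key names and their integer values come from the table rows shown in the hunk.

```rust
/// Hypothetical key enumeration covering the two currency keys in the table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u16)]
pub enum Key {
    CurrLocalCodeV1 = 0x3001,
    CurrLocalSymV1 = 0x3002,
}

impl Key {
    /// Checked conversion from a raw integer: unknown values yield `None`
    /// instead of requiring an unsafe cast.
    pub fn from_u16(raw: u16) -> Option<Key> {
        match raw {
            0x3001 => Some(Key::CurrLocalCodeV1),
            0x3002 => Some(Key::CurrLocalSymV1),
            _ => None,
        }
    }
}

/// Hypothetical tagged payload: the variant records the hunk's type at runtime.
#[derive(Debug, PartialEq)]
pub enum Hunk {
    Str(String),
    I32(i32),
}

impl Hunk {
    /// Returns the string payload, or `None` if the hunk holds another type.
    pub fn as_str(&self) -> Option<&str> {
        match self {
            Hunk::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Toy provider with made-up values; a real provider would read from its data source.
pub fn load(key: Key) -> Hunk {
    match key {
        Key::CurrLocalCodeV1 => Hunk::Str("USD".to_string()),
        Key::CurrLocalSymV1 => Hunk::Str("$".to_string()),
    }
}
```

With this shape, `Key::from_u16(0x3001).map(load)` yields a `Hunk::Str`, and a caller that expects a string uses `as_str()` and handles the `None` case explicitly. The trade-off is that the type check happens at runtime; a trait with an associated payload type per key would move it to compile time at the cost of more machinery.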
@@ -151,6 +161,8 @@ Along with the hunk, the data provider sends multiple *response variables*. The
 
 The supported LangID is expected to be the most specific LangID that had any data whatsoever, even if it is just an alias. For example, if en_GB is present but empty, and the data is actually loaded from en_001, the Supported LangID should still be en_GB. In other words, the fallback mechanism is considered an internal detail.
 
+If the data provider is unable to return data based on a certain request, it may return an error in a form corresponding to the host language's convention.
+
 ### Data Version
 
 The data version is expected to be a well-defined, namespaced identifier for the origin of the data. For example, when represented as a string, the following might be data versions:
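
For the sentence added in this hunk about returning errors in the host language's convention: in Rust that convention is usually a `Result` with a dedicated error type. The following is a minimal sketch under that assumption, not the provider's actual API. `DataError`, `Response`, `load`, the raw `u16` key, and the value `"GBP"` are hypothetical; the `en_GB` supported LangID follows the example in the context paragraph.

```rust
use std::fmt;

/// Hypothetical error type for a data provider.
#[derive(Debug, PartialEq, Eq)]
pub enum DataError {
    /// The provider does not know the requested key.
    UnsupportedKey(u16),
    /// No data exists for the requested LangID, even after fallback.
    UnavailableLangId(String),
}

impl fmt::Display for DataError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataError::UnsupportedKey(key) => write!(f, "unsupported key: {:#06x}", key),
            DataError::UnavailableLangId(id) => write!(f, "no data for LangID: {}", id),
        }
    }
}

impl std::error::Error for DataError {}

/// Hypothetical response: the hunk plus one response variable.
#[derive(Debug, PartialEq, Eq)]
pub struct Response {
    pub supported_langid: String,
    pub hunk: String,
}

/// Toy provider: knows a single key (0x3001) and a single LangID (en_GB).
/// The payload value is made up for illustration.
pub fn load(key: u16, langid: &str) -> Result<Response, DataError> {
    if key != 0x3001 {
        return Err(DataError::UnsupportedKey(key));
    }
    match langid {
        "en_GB" => Ok(Response {
            supported_langid: "en_GB".to_string(),
            hunk: "GBP".to_string(),
        }),
        other => Err(DataError::UnavailableLangId(other.to_string())),
    }
}
```

Callers can then propagate failures with `?` or match on the variant, which keeps the error path explicit without a sentinel value in the response.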
@@ -160,3 +172,5 @@ The data version is expected to be a well-defined, namespaced identifier for the
 - `FOO_1_1` → Version 1.1 of data from a hypothetical data source named Foo
 
 The first data version subtag, or namespace, defines the syntax for the remainder of the identifier. For example, the `CLDR` namespace might accept two or three subtags: major version (`37`), minor version (`alpha1`), and optional patch version (`goog2020a`).
+
+*Note:* The syntax for the data version is undefined at this time. What is shown above is merely a strawman example.
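
Since the syntax is explicitly a strawman, the following is only a sketch of how a namespace-first identifier could be split, assuming `_` as the subtag separator (as in `FOO_1_1`) and assuming the `CLDR` namespace requires two or three subtags after it. `DataVersion` and `parse_data_version` are hypothetical names, and the string `CLDR_37_alpha1_goog2020a` is assembled from the subtags mentioned in the paragraph rather than quoted from the document.

```rust
/// Hypothetical parsed form of a data version identifier.
#[derive(Debug, PartialEq, Eq)]
pub struct DataVersion<'a> {
    /// First subtag; decides how the remaining subtags are interpreted.
    pub namespace: &'a str,
    /// Remaining subtags, in order.
    pub subtags: Vec<&'a str>,
}

/// Splits an identifier on '_' into a namespace and subtags.
/// Returns `None` for an empty namespace, a missing or empty subtag,
/// or a `CLDR` identifier without two or three subtags (assumed rule).
pub fn parse_data_version(s: &str) -> Option<DataVersion<'_>> {
    let mut parts = s.split('_');
    let namespace = parts.next().filter(|ns| !ns.is_empty())?;
    let subtags: Vec<&str> = parts.collect();
    if subtags.is_empty() || subtags.iter().any(|t| t.is_empty()) {
        return None;
    }
    // Namespace-specific syntax check; other namespaces are left unconstrained here.
    if namespace == "CLDR" && !(2..=3).contains(&subtags.len()) {
        return None;
    }
    Some(DataVersion { namespace, subtags })
}
```

Keeping the namespace check in one place makes it easy to add per-namespace rules later, once the actual syntax is defined.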
