Adding LruDataCache and overlay examples #2914

@@ -61,3 +61,169 @@ impl AdditiveIdentity {
    }
}
```

## Caching Data Provider

ICU4X has no internal caches because there is no one-size-fits-all solution. It is easy for clients to implement their own cache for ICU4X. This is not generally required or recommended, but it may be beneficial when latency is of utmost importance, for example when a less efficient data provider such as JSON is in use.
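
As a rough illustration of how little is needed, here is a std-only sketch of a memoizing wrapper. `Loader`, `CachingLoader`, and `UppercaseLoader` are hypothetical names standing in for a data provider; they are not ICU4X APIs. The cache is a plain `HashMap` that never evicts.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical stand-in for a data provider: loads a value for a key.
trait Loader<K, V> {
    fn load(&self, key: &K) -> Result<V, String>;
}

// Hypothetical caching wrapper. It never evicts, so memory grows with the number of keys.
struct CachingLoader<L, K, V> {
    cache: Mutex<HashMap<K, V>>,
    inner: L,
}

impl<L, K, V> CachingLoader<L, K, V> {
    fn new(inner: L) -> Self {
        CachingLoader { cache: Mutex::new(HashMap::new()), inner }
    }
}

impl<L, K, V> Loader<K, V> for CachingLoader<L, K, V>
where
    L: Loader<K, V>,
    K: Hash + Eq + Clone,
    V: Clone,
{
    fn load(&self, key: &K) -> Result<V, String> {
        {
            // First lock: return a clone of the cached value on a hit.
            let cache = self.cache.lock().unwrap();
            if let Some(value) = cache.get(key) {
                return Ok(value.clone());
            }
        } // The lock is released here, before the (possibly slow) inner load.
        let value = self.inner.load(key)?;
        // Second lock: store the value; if another thread stored this key first, keep its entry.
        let mut cache = self.cache.lock().unwrap();
        Ok(cache.entry(key.clone()).or_insert(value).clone())
    }
}

// Example inner loader that counts how often it is actually called.
struct UppercaseLoader {
    calls: AtomicUsize,
}

impl Loader<String, String> for UppercaseLoader {
    fn load(&self, key: &String) -> Result<String, String> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        Ok(key.to_uppercase())
    }
}

fn main() {
    let loader: CachingLoader<_, String, String> =
        CachingLoader::new(UppercaseLoader { calls: AtomicUsize::new(0) });
    assert_eq!(loader.load(&"ja".to_string()).unwrap(), "JA");
    assert_eq!(loader.load(&"ja".to_string()).unwrap(), "JA");
    // The second request was a cache hit, so the inner loader ran only once.
    assert_eq!(loader.inner.calls.load(Ordering::SeqCst), 1);
}
```

The lock is not held while the inner loader runs, so a slow load does not block other lookups. The trade-off is that two threads may load the same key concurrently; `entry().or_insert` then keeps whichever value was stored first.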

The following example implements an LRU cache on top of a `BufferProvider`. On each request it first checks the cache for a hit; on a miss it calls the inner provider and stores the deserialized data payload as a type-erased `AnyResponse`.

```rust
use icu_provider::hello_world::HelloWorldFormatter;
use icu_provider::prelude::*;
use icu::locid::locale;
use lru::LruCache;
use std::borrow::{Borrow, Cow};
use std::convert::TryInto;
use std::sync::Mutex;
use yoke::trait_hack::YokeTraitHack;
use yoke::Yokeable;
use zerofrom::ZeroFrom;

#[derive(Debug, PartialEq, Eq, Hash)]
struct CacheKeyWrap(CacheKey<'static>);

#[derive(Debug, PartialEq, Eq, Hash)]
struct CacheKey<'a>(DataKey, Cow<'a, DataLocale>);

pub struct LruDataCache<P> {
    cache: Mutex<LruCache<CacheKeyWrap, AnyResponse>>,
    provider: P,
}

// Lets `LruCache::get` accept a `CacheKey` that borrows the request's locale,
// so that cache lookups do not need to clone the `DataLocale`.
impl<'a> Borrow<CacheKey<'a>> for lru::KeyRef<CacheKeyWrap> {
    fn borrow(&self) -> &CacheKey<'a> {
        &Borrow::<CacheKeyWrap>::borrow(self).0
    }
}

impl<M, P> DataProvider<M> for LruDataCache<P>
where
    M: KeyedDataMarker + 'static,
    M::Yokeable: ZeroFrom<'static, M::Yokeable>,
    M::Yokeable: icu_provider::MaybeSendSync,
    for<'a> YokeTraitHack<<M::Yokeable as Yokeable<'a>>::Output>: Clone,
    P: DataProvider<M>,
{
    fn load(&self, req: DataRequest) -> Result<DataResponse<M>, DataError> {
        {
            // First lock: cache retrieval
            let mut cache = self.cache.lock().unwrap();
            let borrowed_cache_key = CacheKey(M::KEY, Cow::Borrowed(req.locale));
            if let Some(any_res) = cache.get(&borrowed_cache_key) {
                // Note: Cloning a DataPayload is usually cheap, and it is necessary in order to
                // convert the short-lived cache object into one we can return.
                return any_res.downcast_cloned();
            }
        }
        // The lock is released here so that the inner provider is not called while holding it.
        let response = self.provider.load(req)?;
        let owned_cache_key = CacheKeyWrap(CacheKey(M::KEY, Cow::Owned(req.locale.clone())));
        // Second lock: cache storage. If another thread inserted the same key in the
        // meantime, `get_or_insert` keeps the existing entry.
        self.cache
            .lock()
            .unwrap()
            .get_or_insert(owned_cache_key, || response.wrap_into_any_response())
            .downcast_cloned()
    }
}

// Usage example:
let provider = icu_testdata::buffer();
let lru_capacity = 100usize.try_into().unwrap();
let provider = LruDataCache {
    cache: Mutex::new(LruCache::new(lru_capacity)),
    provider: provider.as_deserializing(),
};

// The cache starts empty:
assert_eq!(provider.cache.lock().unwrap().len(), 0);

assert_eq!(
    "こんにちは世界",
    // Note: `try_new_unstable` is required here because `LruDataCache` implements
    // `DataProvider<M>`, not `AnyProvider` or `BufferProvider`.
    HelloWorldFormatter::try_new_unstable(
        &provider,
        &locale!("ja").into()
    )
    .unwrap()
    .format_to_string()
);

// One item in the cache:
assert_eq!(provider.cache.lock().unwrap().len(), 1);

assert_eq!(
    "ওহে বিশ্ব",
    HelloWorldFormatter::try_new_unstable(
        &provider,
        &locale!("bn").into()
    )
    .unwrap()
    .format_to_string()
);

// Two items in the cache:
assert_eq!(provider.cache.lock().unwrap().len(), 2);

assert_eq!(
    "こんにちは世界",
    HelloWorldFormatter::try_new_unstable(
        &provider,
        &locale!("ja").into()
    )
    .unwrap()
    .format_to_string()
);

// Still only two items in the cache, since we re-requested "ja" data:
assert_eq!(provider.cache.lock().unwrap().len(), 2);
```

Review thread on `struct CacheKey`:

suggestion: comment explanation on this section about how the borrow works here to make it possible to do non-cloning gets (probably pair CacheKey with the Borrow impl in code organization too)

Yeah. I'm not proud of this code because it uses a

Yeah I think we shouldn't explain it in depth but every line of code in an example is something the reader may focus on: I think we should move the Borrow impl and the borrowed key type below the main stuff with a one-liner comment that just says it's to avoid clones on lookup, so that the reader doesn't spend too much time trying to figure out how it fits in to the rest

Review thread on the `try_new_unstable` note:

thought: this seems weird from client perspective. If we provided a cache we could actually make it implement

I would love to hear how you think we could implement a cache implementing

With the registry we can do it, no?

Exactly; using the registry is the only way one could do it. Maybe we don't align on whether the registry is a good solution; I think it's a solution that is generally quite bad and should be avoided if there is another viable option. In this case, we can put the cache into userland to circumvent the need for the registry.
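
The `Borrow` impl in the LRU example is what allows the cache to be queried with a key that borrows the request's locale, so a cache hit does not clone the `DataLocale`. The same trick can be sketched with only the standard library. Here `Key` and `OwnedKey` are hypothetical stand-ins for `CacheKey` and `CacheKeyWrap`, and a `HashMap` replaces the `lru` crate; the LRU example implements `Borrow` on `lru::KeyRef<CacheKeyWrap>` rather than on the key itself because that is the type `lru`'s lookups go through.

```rust
use std::borrow::{Borrow, Cow};
use std::collections::HashMap;

// Hypothetical lookup key; like `CacheKey`, it may borrow its string part.
#[derive(Debug, PartialEq, Eq, Hash)]
struct Key<'a>(u32, Cow<'a, str>);

// Hypothetical owned key stored in the map; plays the role of `CacheKeyWrap`.
#[derive(Debug, PartialEq, Eq, Hash)]
struct OwnedKey(Key<'static>);

// Lets `HashMap<OwnedKey, _>::get` accept a `&Key<'a>` for any lifetime `'a`.
// `&Key<'static>` coerces to `&Key<'a>` because `Key` is covariant in its lifetime.
impl<'a> Borrow<Key<'a>> for OwnedKey {
    fn borrow(&self) -> &Key<'a> {
        &self.0
    }
}

// Looks up an entry with a borrowed name: no `String` is allocated for the query.
fn lookup(map: &HashMap<OwnedKey, usize>, id: u32, name: &str) -> Option<usize> {
    map.get(&Key(id, Cow::Borrowed(name))).copied()
}

fn main() {
    let mut map = HashMap::new();
    // Storing requires an owned (`'static`) key.
    map.insert(OwnedKey(Key(1, Cow::Owned("ja".to_string()))), 10);

    let requested = String::from("ja"); // short-lived, not `'static`
    assert_eq!(lookup(&map, 1, &requested), Some(10));
    assert_eq!(lookup(&map, 1, "bn"), None);
}
```

This works because the derived `Hash` and `Eq` of `OwnedKey` delegate to the inner `Key`, and `Cow` hashes and compares by its contents, so borrowed and owned keys hash identically, as the `Borrow` contract requires.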

## Overwriting Specific Data Items

ICU4X's explicit data pipeline allows specific data entries to be overwritten in order to customize the output or to comply with policy.
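
Conceptually, such an override is a decorator: it forwards each request to an inner provider and patches only the entries it targets. A std-only sketch of that shape follows; `SymbolsSource`, `BaseSource`, and `RegionOverride` are hypothetical types, not ICU4X APIs, and the separator table holds illustrative values only.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a provider of decimal symbols, keyed by a locale string.
trait SymbolsSource {
    fn grouping_separator(&self, locale: &str) -> Option<String>;
}

// Hypothetical base data with illustrative values.
struct BaseSource(HashMap<&'static str, &'static str>);

impl SymbolsSource for BaseSource {
    fn grouping_separator(&self, locale: &str) -> Option<String> {
        self.0.get(locale).map(|s| s.to_string())
    }
}

// Overlay: forwards every request to `inner` and patches the result for one region.
struct RegionOverride<S> {
    inner: S,
    region: &'static str,
    separator: &'static str,
}

impl<S: SymbolsSource> SymbolsSource for RegionOverride<S> {
    fn grouping_separator(&self, locale: &str) -> Option<String> {
        // Load from the inner source first, so missing data stays missing.
        let base = self.inner.grouping_separator(locale)?;
        // Naive "lang-REGION" split; real code would inspect a parsed locale instead.
        if locale.split('-').nth(1) == Some(self.region) {
            Some(self.separator.to_string())
        } else {
            Some(base)
        }
    }
}

fn main() {
    let base = BaseSource(HashMap::from([("de-DE", "."), ("de-CH", "’")]));
    let provider = RegionOverride { inner: base, region: "CH", separator: "🐮" };
    assert_eq!(provider.grouping_separator("de-CH").as_deref(), Some("🐮"));
    assert_eq!(provider.grouping_separator("de-DE").as_deref(), Some("."));
    assert_eq!(provider.grouping_separator("fr-FR"), None);
}
```

Unlike ICU4X's `AnyProvider`, this trait has a single method, so there is no data key to match on before patching.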

The following example illustrates how to overwrite the grouping separator in the decimal symbols for every locale in one region (Switzerland).

```rust
use icu::decimal::FixedDecimalFormatter;
use icu_provider::prelude::*;
use icu::locid::locale;
use icu::locid::subtags_region as region;
use std::borrow::Cow;

// Wraps an inner `AnyProvider` and overrides the decimal symbols for locales in region CH.
pub struct CustomDecimalSymbolsProvider<P>(P);

impl<P> AnyProvider for CustomDecimalSymbolsProvider<P>
where
    P: AnyProvider,
{
    fn load_any(&self, key: DataKey, req: DataRequest) -> Result<AnyResponse, DataError> {
        use icu::decimal::provider::DecimalSymbolsV1Marker;
        let mut any_res = self.0.load_any(key, req)?;
        if key == DecimalSymbolsV1Marker::KEY && req.locale.region() == Some(region!("CH")) {
            // Downcast to the concrete marker type so that the payload can be edited.
            let mut res: DataResponse<DecimalSymbolsV1Marker> = any_res.downcast()?;
            if let Some(payload) = res.payload.as_mut() {
                payload.with_mut(|data| {
                    // Change the grouping separator for all Swiss locales to '🐮'
                    data.grouping_separator = Cow::Borrowed("🐮");
                });
            }
            // Re-erase the type before returning it through the `AnyProvider` interface.
            any_res = res.wrap_into_any_response();
        }
        Ok(any_res)
    }
}

let provider = CustomDecimalSymbolsProvider(icu_testdata::any());
let formatter = FixedDecimalFormatter::try_new_with_any_provider(
    &provider,
    &locale!("de-CH").into(),
    Default::default(),
)
.unwrap();

assert_eq!(formatter.format_to_string(&100007i64.into()), "100🐮007");
```

Review thread on `if let Some(payload)`:

I was wondering what the error case here is, but there doesn't seem to be one. The design decision to make payload an We should consider making the payload non-optional for 2.0 if there's still no use case for an absent payload by then.

suggestion: kinda want this to be a separate page because it's so complicated
though perhaps that can be kept in mind for mdbookifying?

Filed #2929 for mdbook, and we can track this there

I wasn't exactly sure the best place to put this: it's a bit low-level/detailed for a tutorial, but it's also too big/off-topic for a docs page.

yeah i think in the mdbook world we can make this a subchapter