Skip to content

Adding LruDataCache and overlay examples #2914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 5, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

166 changes: 166 additions & 0 deletions docs/tutorials/data_provider.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,3 +61,169 @@ impl AdditiveIdentity {
}
}
```

## Caching Data Provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: kinda want this to be a separate page because it's so complicated

though perhaps that can be kept in mind for mdbookifying?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2929 for mdbook, and we can track this there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't exactly sure the best place to put this: it's a bit low-level/detailed for a tutorial, but it's also too big/off-topic for a docs page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think in the mdbook world we can make this a subchapter


ICU4X has no internal caches because there is no one-size-fits-all solution. It is easy for clients to implement their own cache for ICU4X, and although this is not generally required or recommended, it may be beneficial when latency is of utmost importance and, for example, a less-efficient data provider such as JSON is being used.

The following example illustrates an LRU cache on top of a BufferProvider that saves deserialized data payloads as type-erased objects and then checks for a cache hit before calling the inner provider.

```rust
use icu_provider::hello_world::HelloWorldFormatter;
use icu_provider::prelude::*;
use icu::locid::locale;
use lru::LruCache;
use std::borrow::{Borrow, Cow};
use std::convert::TryInto;
use std::sync::Mutex;
use yoke::trait_hack::YokeTraitHack;
use yoke::Yokeable;
use zerofrom::ZeroFrom;

#[derive(Debug, PartialEq, Eq, Hash)]
struct CacheKeyWrap(CacheKey<'static>);

#[derive(Debug, PartialEq, Eq, Hash)]
struct CacheKey<'a>(DataKey, Cow<'a, DataLocale>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: comment explanation on this section about how the borrow works here to make it possible to do non-cloning gets (probably pair CacheKey with the Borrow impl in code organization too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'm not proud of this code because it uses a #[doc(hidden)] export, lru::KeyRef. It's confusing and distracting to have it in the example. I originally skipped this and just cloned the key for lookup, but @robertbastian's review feedback was that my code was cloning too much. I can labor over explaining how the key borrowing works, but it's not the point of this example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we shouldn't explain it in depth but every line of code in an example is something the reader may focus on: I think we should move the Borrow impl and the borrowed key type below the main stuff with a one-liner comment that just says it's to avoid clones on lookup , so that the reader doesn't spend too much time trying to figure out how it fits in to the rest


pub struct LruDataCache<P> {
cache: Mutex<LruCache<CacheKeyWrap, AnyResponse>>,
provider: P,
}

// This impl enables a borrowed DataLocale to be used during cache retrieval.
impl<'a> Borrow<CacheKey<'a>> for lru::KeyRef<CacheKeyWrap> {
fn borrow(&self) -> &CacheKey<'a> {
&Borrow::<CacheKeyWrap>::borrow(self).0
}
}

impl<M, P> DataProvider<M> for LruDataCache<P>
where
M: KeyedDataMarker + 'static,
M::Yokeable: ZeroFrom<'static, M::Yokeable>,
M::Yokeable: icu_provider::MaybeSendSync,
for<'a> YokeTraitHack<<M::Yokeable as Yokeable<'a>>::Output>: Clone,
P: DataProvider<M>,
{
fn load(&self, req: DataRequest) -> Result<DataResponse<M>, DataError> {
{
// First lock: cache retrieval
let mut cache = self.cache.lock().unwrap();
let borrowed_cache_key = CacheKey(M::KEY, Cow::Borrowed(req.locale));
if let Some(any_res) = cache.get(&borrowed_cache_key) {
// Note: Cloning a DataPayload is usually cheap, and it is necessary in order to
// convert the short-lived cache object into one we can return.
return any_res.downcast_cloned();
}
}
// Release the lock to invoke the inner provider
let response = self.provider.load(req)?;
let owned_cache_key = CacheKeyWrap(CacheKey(M::KEY, Cow::Owned(req.locale.clone())));
// Second lock: cache storage
self.cache.lock()
.unwrap()
.get_or_insert(owned_cache_key, || response.wrap_into_any_response())
.downcast_cloned()
}
}

// Usage example:
let provider = icu_testdata::buffer();
let lru_capacity = 100usize.try_into().unwrap();
let provider = LruDataCache {
cache: Mutex::new(LruCache::new(lru_capacity)),
provider: provider.as_deserializing(),
};

// The cache starts empty:
assert_eq!(provider.cache.lock().unwrap().len(), 0);

assert_eq!(
"こんにちは世界",
// Note: It is necessary to use `try_new_unstable` with LruDataCache.
Copy link
Member

@robertbastian robertbastian Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: this seems weird from client perspective. If we provided a cache we could actually make it implement AnyProvider

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to hear how you think we could implement a cache implementing AnyProvider on top of an inner provider that implements BufferProvider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the registry we can do it, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly; using the registry is the only way one could do it. Maybe we don't align on whether the registry is a good solution; I think it's a solution that is generally quite bad and should be avoided if there is another viable option. In this case, we can put the cache into userland to circumvent the need for the registry.

HelloWorldFormatter::try_new_unstable(
&provider,
&locale!("ja").into()
)
.unwrap()
.format_to_string()
);

// One item in the cache:
assert_eq!(provider.cache.lock().unwrap().len(), 1);

assert_eq!(
"ওহে বিশ্ব",
HelloWorldFormatter::try_new_unstable(
&provider,
&locale!("bn").into()
)
.unwrap()
.format_to_string()
);

// Two items in the cache:
assert_eq!(provider.cache.lock().unwrap().len(), 2);

assert_eq!(
"こんにちは世界",
HelloWorldFormatter::try_new_unstable(
&provider,
&locale!("ja").into()
)
.unwrap()
.format_to_string()
);

// Still only two items in the cache, since we re-requested "ja" data:
assert_eq!(provider.cache.lock().unwrap().len(), 2);
```

## Overwriting Specific Data Items

ICU4X's explicit data pipeline allows for specific data entries to be overwritten in order to customize the output or comply with policy.

The following example illustrates how to overwrite the decimal separators for a region.

```rust
use icu::decimal::FixedDecimalFormatter;
use icu_provider::prelude::*;
use icu::locid::locale;
use icu::locid::subtags_region as region;
use std::borrow::Cow;
use tinystr::tinystr;

pub struct CustomDecimalSymbolsProvider<P>(P);

impl<P> AnyProvider for CustomDecimalSymbolsProvider<P>
where
P: AnyProvider
{
fn load_any(&self, key: DataKey, req: DataRequest) -> Result<AnyResponse, DataError> {
use icu::decimal::provider::DecimalSymbolsV1Marker;
let mut any_res = self.0.load_any(key, req)?;
if key == DecimalSymbolsV1Marker::KEY && req.locale.region() == Some(region!("CH")) {
let mut res: DataResponse<DecimalSymbolsV1Marker> = any_res.downcast()?;
if let Some(payload) = &mut res.payload.as_mut() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what the error case here is, but there doesn't seem to be one. The design decision to make payload an Option<DataPayload> doesn't seem to have paid off, it's never None. Maybe unwrap here so readers don't have to figure this out themselves.

We should consider making the payload non-optional for 2.0 if there's still no use case for an absent payload by then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

payload.with_mut(|data| {
// Change the grouping separator for all Swiss locales to '🐮'
data.grouping_separator = Cow::Borrowed("🐮");
});
}
any_res = res.wrap_into_any_response();
}
Ok(any_res)
}
}

let provider = CustomDecimalSymbolsProvider(icu_testdata::any());
let formatter = FixedDecimalFormatter::try_new_with_any_provider(
&provider,
&locale!("de-CH").into(),
Default::default(),
)
.unwrap();

assert_eq!(formatter.format_to_string(&100007i64.into()), "100🐮007");
```
3 changes: 2 additions & 1 deletion experimental/tutorials/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ icu = { version = "1.0.0", path = "../../components/icu", default-features = fal
icu_provider = { version = "1.0.0", path = "../../provider/core", default-features = false }
icu_provider_fs = { version = "1.0.0", path = "../../provider/fs" }
icu_provider_blob = { version = "1.0.0", path = "../../provider/blob" }
icu_testdata = { version = "1.0.0", path = "../../provider/testdata" }
icu_testdata = { version = "1.0.0", path = "../../provider/testdata", features = ["buffer"] }
zerofrom = { version = "0.1.0", path = "../../utils/zerofrom" }
serde = { version = "1.0", features = ["derive", "alloc", "std"] }
icu_datagen = { version = "1.0.0", path = "../../provider/datagen" }
Expand All @@ -47,3 +47,4 @@ databake = { version = "0.1.0", path = "../../utils/databake", features = ["deri
serde-aux = "2.1.1"
itertools = "0.10"
embed-doc-image = "0.1"
lru = "0.8.1"
48 changes: 47 additions & 1 deletion provider/core/src/any.rs
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,19 @@ impl AnyPayload {
}
}

/// Clones and then transforms a type-erased `AnyPayload` into a concrete `DataPayload<M>`.
pub fn downcast_cloned<M>(&self) -> Result<DataPayload<M>, DataError>
where
M: DataMarker + 'static,
// For the StructRef case:
M::Yokeable: ZeroFrom<'static, M::Yokeable>,
// For the PayloadRc case:
M::Yokeable: MaybeSendSync,
for<'a> YokeTraitHack<<M::Yokeable as Yokeable<'a>>::Output>: Clone,
{
self.clone().downcast()
}

/// Creates an `AnyPayload` from a static reference to a data struct.
///
/// # Examples
Expand Down Expand Up @@ -230,7 +243,7 @@ impl From<AnyResponse> for DataResponse<AnyMarker> {
}

impl AnyResponse {
/// Transforms a type-erased `DataResponse<AnyMarker>` into a concrete `DataResponse<M>`.
/// Transforms a type-erased `AnyResponse` into a concrete `DataResponse<M>`.
#[inline]
pub fn downcast<M>(self) -> Result<DataResponse<M>, DataError>
where
Expand All @@ -244,6 +257,39 @@ impl AnyResponse {
payload: self.payload.map(|p| p.downcast()).transpose()?,
})
}

/// Clones and then transforms a type-erased `AnyResponse` into a concrete `DataResponse<M>`.
pub fn downcast_cloned<M>(&self) -> Result<DataResponse<M>, DataError>
where
M: DataMarker + 'static,
M::Yokeable: ZeroFrom<'static, M::Yokeable>,
M::Yokeable: MaybeSendSync,
for<'a> YokeTraitHack<<M::Yokeable as Yokeable<'a>>::Output>: Clone,
{
Ok(DataResponse {
metadata: self.metadata.clone(),
payload: self
.payload
.as_ref()
.map(|p| p.downcast_cloned())
.transpose()?,
})
}
}

impl<M> DataResponse<M>
where
M: DataMarker + 'static,
M::Yokeable: MaybeSendSync,
{
/// Moves the inner DataPayload to the heap (requiring an allocation) and returns it as an
/// erased `AnyResponse`.
pub fn wrap_into_any_response(self) -> AnyResponse {
AnyResponse {
metadata: self.metadata,
payload: self.payload.map(|p| p.wrap_into_any_payload()),
}
}
}

/// An object-safe data provider that returns data structs cast to `dyn Any` trait objects.
Expand Down