Skip to content

Adding LruDataCache and overlay examples #2914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 5, 2023
Merged

Conversation

sffc
Copy link
Member

@sffc sffc commented Dec 23, 2022

I don't expect a review on any PRs I send this week until January

Fixes #1246

@sffc sffc requested review from Manishearth and a team as code owners December 23, 2022 01:20
@sffc sffc requested a review from robertbastian December 23, 2022 01:20
Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good example. It ignores all the zero-copying that we've put a lot of effort into, and even needs to clone the locale to do lookup. Let's restrict the example to wrapping an AnyProvider, and put more work into AnyProvider <-> BufferProvider interaction so that we don't allocate left and right.

@sffc
Copy link
Member Author

sffc commented Dec 23, 2022

I don't think this is a good example. It ignores all the zero-copying that we've put a lot of effort into, and even needs to clone the locale to do lookup. Let's restrict the example to wrapping an AnyProvider, and put more work into AnyProvider <-> BufferProvider interaction so that we don't allocate left and right.

I disagree on the following points:

  1. It definitely does not ignore zero-copy effort; the yoke is intact when stored and retrieved from the cache. Cache retrievals would not be cheap without the help of Yoke and ZeroVec.
  2. The DataLocale clone aside, there is a single additional allocation during storage (DataResponse to AnyResponse) and none during retrieval thanks to the Rc Cart. It very much mischaracterizes this example to say that we "allocate left and right".
  3. I updated the example to avoid cloning the DataLocale during retrieval; it adds a bit of complexity for readers, which I was trying to avoid.
  4. This is I think one of the best examples of how BufferProvider and AnyProvider can interoperate, and I chose a BufferProvider source specifically to make that point. As we've discussed before, there is nothing we can do to make BufferProvider and AnyProvider interoperate any better than this short of going through the registry. A cache that implements DataProvider can invoke both BufferProvider and AnyProvider, which is something worth demonstrating.

@sffc sffc requested a review from robertbastian December 23, 2022 13:45
ICU4X has no internal caches because there is no one-size-fits-all solution. It is easy for clients to implement their own cache for ICU4X, and although this is not generally required or recommended, it may be beneficial when latency is of utmost importance and:

1. A less-efficient data provider is being used, such as JSON
2. Lookups make heavy use of vertical fallback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I don't think we should suggest caching the data structs when fallback is the problem. In that case one should cache resolved locales

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolved locales could be different for each key. But yes, we could cache (DataKey, DataLocale) -> DataLocale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the use case of vertical fallback since this is a good point and we could even consider offering this as an official solution if needed.

@@ -121,6 +145,7 @@ assert_eq!(provider.cache.lock().unwrap().len(), 0);

assert_eq!(
"こんにちは世界",
// Note: It is necessary to use `try_new_unstable` with LruDataCache.
Copy link
Member

@robertbastian robertbastian Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: this seems weird from client perspective. If we provided a cache we could actually make it implement AnyProvider

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to hear how you think we could implement a cache implementing AnyProvider on top of an inner provider that implements BufferProvider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the registry we can do it, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly; using the registry is the only way one could do it. Maybe we don't align on whether the registry is a good solution; I think it's a solution that is generally quite bad and should be avoided if there is another viable option. In this case, we can put the cache into userland to circumvent the need for the registry.

@sffc sffc requested a review from robertbastian December 23, 2022 15:39
@sffc
Copy link
Member Author

sffc commented Dec 23, 2022

Changed this PR to "Fixes #1246"

@sffc sffc changed the title Adding LruDataCache example Adding LruDataCache and overlay examples Dec 23, 2022
}
}
// Release the lock to invoke the inner provider
let response: DataResponse<M> = self.provider.load(req)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this type needed? self.provider is only DataProvider<M>, so there shouldn't be any ambiguity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily for clarity. The audience for this tutorial doesn't know what all these functions do or what they return. But if you think it's clear enough I'll remove it.

let mut res: DataResponse<DecimalSymbolsV1Marker> = any_res.downcast()?;
if let Some(payload) = &mut res.payload.as_mut() {
payload.with_mut(|data| {
// Change the decimal separators for all Swiss locales to '.' and ','
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to be implying that our data needs to be fixed. Maybe replace them by something fun, like 🐮

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mooooo 🐮

@@ -61,3 +61,169 @@ impl AdditiveIdentity {
}
}
```

## Caching Data Provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: kinda want this to be a separate page because it's so complicated

though perhaps that can be kept in mind for mdbookifying?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2929 for mdbook, and we can track this there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't exactly sure the best place to put this: it's a bit low-level/detailed for a tutorial, but it's also too big/off-topic for a docs page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think in the mdbook world we can make this a subchapter

struct CacheKeyWrap(CacheKey<'static>);

#[derive(Debug, PartialEq, Eq, Hash)]
struct CacheKey<'a>(DataKey, Cow<'a, DataLocale>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: comment explanation on this section about how the borrow works here to make it possible to do non-cloning gets (probably pair CacheKey with the Borrow impl in code organization too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'm not proud of this code because it uses a #[doc(hidden)] export, lru::KeyRef. It's confusing and distracting to have it in the example. I originally skipped this and just cloned the key for lookup, but @robertbastian's review feedback was that my code was cloning too much. I can labor over explaining how the key borrowing works, but it's not the point of this example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we shouldn't explain it in depth but every line of code in an example is something the reader may focus on: I think we should move the Borrow impl and the borrowed key type below the main stuff with a one-liner comment that just says it's to avoid clones on lookup , so that the reader doesn't spend too much time trying to figure out how it fits in to the rest

let mut any_res = self.0.load_any(key, req)?;
if key == DecimalSymbolsV1Marker::KEY && req.locale.region() == Some(region!("CH")) {
let mut res: DataResponse<DecimalSymbolsV1Marker> = any_res.downcast()?;
if let Some(payload) = &mut res.payload.as_mut() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what the error case here is, but there doesn't seem to be one. The design decision to make payload an Option<DataPayload> doesn't seem to have paid off, it's never None. Maybe unwrap here so readers don't have to figure this out themselves.

We should consider making the payload non-optional for 2.0 if there's still no use case for an absent payload by then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sffc sffc merged commit da0b424 into unicode-org:main Jan 5, 2023
@sffc sffc deleted the docs-1.1 branch January 5, 2023 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataProvider routing, lazy deserialization, caching, and overlays
3 participants