
Apollo Store lock contention #4605

Open · ynkm169 opened this issue Jan 5, 2023 · 14 comments

ynkm169 commented Jan 5, 2023

Use case

Hello,

Did some systracing. Launched 50 duplicate API calls; the result shows around 60 RxCachedThreadS and 60 DefaultDispatch threads all contending for the monitor lock.

This is only a theoretical test. However, our app at peak (home feed) can have 200+ threads running. Even if there are only a dozen remote API calls (not counting other Apollo Store cache reads), migrating all APIs to GraphQL could present a performance penalty.

Also, in the RxJava world it is not uncommon for a dev to call an API to retrieve a list of items and use flatMap to make a separate API call for each item in the list, so 50 API calls is not so uncommon, and it gets worse as the user scrolls quickly through a list.

Describe the solution you'd like

Question
Is it possible to optimize the locking a bit more? Possibly even sacrificing consistency for performance (or having it as a configurable option)?
MemoryCache has a monitor lock: CacheLock. (This should theoretically be very fast since the synchronized block guards just a few LinkedHashMap calls.)
DefaultApolloStore has a ReentrantReadWriteLock. (A single writer will block all readers, and a single reader will block all writers; see the sketch below.)
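To illustrate the second point, here is a minimal Kotlin sketch (illustrative names, not Apollo's actual code) of how a ReentrantReadWriteLock serializes readers and writers:

```kotlin
import java.util.concurrent.locks.ReentrantReadWriteLock
import kotlin.concurrent.read
import kotlin.concurrent.write

// Minimal sketch of the read/write exclusion described above: any number of readers
// can hold the read lock together, but a single writer excludes all of them (and vice versa).
class StoreSketch {
    private val lock = ReentrantReadWriteLock()
    private val records = LinkedHashMap<String, String>()

    fun readRecord(key: String): String? = lock.read {
        records[key] // all readers queue behind an in-progress write
    }

    fun writeRecord(key: String, value: String) = lock.write {
        records[key] = value // a single writer blocks every reader until it finishes
    }
}
```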


Java/Kotlin method tracing
DefaultDispatch: It seems the code under the ReentrantReadWriteLock does a couple of things, such as invoking JsonWriter, rather than just directly saving data to a LinkedHashMap.
[screenshot: DefaultDispatch method trace]

RxCachedThreadS: It's harder to understand the flame chart and why these threads exist.
[screenshot: RxCachedThreadS method trace]


systrace
[systrace screenshots]

ynkm169 changed the title from "Memory cache lock contention" to "Apollo Store lock contention" on Jan 5, 2023
BoD commented Jan 5, 2023

Thank you for providing this. It is interesting data, and we will dig into it.

Just a quick reaction to

it is not uncommon for dev to call an API to retrieve a list of items, and use flatmap to make a seperate API call for each item in the list

Not sure if you are using GraphQL already (if yes, please disregard my comment), but one of its advantages compared to REST is the ability to avoid this, by having all the fields you need at your disposal in one query. Of course, the ability to do this will depend on the server's schema, but in many cases you should be able to avoid some round trips.

ynkm169 commented Jan 5, 2023

Not sure if you are using GraphQL already

Yes. To provide more context, REST and GraphQL will co-exist for quite a while for us; the migration will take a lot of time. It wasn't the best argument :) since GraphQL was designed to reduce API calls.

ynkm169 commented Jan 5, 2023

I wasn't sure if it is possible to build an LRU cache that's more performant than a synchronized LinkedHashMap. It looks like the Guava cache is potentially more performant (hard to tell, since the Guava cache may be better suited for servers with hundreds or thousands of threads than for low-power mobile devices):

The Least Recently Used page replacement algorithm was chosen due to its simplicity.
The basic strategy is to subdivide the table among Segments, each of which itself is a
concurrently readable hash table. The map supports non-blocking reads and concurrent writes
across different segments.

For comparison, this is the Android LruCache implementation.
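To make the comparison concrete, here is a rough construction sketch of both (the sizes and concurrency level are arbitrary, and the Guava dependency is only assumed for illustration):

```kotlin
import android.util.LruCache
import com.google.common.cache.CacheBuilder

// Guava shards its table into segments, so reads are non-blocking and writes to different
// segments can proceed concurrently (eviction is approximately LRU per segment).
val guavaCache = CacheBuilder.newBuilder()
    .maximumSize(1_000)
    .concurrencyLevel(8) // hint for the number of segments / concurrent writers
    .build<String, String>()

// Android's LruCache synchronizes every get/put on a single monitor around a LinkedHashMap.
val androidLruCache = LruCache<String, String>(1_000)
```

Whether the segmented design actually pays off with a few dozen threads on a low-power device is exactly the open question here.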

Anyway, I think the CacheLock should be sufficient and not the cause of the contention. But these are just some thoughts, so feel free to ignore this comment.

BoD commented Jan 5, 2023

A few additional notes:

  • About your last comment on LRU: this is interesting. My hunch is that optimizing the LRU cache itself won't have a big impact (I could be wrong!), but that the locking situation shown in your traces could be improved.
  • The need for the synchronized block in MemoryCache.loadRecord() is that, although it is already called under a read lock (from DefaultApolloStore.readOperation), the code actually alters the inner state, and we need to make sure only one thread does that at a time. As you noted above, updating the LRU cache state itself should be a very fast operation, however.

    But inside the same synchronized block the code also reads the data of the next cache. This is usually the disk cache and does IO, so it is not a fast operation. Locking this part is also unnecessary since we're already under the read lock. We could improve this by reading the disk cache outside of the synchronized block and only doing the memory changes under it. This would allow multiple threads to read from the disk cache at the same time, as intended; probably a good improvement to make (see the sketch below).
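A rough sketch of that suggestion, using illustrative names rather than Apollo's actual classes, might look like this:

```kotlin
// Sketch of the improvement described above: the chained (disk) cache is read without
// holding the memory cache's monitor, so concurrent readers are not serialized on IO;
// only the in-memory LRU bookkeeping stays inside the synchronized block.
class MemoryCacheSketch(private val nextCache: MemoryCacheSketch? = null) {
    private val monitor = Any()
    private val lruCache = LinkedHashMap<String, String>(16, 0.75f, /* accessOrder = */ true)

    fun loadRecord(key: String): String? {
        synchronized(monitor) {
            lruCache[key]?.let { return it } // memory hit: quick LRU update under the lock
        }
        // Slow path (e.g. the SQLite cache, i.e. disk IO) runs outside the monitor, so
        // multiple threads can read the next cache concurrently under the store's read lock.
        val fromNext = nextCache?.loadRecord(key) ?: return null
        synchronized(monitor) {
            lruCache[key] = fromNext // only the in-memory update re-acquires the lock
        }
        return fromNext
    }
}
```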

ynkm169 commented Jan 5, 2023

But inside the same synchronized block the code also reads the data of the next cache. This is usually the disk cache and does IO, so not a fast operation.

Great idea! What's the class name of the disk cache? If you are referring to the HTTP cache, we don't use it.

BoD commented Jan 5, 2023

No, sorry, I was referring to the SQLite cache (SqlNormalizedCache). Maybe you don't need it either, actually: it's only useful if you need a persistent cache in your app, of course.
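For context, this is roughly how the two caches get chained in Apollo Kotlin 3.x (a sketch; exact package and parameter names may differ slightly depending on your version). Leaving out SqlNormalizedCacheFactory gives you a memory-only, non-persistent cache:

```kotlin
import android.content.Context
import com.apollographql.apollo3.ApolloClient
import com.apollographql.apollo3.cache.normalized.api.MemoryCacheFactory
import com.apollographql.apollo3.cache.normalized.normalizedCache
import com.apollographql.apollo3.cache.normalized.sql.SqlNormalizedCacheFactory

// Memory-only: no SQLite factory in the chain, so nothing is persisted across restarts.
fun memoryOnlyClient(serverUrl: String): ApolloClient =
    ApolloClient.Builder()
        .serverUrl(serverUrl)
        .normalizedCache(MemoryCacheFactory(maxSizeBytes = 10 * 1024 * 1024))
        .build()

// Memory cache chained with the SQLite cache (SqlNormalizedCache), when persistence is needed.
fun persistentClient(context: Context, serverUrl: String): ApolloClient =
    ApolloClient.Builder()
        .serverUrl(serverUrl)
        .normalizedCache(
            MemoryCacheFactory(maxSizeBytes = 10 * 1024 * 1024)
                .chain(SqlNormalizedCacheFactory(context, "apollo.db"))
        )
        .build()
```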

ynkm169 commented Jan 5, 2023 via email

BoD commented Jan 6, 2023

Thanks for clarifying. It's difficult to pinpoint the source of the contention. FWIW, what I observe in a similar test - albeit without Rx - looks a bit different.

I'm wondering if it would make sense to get rid of a high level lock on ApolloStore and let the NormalizedCaches handle their own locking instead.

Overall, we think the normalized cache could benefit from a revamp, including performance improvements (in this area but also in terms of storage size, access time, etc.), as well as new features (see #2331) - but this will be a major task.

ynkm169 commented Jan 6, 2023

I'm wondering if it would make sense to get rid of a high level lock on ApolloStore and let the NormalizedCaches handle their own locking instead.

I am not familiar with the internals of everything the lock currently guards.

Regarding our own repository layer implementation, for comparison with the Apollo Store:
It basically concats 3 streams of data and optionally takes the first that comes back (see the sketch after this list). The resulting observable can be subscribed on an IO thread. We basically do not hold any lock anywhere; the Android LruCache implementation is thread-safe.
A: in-memory data source backed by the Android LruCache implementation.
B: DB data source (very rarely hit).
C: remote REST data source.
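A minimal sketch of that pattern (RxJava 3 syntax here; the source observables are placeholders for the real data sources):

```kotlin
import io.reactivex.rxjava3.core.Observable
import io.reactivex.rxjava3.schedulers.Schedulers

// Concatenate the three sources and optionally take only the first value that arrives.
// No explicit locking anywhere; each source is responsible for its own thread safety.
fun <T : Any> firstAvailable(
    memory: Observable<T>, // A: in-memory source backed by a thread-safe LruCache
    db: Observable<T>,     // B: DB source (very rarely hit)
    remote: Observable<T>  // C: remote REST source
): Observable<T> =
    Observable.concat(memory, db, remote)
        .take(1)
        .subscribeOn(Schedulers.io())
```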

Let me know if there are any performance improvements in this area in the future; I could help do some systracing! Thanks.

ynkm169 commented Jan 24, 2023

Hey, just sharing some ideas I found for improving the performance of readers/writers:
Jake Wharton's DiskLruCache, used by OkHttp and Bumptech/Glide image loading.

  • Multiple readers can read the cache even if a writer is currently writing. The writer writes to a temporary file and swaps it with the main file when it's done (roughly as sketched below). It sacrifices consistency for performance.
  • It does not allow multiple writers to the same file.
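A tiny sketch of the write-to-temp-then-swap idea (illustrative only, not DiskLruCache's actual code):

```kotlin
import java.io.File
import java.io.IOException

// Readers can keep reading the current file while the single writer prepares a replacement
// in a temporary file; the swap happens in one rename once the write completes.
fun replaceAtomically(target: File, newContent: ByteArray) {
    val tmp = File(target.parentFile, target.name + ".tmp")
    tmp.writeBytes(newContent)
    if (!tmp.renameTo(target)) {
        tmp.delete()
        throw IOException("Could not swap ${tmp.name} into ${target.name}")
    }
}
```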

ashare80 commented:

Running into performance issues with this on large queries.

Found that there was some extra logic inside the locks that could be moved out: #5101

This was a minor performance boost when testing these modifications on our data.

It appears that on the query cache read I am trying to speed up, a lot of the time is spent in JSON serialization logic building out the records.

[screenshot: profiler trace of a cache read]

martinbonnin commented:

Thank you so much @ashare80 💙. The investigation and PR are awesome! Is there any chance you could contribute your data? We could include it in the benchmarks, and it'd be a good data point for future optimisations. No worries at all if not; we can always forge tests, but real-life data always has more flavour :)

ashare80 commented:

@martinbonnin We might be able to work something out for that. What would be the best way to share this? Would a sample response be sufficient?

martinbonnin commented:

Nice! If you can share the schema + query too, that'd be ideal. If not, we can craft something ourselves, but it's a bit painful and we might not be able to infer the polymorphism, if any.
