This repository has been archived by the owner on Jul 6, 2020. It is now read-only.

Offline and persistent cache support #61

Open

JoviDeCroock opened this issue Sep 5, 2019 · 19 comments
Labels
feature 🚀 New feature or request

Comments

@JoviDeCroock

JoviDeCroock commented Sep 5, 2019

Offline

We all think about this in the modern PWA era, but there's a lot to it. We'll have to keep track of what requests the user needs to send when the connection is restored; after these requests are sent, there will most likely be several optimistic entries to clear.

Operations

To know what operations to cache, it should be sufficient to cache only mutation operations. These would be kept in a map<key, operation> and persisted to IndexedDB/localStorage when the application is killed while they haven't been dispatched yet.
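As a sketch of that buffer (a hypothetical shape, not graphcache's actual API), the pending mutations could live in such a map and be serialised to a string for IndexedDB/localStorage on shutdown:

```typescript
// Sketch of the mutation buffer described above; all names are hypothetical.
type Operation = { key: number; query: string; variables?: object };

class OfflineQueue {
  private pending = new Map<number, Operation>();

  add(op: Operation) {
    this.pending.set(op.key, op);
  }

  // Called once a mutation has actually reached the server.
  markDispatched(key: number) {
    this.pending.delete(key);
  }

  // Persist whatever hasn't been dispatched yet.
  serialize(): string {
    return JSON.stringify(Array.from(this.pending.values()));
  }

  // Rebuild the queue from a persisted string on startup.
  static restore(json: string): OfflineQueue {
    const queue = new OfflineQueue();
    for (const op of JSON.parse(json) as Operation[]) queue.add(op);
    return queue;
  }
}
```

The storage backend itself (IndexedDB vs. localStorage) only ever sees the serialised string, which keeps the queue logic independent of where it is persisted.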

The hard part is that we would have to restore the optimisticKeys in the exchange, which makes me think about moving these to our store instance instead, since the serialisation of entities, links, and optimisticKeys could then happen from one place. An additional advantage is that it could all be done with a single restore method.

One concern would be the read/write speed of killing/rebooting the cache in this state. The HAMT structure is quite hard to serialise, taking into account that it will contain optimistic values mixed with normal ones.

Connection checking

This should be easily doable by means of navigator.onLine; we could buffer all requests until we come online and then send them in the correct order, one by one, to avoid concurrency problems. The difficult part would be that we buffer up until all operations are dispatched; this means that if the user performs another action while we are emptying the queue, it could take a while to get a response (given we are using optimisticResponses, though).

Ideally, when we see we are offline, we filter all queries and just keep them incomplete. When we see we are going offline, all subscriptions should receive an active teardown.
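The buffering described above could be sketched like this (hypothetical names; the online check is injected so that, in a browser, it would be backed by navigator.onLine and the window 'online' event):

```typescript
// Sketch of connection-aware buffering; all names are hypothetical.
type Dispatch = (op: string) => Promise<void>;

class ConnectionBuffer {
  private buffer: string[] = [];

  constructor(private dispatch: Dispatch, private online: () => boolean) {}

  async send(op: string) {
    if (!this.online()) {
      // Offline: hold the operation until connectivity returns.
      this.buffer.push(op);
      return;
    }
    await this.dispatch(op);
  }

  // In the browser this would be wired to the window 'online' event.
  async flush() {
    while (this.buffer.length && this.online()) {
      // Sequential, not concurrent, to preserve mutation order.
      await this.dispatch(this.buffer.shift()!);
    }
  }
}
```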

Exchange

When reasoning about this, my thoughts always wander to a separate exchange to manage the operation buffering, and to incorporating the restoring/serialising inside graphCache. There's a bit of overlap, but I think there's sufficient reason to keep them separate.

Persistence

Here I'm having trouble seeing how we could solve this effectively. We have the schema now, so we could potentially iterate over the whole schema and write it that way, but that won't cover the case where people want a persisted cache without the whole schema effort.

What scares me the most is that localStorage isn't the ideal candidate for a persisted cache, but by using IndexedDB we exclude about 5% of the browser population.
IndexedDB seems to ask for permission on Firefox if a blob is >50MB; beyond that there are no explicit size limitations, even for a single data field.

The max size for localStorage is 10MB, so I don't think this is sufficient for big applications, since the initial cost of the data structure is also there. We could strip everything down, but how do we rebuild it then? Maybe by bucket size?

This is a brain dump of what I've been thinking about and is by no means a final solution but I think this could serve as an entry to finding the solution to what feels like a really awesome feature.

Other relevant solution: https://github.com/redux-offline/redux-offline/tree/v1.1.0#persistence-is-key

This uses redux-persist, which in turn relies on IndexedDB. Since this is a reliable and widespread solution, I think it's safe to default to IndexedDB and fall back to localStorage when needed.
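That fallback decision could be as simple as the following sketch, where the capability flags would come from feature detection (for example `'indexedDB' in window`); the function and names are illustrative only:

```typescript
// Sketch of the IndexedDB-first, localStorage-fallback choice described above.
// The flags are injected so the decision is testable outside a browser.
type Backend = 'indexedDB' | 'localStorage' | 'memory';

function pickStorage(hasIndexedDB: boolean, hasLocalStorage: boolean): Backend {
  if (hasIndexedDB) return 'indexedDB';
  if (hasLocalStorage) return 'localStorage';
  return 'memory'; // last resort: no persistence across sessions
}
```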

For react-native we can easily resort to the AsyncStorage module. It seems that AsyncStorage isn't 100% safe either, since on Android it errors out when you exceed a 6MB write.

Introducing some way of leaving certain fields/queries out seems essential to me, since in the test described below we hit the limits of localStorage pretty quickly.

Test

I did a small test with our current benchmarking where I serialised 50k entities and just wrote them to a JSON file to look at the size:

Entities: 14,260,659 B (≈14.26 MB)
Links: 664,618 B (≈0.66 MB)

This already exceeds the limits of localStorage and would cause a prompt in IndexedDB asking for permission to save this amount of data.

Code used:

// Note: `Store` and `write` are graphcache internals, and the queries plus
// the ten-thousand-entry fixtures come from the benchmark setup.
const fs = require('fs');

const urqlStore = new Store();
write(urqlStore, { query: BooksQuery }, { books: tenThousandBooks });
write(
  urqlStore,
  { query: EmployeesQuery },
  { employees: tenThousandEmployees }
);
write(urqlStore, { query: StoresQuery }, { stores: tenThousandStores });
write(urqlStore, { query: WritersQuery }, { writers: tenThousandWriters });
write(urqlStore, { query: TodosQuery }, { todos: tenThousandEntries });

const entities = JSON.stringify(urqlStore.records);
const links = JSON.stringify(urqlStore.links);

fs.writeFileSync('./entities.json', entities);
fs.writeFileSync('./links.json', links);

const { size: entityFileSize } = fs.statSync('./entities.json');
const { size: linkFileSize } = fs.statSync('./links.json');
console.log('ENTITIES', entityFileSize, entityFileSize / 1000000.0);
console.log('Links', linkFileSize, linkFileSize / 1000000.0);

Wild thoughts

I've been thinking about maybe making a distinction between a storage.native and a storage file. This way we could leverage web workers and application cache to write our results at runtime instead of just when we close the application.

Requirements

To implement persistent data we would have to implement an adapter with an API surface for getting, setting, and deleting. People can in turn pass in any storage they like; this way, people who use something like PouchDB can write an adapter and just use that.
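A minimal sketch of such an adapter surface, assuming a promise-based getItem/setItem/removeItem shape (illustrative, not an actual urql interface):

```typescript
// Hypothetical adapter interface: any storage that can get, set, and
// delete entries can be plugged in (localStorage, IndexedDB, PouchDB, ...).
interface StorageAdapter {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
  removeItem(key: string): Promise<void>;
}

// Minimal in-memory adapter, e.g. for tests or SSR.
class MemoryAdapter implements StorageAdapter {
  private data = new Map<string, string>();

  async getItem(key: string) {
    return this.data.get(key) ?? null;
  }
  async setItem(key: string, value: string) {
    this.data.set(key, value);
  }
  async removeItem(key: string) {
    this.data.delete(key);
  }
}
```

Keeping the interface async even for synchronous backends (like localStorage) means one code path covers both kinds of storage.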

We should decide on an approach for when to write. After every query? That would force us to write after every optimistic write as well, which makes everything a tad harder, certainly since it's going to be hard to incrementally write changes from our HAMT structure. I think it's better to work with a hydrate-and-exit approach. Writes could take more time, but in the end it would require a whole lot less logic.

We would need an approach that can exclude certain portions of the state from being cached; an example would be an exclude/include pattern. When we include something, only that will be cached; when we exclude something, everything but the excluded part will be cached. These should be mutually exclusive.
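A sketch of that mutually exclusive include/exclude check (hypothetical names):

```typescript
// Include/exclude filtering as described above: `include` whitelists keys,
// `exclude` blacklists them, and supplying both is an error.
type Filter = { include?: string[]; exclude?: string[] };

function shouldPersist(key: string, filter: Filter): boolean {
  if (filter.include && filter.exclude)
    throw new Error('include and exclude are mutually exclusive');
  if (filter.include) return filter.include.includes(key);
  if (filter.exclude) return !filter.exclude.includes(key);
  return true; // no filter: persist everything
}
```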

When not supplied with a schema, how would we arrange for excluding data?

I drew up a diagram of how I expect this to happen; the code for the offline part was easy to write and is done.

(Screenshot: diagram of the proposed flow, 2019-09-05)

@zsolt-dev

Thank you for working on this.

I think it would be good to allow everyone to use whatever persistent storage they want. For example, the https://github.com/apollographql/apollo-cache-persist allows you to select these:

  • AsyncStorage on React Native
  • window.localStorage on web
  • window.sessionStorage on web
  • localForage on web

or any custom storage; for example, I use this to connect to IndexedDB:

import { get, set, keys, del, clear } from './idb-keyval';

export default {
  clear() {
    return clear();
  },
  getItem(key) {
    return get(key);
  },
  setItem(key, value) {
    return set(key, value);
  },
  keys() {
    return keys();
  },
  remove(key) {
    return del(key);
  },
  removeItem(key) {
    return del(key);
  },
};

@kitten kitten added the future 🔮 An enhancement or feature proposal that will be addressed after the next release label Sep 16, 2019
@wtrocki

wtrocki commented Nov 8, 2019

Hi

I'm the maintainer of the Apollo-Cache-Persist and various offline libraries for GraphQL. I have been playing with the Urql-exchange-graphcache for a while and I absolutely love it.
I think persistence can be done today at a very simple level by utilizing a mechanism similar to cache-persist/redux-persist. However, this means storing the entire cache as a single key: the typical cache-snapshot approach, which is very inefficient and jitters rendering.
The alternative would be to deliver a persistence mechanism that spreads the cache across keys and types, something redux-persist was doing.
Would you allow the community to deliver something simple for the moment and then drive better support later?

I think, from the community side, apollo-cache-persist is the main reason why so many people currently use Apollo Client in their React Native apps, which tend to kill views when transitioning.

@wtrocki

wtrocki commented Nov 8, 2019

I will also add some extra context after 2 years of working with GraphQL caches:

  • Community and users will care a lot about the sync/async API.
    The main problem in cache-persist was that sometimes the actual storage interface was sync (localStorage) but the API still required users to await, which in turn made usage very complex (you have to build a UI/loader when starting, etc.). Not awaiting can cause unexpected results and data loss when the cache is not fully loaded.

  • Optimistic responses cannot be persisted, because it would require a significant amount of time to clean them up after the application restarts. When restarting the app, operations usually need to be reapplied with new optimistic responses anyway. This forces the cache to operate with an optimistic-response layer on top of the regular cache. Persistence should ignore optimistic responses.

  • Persisting the cache will have a lot of side effects on the user experience with prefetched data.
    Different strategies will be invalidated; for example, a cache-first strategy will never fetch data again, so developers will need to remember to use subscriptions or an alternative way of updating their cache.

  • It should be easy and clear to manage/wipe persistence together with the cache.
    For example when logging out from app new client can be created etc.
    This is what apollo-cache-persist is still bad at due to architecture challenges.

  • It should be easy to control what is cached by filtering (see redux-persist filters).
    Most OSes have limits on the persistence layer, so developers should be able to specify what their offline data is.

  • Different storage solutions should be supported. This is a well-known problem, also solved by libraries like Dexie. There are many issues related to storage variants behaving differently; AsyncStorage from React Native core works differently than AsyncStorage from the React Native community, etc.

I think, for a quick win, we could adjust the apollo-cache-persist implementation to work with urql, or create a separate package that hooks into cache write operations and knows how to restore them. I haven't really tried it, so I can't say how hard it will be.
All that's needed is a wrapper around the cache's write method, like here:

https://github.com/apollographql/apollo-cache-persist/blob/master/src/onCacheWrite.ts

@JoviDeCroock
Author

JoviDeCroock commented Nov 8, 2019

Hey @wtrocki

I'm super happy that people are interested in this issue, we encourage community exchanges and are happy to help out where possible.

  • Our exchanges allow for both async and sync operations, so we could probably abstract this away by throttling operations until loading has finished (users won't notice the implication)
  • Optimistic responses will be a hard thing to tackle in terms of persisting between sessions; I'm not aware of how this is handled in Apollo (will read into this)
  • Yes, this is a common tradeoff and accepted
  • This should be possible, but I'm not entirely sure about the implementation
  • That's going to be a harder one, since we mostly save what is received from the backend and don't add extras to it.

I can look into that wrapper this weekend

@wtrocki

wtrocki commented Nov 8, 2019

Our exchanges allow.

The way it would work is that there'd be a separate persistor available globally that needs to be awaited and then sets up the initial cache. If graphcache has the ability to seed initial data, this is very trivial to implement.

Optimistic responses will be a hard thing to tackle I

Absolutely not. They should be ignored completely. Most frameworks, like offix or luna.js, recreate them anyway. The trick is to have a cache that does not apply optimistic responses to the data.

See: https://github.com/wtrocki/apollo-client/blob/07a4f2c4b7cfe4c31ed41a393e5e0da317780661/packages/apollo-cache-inmemory/src/inMemoryCache.ts

It has two separate fields:

  • data, which is the server-side cache
  • an optimistic layer, which holds the transactions with optimistic responses

Restart is kind of tricky, as there is no way to restore the promise chain that usually removes optimistic responses, so it's best not to store them.
Do you have such a counterpart in graphcache?

This should be possible but not entirely sure on the implementation

This should be possible by hooking into the client.destroy() method. Since the cache is a separate exchange from the client, it is best to hook into the client lifecycle (but I'm not sure about that).

That's going to be a harder one since we mostly save what is received from the backend and don't add extra's to it.

IMHO this will be trivial.

Let's collaborate on this. If there is a sample urql app that has the cache, and we can simply add a console.log every time a backend payload gets saved, then integration should be trivial and we can donate tons of code from cache-persist that will work here.

My main question is whether saving should persist the entire cache every time, or be connected to individual server responses (which comes with a tricky normalization challenge).

@JoviDeCroock
Author

Trick is to have cache that do not apply optimistic responses to data.

Optimistic responses are layered on top of data, so that in essence is no issue. My reasoning behind keeping optimism around is that we want to restore the data AND be able to dispatch the request when the user gets online. Maybe I was putting too many eggs in one basket, though.

Let's collaborate on this

Definitely. https://github.com/JoviDeCroock/threed-web is an app we can use to test it; alongside it we have an API for that: https://github.com/kitten/threed-example-api

We can use that to test on; it has all the graphCache features implemented (optimism, ...)

@wtrocki

wtrocki commented Nov 8, 2019

and be able to dispatch the request when the user gets online

See https://offix.dev . This is the exact use case of that library. However, it is way too much responsibility for a cache-persistence layer.

Definitley, https://github.com/JoviDeCroock/threed-web is an app we can use to test it, linearly we have an API for that https://github.com/kitten/threed-example-api

Perfect. Going to check this out and provide an update in this thread.

@wtrocki

wtrocki commented Nov 8, 2019

So, coming back with a plan: I think having some extra interface passed to the store could hook a persist method into the store's write methods.

I have simply hardcoded the store for testing purposes at the moment.

But then I'm struggling to see what fields should be saved to cache:
(Screenshot: the store's fields, 2019-11-08 23:24)

There are a couple of things here that cache utilizes but also not sure about some of them.
Looks like records are not enough. Links should be saved as well, so we can hook into each save and save them as individual keys. This can start with JSON.stringify data into a single key and can be extended later. IndexedDB or other storages can store native js objects so there will be little performance overhead on this. This is a very naive approach but it really ticks the box for basic persistence. This is what I deducted:

(Screenshot: the fields deduced for persistence, 2019-11-08 23:47)

Now we can simply restore this data on restart, but I don't see a simple option to do so.
Kudos for amazing sample apps that helped a lot to write some prototype.

@kitten
Member

kitten commented Nov 8, 2019

@wtrocki That looks awesome! I’m thinking of how we could approach this at “scale” 😂

So pessimism ensures stable perf and immutability (which we don’t need right now) but is otherwise really simple. I’ve been thinking that it’d be nice if we could have a store wrapper (or modify pessimism) that provides a synchronous KV layer (like what pessimism does right now) but flushes writes to any async storage. On start we’d then only have to restore from that async storage and queue up operations while we wait for it 🤔

I think, like you said, we wouldn't even have to preserve optimistic writes, since on a restart we'd just reexecute offline operations, which then restore the optimistic writes anyway.

Regarding what needs to be saved: only records, connections, and links are relevant for persisting the data.

Edit: so my thinking is: we could allow for a persistence layer that accepts any store that adheres to an interface with:

  • asynchronous/sync writeRecords, writeLinks, writeConnections etc (plural for batching so we can flush irregularly)
  • optimistic writes are never persisted
  • async/sync getLinks and others so we can restore
  • queueOfflineOperations and flushOfflineOperations so that we can run them when we go back online

Does that sound about right?
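For illustration, the interface sketched in those bullets could look roughly like this (an assumption about the shape, not the eventual graphcache API), together with a trivial in-memory implementation. Per the bullets, optimistic writes would simply never be passed to this store:

```typescript
// Hypothetical persistence interface matching the bullets above.
interface PersistedStore {
  // Plural/batched writes so the cache can flush irregularly.
  writeRecords(records: Record<string, unknown>): void | Promise<void>;
  writeLinks(links: Record<string, unknown>): void | Promise<void>;
  writeConnections(connections: Record<string, unknown>): void | Promise<void>;
  // Reads used once on startup to restore the cache.
  getRecords(): Record<string, unknown> | Promise<Record<string, unknown>>;
  getLinks(): Record<string, unknown> | Promise<Record<string, unknown>>;
  // Offline mutations, replayed when connectivity returns.
  queueOfflineOperation(op: unknown): void;
  flushOfflineOperations(): unknown[];
}

// A trivial in-memory implementation, useful for tests:
class MemoryPersistedStore implements PersistedStore {
  private records: Record<string, unknown> = {};
  private links: Record<string, unknown> = {};
  private connections: Record<string, unknown> = {};
  private offlineOps: unknown[] = [];

  writeRecords(r: Record<string, unknown>) { Object.assign(this.records, r); }
  writeLinks(l: Record<string, unknown>) { Object.assign(this.links, l); }
  writeConnections(c: Record<string, unknown>) { Object.assign(this.connections, c); }
  getRecords() { return this.records; }
  getLinks() { return this.links; }
  queueOfflineOperation(op: unknown) { this.offlineOps.push(op); }
  flushOfflineOperations() {
    const ops = this.offlineOps;
    this.offlineOps = [];
    return ops;
  }
}
```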

@wtrocki

wtrocki commented Nov 9, 2019

I’ve been thinking that it’d be nice if we could have a store wrapper (or modify pessimism) that provides a synchronous KV layer (like what pessimism does right now) but flushes writes to any async storage.

Yes. I had that exactly in mind and it should be trivial.

async/sync getLinks and others so we can restore

Would it work like a singleton, where the first call tries to restore from persistence?
It would be cool to be able to seed those three fields somehow at cache-creation time.

queueOfflineOperations and flushOfflineOperations so that we can run them when we go back online

Really nice idea. I need to think about how this would work; optimistic responses and update methods will be global, right?

@kitten
Member

kitten commented Dec 7, 2019

I think we’re in a much better position now to tackle this 🥳

The pessimism KV layer is gone and has been replaced with a much simpler backing store. It’s still storing and treating optimistic entries separately, which is perfect since we don’t want to persist them.

The next step would hence be to allow a persisted store to be slotted in that we can flush writes to regularly. Then we'd want to introduce operation buffering to delay operations on startup while the store is being seeded. And lastly, we'll want to persist optimistic operations (and flush them after seeding and when the user goes back online).

Finally, we may want to enable full cache invalidation, which may need to be automatic. We could look at persisted schema information and invalidate parts of the offline store if it no longer matches the schema (and allow full clearing, on logout for instance).

One unanswered question is how we can achieve this without increasing the footprint of Graphcache massively.

@JoviDeCroock
Author

JoviDeCroock commented Dec 8, 2019

I'd say that we'd only need a few things in graphCache:

  • a way to inject the offline store
  • an adapter for offline stores (ex: offix)

This way offix handles all the complex offline/online logic while graphcache remains focused on being a normalised cache. I do agree that we should have some low-priority work involving taking our schema and removing fields, etc.; this can be considered a nice-to-have at the start, though.

I think I still have a working implementation of the buffered operations. This does imply that we expect an async function to be passed to retrieve the offline store data -> run our adapter/transformer -> inject it into our store.

I think if we limit ourselves to a serializer - transformer - "hydrator" pipeline, the footprint impact would be small, since the transformer/serializer part can be tree-shaken out and the added logic will be minimal.

The first thing, imo, would be to see how existing offline-storage solutions persist data at this time, and how we can deserialize on our end to inject it.

@wtrocki

wtrocki commented Dec 8, 2019

I’d say that we’d only need a certain amount of things in graphCache:
a way to inject the offline store

I think that is the key to everything.
Once that is possible, I can work on connecting offix or even just a storage like localForage, etc.
I will try to apply the changes as suggested above and see if that works.

First things imo would be to see how solutions for offline storage persist at this time and see how we can deserialize on our end to inject it.

There is actually a nice thread on the Apollo Client repo (as it will get a cache-storage feature for 3.0).
I don't want to link it here, but TL;DR: generally, because of the storage limitations on the web, the cache is persisted once in a while using the entire cache object. The same implementation exists in apollo-cache-persist.

@JoviDeCroock
Author

@wtrocki I've started making an initial implementation for rehydrating a store: #124. Now we need to hook into an adapter to write/delete/... on, for instance, offix.

@wtrocki

wtrocki commented Dec 9, 2019

Awesome! Thank you so much, and sorry for not making it on time. Yes, I will try this out with the customized apollo-cache-persist, and if that works I will post it as a package. Having a dev version of the PR published would be amazing. A follow-up will be to tackle more complex use cases like cache invalidation, etc. (offix)

@kitten
Member

kitten commented Dec 20, 2019

Persistence has now been implemented by #137 and #138. There's an example that demos it in #141.

The next step now is working on an offline exchange (or one built into the main cacheExchange) that integrates with this and supports queueing up offline mutations, keeps the optimistic update intact (if any), and is able to reexecute offline mutations on startup or when the user comes back online.

@wtrocki We publish every PR via Pika CI. So you can already give this a go by installing "urql": "https://github.pika.dev/formidablelabs/urql-exchange-graphcache/pr/138"

@kitten kitten added feature 🚀 New feature or request and removed future 🔮 An enhancement or feature proposal that will be addressed after the next release labels Dec 20, 2019
@JoviDeCroock
Author

I think that to do this efficiently in a separate exchange we'll need to add OperationRequest.hasOptimisticResult; otherwise we'll never know whether to let the mutation gracefully fail or to buffer it.

@morrys

morrys commented Dec 20, 2019

Hi to all,
I wanted to share the libraries I created to manage persistence and the offline workflow for GraphQL libraries:

wora/cache-persist: uses a JavaScript object synchronously and processes communication with the storage asynchronously (highly configurable in all its aspects; storages: localStorage, sessionStorage, IndexedDB, React Native AsyncStorage & any custom storage)

wora/netinfo: simple library that implements the react-native netinfo interface to use it also in the web

wora/offline-first: persistent Cache store for Offline-First applications, with first-class support for optimistic UI. Use with React, React Native, or any web app.

I used them to create offline solutions for Relay and Apollo.

The main advantages of integrating these libraries are:

  • compatibility for web & react-native
  • same methods of managing persistence of both offline store and cache
  • standardize persistence management and offline workflow in the main GraphQL libraries (Relay, Apollo & Urql) in a simple way

In the offline-examples repository you can find examples of using the offline workflow with Apollo (web and React Native) and Relay (web and React Native).

For any additional information, or if you are interested in making a beta in which they are integrated, please contact me; I will be happy to answer and help.
