Load data for initial views from cache #506
Comments
I extended some work from #383 (comment) that gets Mosaic running in NodeJS using JSDom to cache the initial queries by wrapping the
I then tested running Mosaic from the cached data in Observable Framework: https://github.com/alexkreidler/mosaic-framework-example/tree/cache-initial-view-queries The connector on the frontend (in
I can imagine this with an API like:
const coordinator = new vg.Coordinator();
const baseConnector = vg.wasmConnector(); // or socket or rest
const staticDir = "/_file/data/prebaked"; // for Observable Framework; would be /public or similar for Next, Vite, etc.
const connector = wrapConnectorWithCache(staticDir)(baseConnector);
coordinator.databaseConnector(connector);
You may also want to have this caching be somewhere else rather than at the connector level of the API. But it is promising that we can get Mosaic running in NodeJS so we can record the SQL queries and save their results to a cache somewhere when building a static site. |
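A minimal sketch of what such a wrapper could look like, assuming connectors expose an async `query({ type, sql })` method and that the build step wrote each result as JSON to `<staticDir>/<SHA-256 of the SQL>.json`. `wrapConnectorWithCache` is only the name proposed above; the file layout and the `fetchFn`/`hashFn` options are hypothetical and not part of Mosaic's API:

```javascript
// Hypothetical sketch: not an existing Mosaic API.
// Hash the SQL text with SHA-256 (Web Crypto) and return a hex string.
async function sha256Hex(text) {
  const bytes = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Returns a function that wraps a base connector with a read-only static cache.
// Assumed layout: `${staticDir}/${hash}.json` holds the JSON result of one query.
function wrapConnectorWithCache(staticDir, options = {}) {
  const fetchFn = options.fetchFn ?? ((url) => fetch(url));
  const hashFn = options.hashFn ?? sha256Hex;
  return (baseConnector) => ({
    async query(query) {
      // Only result-producing queries are looked up; "exec" statements pass through.
      if (query.type !== "exec" && typeof query.sql === "string") {
        const key = await hashFn(query.sql);
        try {
          const response = await fetchFn(`${staticDir}/${key}.json`);
          if (response.ok) return await response.json();
        } catch {
          // Network or parse failure: fall through to the base connector.
        }
      }
      return baseConnector.query(query);
    },
  });
}
```

On a cache miss the wrapper silently falls back to the base connector, so the first interaction that needs uncached data still pays the full loading cost. The sketch also returns JSON regardless of the requested result type.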
Very cool that you figured out how to create precomputed results using jsdom. I think that faster load times are a really big improvement for Mosaic and would love to have some functionality in the core. I like that we can use the cache and don't even need DuckDB. What we will need to figure out is clear feedback for users when they start interacting with charts and things are not fully loaded yet. In terms of implementation, I think we should combine this with the bundling logic (e.g.
|
I think that's a great idea! I updated my observable-framework code to use the mosaic NodeJS server, send a

I ran into a small issue where the type of the file saved in the bundle doesn't match the type requested by Mosaic (instead it is based on a regex in the create-bundle code), but it was an easy enough fix to find the right file based on the SHA256 hash of the query. This is fine because the code receiving the results from the connector can accept JSON objects or arrow tables for any query type.

I also noticed the typings for Connection.query aren't right for the create or load bundle commands. E.g. while the correct format is

We need a way to load a bundle into the WASM connector, which currently doesn't support them. We probably want a way to load "remote bundles" into the Python, Rust and Node servers so they can load bundles that someone else published or were pushed to object storage by a build process. A small concern is that the create-bundle / load-bundle code is duplicated in three languages and it may be tough to keep it in sync.

I agree we should have a way to show users whether the plot is interactable (or give them an explanation if they interact but nothing happens). We could add a new function to Connector like

Another question: should we have a way to configure if the source data is included in the bundle or if it should be fetched at runtime? For example if it is a large parquet file that updates fairly frequently.

Also, how would we handle initial view caches that are out of date? One idea is for the connector to save all the queries that are served from the cache, re-run them against the newly loaded source data, check if the results are different, and then indicate that the data is stale, or call Plot.render. |
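The stale-cache idea could be sketched roughly as follows. This is only an illustration: `createStalenessTracker`, `record`, and `findStale` are hypothetical names, the connector is assumed to expose an async `query({ type, sql })` method, and results are assumed to be JSON-serializable (Arrow tables would need a different comparison):

```javascript
// Hypothetical helper: none of these names exist in Mosaic.
function createStalenessTracker() {
  const served = new Map(); // sql -> serialized cached result
  return {
    // Call whenever a result is served from the prebaked cache.
    record(sql, cachedResult) {
      served.set(sql, JSON.stringify(cachedResult));
    },
    // Re-run every recorded query against the live connector and return
    // the SQL strings whose fresh result differs from the cached one.
    async findStale(liveConnector) {
      const stale = [];
      for (const [sql, cached] of served) {
        const fresh = await liveConnector.query({ type: "json", sql });
        if (JSON.stringify(fresh) !== cached) stale.push(sql);
      }
      return stale;
    },
  };
}
```

Comparing serialized JSON is sensitive to row and key order, so a query without a deterministic ordering could be reported as stale even when the underlying data is unchanged.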
Very nice!
Can you point me to that?
I agree. One idea would be to simplify the bundles to make it easier to maintain them across languages or have the mosaic coordinator do most of the logic (rather than the server). Another could be to figure out a way to put the cache into duckdb itself so that loading a bundle would be as easy as attaching a specific database. Re-considering how we do bundling is definitely on the table. |
Yeah, currently the typings are JSDoc comments that TypeScript interprets as having a required sql field (also I realize I wrote Connection.query instead of

I like the idea of putting the cache logic in the coordinator. It seems like it would be easy to run the queries to create a bundle. But it does seem harder to load the bundle because it needs to read those files into memory and update the cache, which is in the browser for WASM but on the server for the other connectors right now. |
Good point. Does #549 look right to you? |
There is also a cache in the browser. We have multiple levels of caching (and might want to reconsider some as we probably duplicate the cache in wasm). |
Right now this is left as a responsibility of MosaicClient implementations. The coordinator will call "queryPending" and the client can do with it what it will. I think this is the right abstraction, as clients can be arbitrary (they need not even correspond to visual UI components). But at minimum I think the reference components in vgplot and inputs could make better use of that. So I think adding some kind of pending indicator in the UI for those components could be a great start. |
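As a rough illustration of a pending indicator driven by these hooks, here is a plain object that toggles a busy flag. `queryPending` is the hook named above; `queryResult` and `queryError` are assumed to be the matching completion hooks, and `createPendingIndicator` and `setBusy` are hypothetical names. A real component would extend `MosaicClient` instead of using a bare object:

```javascript
// Hypothetical sketch: a plain object using MosaicClient-style hook names.
// setBusy is a caller-supplied callback, e.g. one that toggles a spinner.
function createPendingIndicator(setBusy) {
  return {
    // Called when a query has been issued but not yet answered.
    queryPending() {
      setBusy(true);
      return this;
    },
    // Called with the query result; the indicator is cleared.
    queryResult(data) {
      setBusy(false);
      return this;
    },
    // Called if the query fails; the indicator is cleared as well.
    queryError(error) {
      setBusy(false);
      return this;
    },
  };
}
```

In a browser, `setBusy` could be something like `(busy) => element.setAttribute("aria-busy", String(busy))`, which keeps the indicator logic independent of any particular chart component.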
Filed #550 |
Right now, loading initial views can be slow. If the user never interacts with the charts, we loaded the full data unnecessarily and could have just loaded the data for the initial view. See for example https://idl.uw.edu/mosaic-framework-example/gaia-star-catalog which right now downloads the full 5M-row Gaia dataset on load.
The idea is to load the data for initial views into the cache and then serve requests for the initial views from this cache. To achieve this, we could launch a mosaic view and cache queries as part of the compile step and then at runtime load the cache into duckdb-wasm. A few pieces are missing but I think there is a lot of potential in having instant initial views.
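The compile-step half of this could look roughly like the sketch below: a wrapper that forwards queries to a real connector during the build and records each result under a key derived from the SQL. `recordingConnector`, the `entries` map, and the `<hash>.json` naming are hypothetical, the connector is assumed to expose an async `query({ type, sql })` method, and results are assumed to be JSON-serializable:

```javascript
// Hypothetical sketch: wraps a connector to record results during a build.
// hashFn maps a SQL string to a file-safe key (for example a SHA-256 hex digest).
function recordingConnector(baseConnector, hashFn) {
  const entries = new Map(); // file name -> JSON text to write to the static directory
  return {
    entries,
    async query(query) {
      const result = await baseConnector.query(query);
      // Record only result-producing queries; "exec" statements have no result to cache.
      if (query.type !== "exec" && typeof query.sql === "string") {
        const key = await hashFn(query.sql);
        entries.set(`${key}.json`, JSON.stringify(result));
      }
      return result;
    },
  };
}
```

After the initial views have rendered in the build environment, the build script would write each entry in `entries` to the static directory so that the runtime can serve those queries without loading the full dataset.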
This comment from https://observablehq.com/blog/observable-2-0 makes the point about initial loading but the current setup doesn’t fulfill that vision.
To get started on this project, create an Observable Framework dashboard, simulate a slow network (using the throttling feature in the browser's network tools), and observe the slow load times.