Description
Adding some input relating to this closed item; it can be immediately closed as there aren't many things to try, but maybe someone has an idea of the cause, or helps spot some perf issue, or at least I could learn how others use sql.js
when it appears to be really slow when there are a few dozen related tables/views, even with only 150 input rows.
I ended up implementing some interactive visualization in pure SQL (sans a single input and DOM element update) where there are about 5 input tables (one with 150 rows, the other 4 are with 1..4 rows), and a few dozen views that build on these four tables and on each other, using all kinds of basic things (groupBy, union, join etc. but nothing fancy computationally). There's one final query whose result is used to output the result - redraws 150 circles on the screen (rendering is negligible time cost).
The calculations generate an X and Y scale for this prototype scatterplot, such that the rectangular area (whose size is the input, a.k.a. responsive design) is optimally filled with the circles, but no circle gets truncated (it can be seen on the image that the circles at the edges touch the image border):
Initial result: one render taking around one minute.
Subsequent tweaks:
- replaced the couple of
cross join
s (which bind the hand of the optimizer in SQLite) - added indices where it seemed to be needed
- converted several views to tables, where it seemed to help: just one out of the 5 input tables receives updates; it's bound to a mousemove event (causing an
UPDATE
of the single record of two columns in that table), which channels into the midsection of the entire DAG of views; many views which were upstream of this dynamic input in the DAG were converted to a table, but of course nothing downstream can be converted as they'd become nonresponsive, SQLite doesn't have materialized views let alone incrementally updating materialized views
These changes resulted in an update time of 1.5 seconds. explain query plan
wasn't a big help for two reasons:
- it doesn't refer to table names, just some operation indices, and I can't often associate those indices with the corresponding (sub)queries
- still I eliminated suspects in those parts where I could use
create table ... as select ...
; yet for views, which do the work (incl. creating a new column whose value is a very simple calculation on the other fields of the row), one cannot create indices, so there are a couple ofUSING TEMP B-TREE (UNION)
and the like.
I then thought, why not try it with WebSQL, which is about the same SQLite but runs native. I got somewhat varying frame rates, typically between 15..20FPS, sometimes 30FPS.
==========
In sum, the same exact set of views updated in 1500ms with sql.js
and in 50ms with WebSQL, though I could add an analyze
to sql.js
but not to WebSQL.
So this is 1-2 orders of magnitude slower speed, quite unexpected, especially because the sql.js
version benefits from not having to go through a browser API boundary (though it's a minor point as the single final query just retrieves 150 rows with 5 numeric columns).
Note: even the 50ms update is quite slow, as a woefully non-optimized plain JS version, which just materializes all views (corresponding to pretty much the same queries with a winged multiset relational algebra DAG), and rebuilds fully if invalidated, runs in a fraction of that time. Also, that 50ms was achieved at the expense of 1) quite heavy tweaking, esp. selecting into tables rather than views where sensible; 2) reducing the generality, because now only one of the 5 tables can accept insert/update changes; if I retain full DAG recalc flexibility ie. all but the 5 tables are views, then the numbers are even farther:
simple, fully recalculating DAG in JS: ~10ms < WebSQL: ~150ms < sql.js
: 9400ms
Now I wonder about real use cases where there may be more than 150 rows and mildly nontrivial views (or compound selects) where people find sql.js
sufficiently fast. It's a weighty dependency to consider (2.6MB) so I'm interested in if, and how folks are using it.