Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql.js is almost 2 OOM slower than WebSQL on the same query (identical set of tables, rows, views, indices, hw, browser) #225

Closed
monfera opened this issue Jan 9, 2018 · 11 comments

Comments

@monfera
Copy link

monfera commented Jan 9, 2018

Adding some input relating to this closed item; it can be immediately closed as there aren't many things to try, but maybe someone has an idea of the cause, or helps spot some perf issue, or at least I could learn how others use sql.js when it appears to be really slow when there are a few dozen related tables/views, even with only 150 input rows.

I ended up implementing some interactive visualization in pure SQL (sans a single input and DOM element update) where there are about 5 input tables (one with 150 rows, the other 4 are with 1..4 rows), and a few dozen views that build on these four tables and on each other, using all kinds of basic things (groupBy, union, join etc. but nothing fancy computationally). There's one final query whose result is used to output the result - redraws 150 circles on the screen (rendering is negligible time cost).

The calculations generate an X and Y scale for this prototype scatterplot, such that the rectangular area (whose size is the input, a.k.a. responsive design) is optimally filled with the circles, but no circle gets truncated (it can be seen on the image that the circles at the edges touch the image border):

scatter

Initial result: one render taking around one minute.

Subsequent tweaks:

  • replaced the couple of cross joins (which bind the hand of the optimizer in SQLite)
  • added indices where it seemed to be needed
  • converted several views to tables, where it seemed to help: just one out of the 5 input tables receives updates; it's bound to a mousemove event (causing an UPDATE of the single record of two columns in that table), which channels into the midsection of the entire DAG of views; many views which were upstream of this dynamic input in the DAG were converted to a table, but of course nothing downstream can be converted as they'd become nonresponsive, SQLite doesn't have materialized views let alone incrementally updating materialized views

These changes resulted in an update time of 1.5 seconds. explain query plan wasn't a big help for two reasons:

  • it doesn't refer to table names, just some operation indices, and I can't often associate those indices with the corresponding (sub)queries
  • still I eliminated suspects in those parts where I could use create table ... as select ...; yet for views, which do the work (incl. creating a new column whose value is a very simple calculation on the other fields of the row), one cannot create indices, so there are a couple of USING TEMP B-TREE (UNION) and the like.

I then thought, why not try it with WebSQL, which is about the same SQLite but runs native. I got somewhat varying frame rates, typically between 15..20FPS, sometimes 30FPS.

==========

In sum, the same exact set of views updated in 1500ms with sql.js and in 50ms with WebSQL, though I could add an analyze to sql.js but not to WebSQL.

So this is 1-2 orders of magnitude slower speed, quite unexpected, especially because the sql.js version benefits from not having to go through a browser API boundary (though it's a minor point as the single final query just retrieves 150 rows with 5 numeric columns).

Note: even the 50ms update is quite slow, as a woefully non-optimized plain JS version, which just materializes all views (corresponding to pretty much the same queries with a winged multiset relational algebra DAG), and rebuilds fully if invalidated, runs in a fraction of that time. Also, that 50ms was achieved at the expense of 1) quite heavy tweaking, esp. selecting into tables rather than views where sensible; 2) reducing the generality, because now only one of the 5 tables can accept insert/update changes; if I retain full DAG recalc flexibility ie. all but the 5 tables are views, then the numbers are even farther:

simple, fully recalculating DAG in JS: ~10ms < WebSQL: ~150ms < sql.js: 9400ms

Now I wonder about real use cases where there may be more than 150 rows and mildly nontrivial views (or compound selects) where people find sql.js sufficiently fast. It's a weighty dependency to consider (2.6MB) so I'm interested in if, and how folks are using it.

@lovasoa
Copy link
Member

lovasoa commented Jan 9, 2018

It's hard to say anything without actually seeing the code. But indeed, sql.js is slower than anything native, and handwritten code is usually faster than using even a native SQL implementation.

@monfera
Copy link
Author

monfera commented Jan 9, 2018

Thanks @lovasoa I'll try to brew a minimal set of queries that shows the difference without less relevant domain-specific content, although as you suggest, probably not much can be done about it in sql.js, and unfortunately WebSQL is not a cross-browser thing.

@monfera
Copy link
Author

monfera commented Jan 9, 2018

... also, with upcoming Spectre / Meltdown bug fixes, there may be negative impact on asm.js / WASM

@lovasoa
Copy link
Member

lovasoa commented Jan 9, 2018

By the way, I forgot to ask, but do you use prepared statements ?
They improve performance when you have to run the same queries several times.
Do you have primary keys where needed ? Do you use WITHOUT ROWID tables ?

@monfera
Copy link
Author

monfera commented Jan 9, 2018

The render loop was as simple as

window.addEventListener('mousemove', e => {
    db.exec(`
    update desiredWindowSize
      set width = ${e.x - 8},
          height = ${e.y - 8}
    `)

    const rows = db.exec('select domElement, cx, cy, r, fill, opacity from screenSpaced')[0].values
    rows.forEach(render) // 150 rows
}

I can't prepare the update (bind, get are select-specific) but on your suggestion I tried it on the select which has no parameters:

const   query = db.prepare(`select domElement, cx, cy, r, fill, opacity from screenSpaced`)
query.bind({})

window.addEventListener('mousemove', e => {
    db.exec(`update desiredWindowSize set width = ${e.x - 8}, height = ${e.y - 8};`)
    while(query.step()) {
      draw(query.getAsObject())
    }
}

This change resulted in a very nice 2-3x speedup for the version which has all views, making it catch up with the version that materializes into tables (create table xxx(...) as select ...). It's still around 3500ms so I won't get a frame rate out of it, but it was still an interesting experiment. Whether or not I used WITHOUT ROWID doesn't seem to matter. Thanks for the suggestions!

@monfera
Copy link
Author

monfera commented Jan 9, 2018

P.S. I used your suggestion and added back the ANALYZE; so the materialized version now also benefits from the speedup (in fact, 3-4x) so now it's "only" 1000ms, still some ways from the native 50ms but the PREPARE, properly run shaved off close to one OOM out of the ~2 OOM initial speed difference. I just thought one caught up with the other as the materialized version didn't currently have the ANALYZE; which was needed to realize the extra speed.

@audioscavenger
Copy link

WebSQL is deprecated. Why do you even bother?

@monfera
Copy link
Author

monfera commented Jan 14, 2018

I know. Some of my interests were:

  • gauging how asm.js / WASM performance compares to native speed
  • seeing if I miss something when trying to compare fairly (and learnt I did, eg. the importance of preparing statements)
  • ... or if I miss general optimization steps (I did, eg. WITHOUT ROWID, although they didn't yield a noticeable improvement)
  • understanding whether either sql.js or WebSQL, using fancy relational algebra based optimizations and hand-optimized low-level C code for fast operations, can be competitive with fairly dumb, unoptimized or ad-hoc optimized JS code (on my example that doesn't exercise several features of a SQL database that probably add a lot of overhead)
  • seeing how informative EXPLAIN QUERY PLAN results are
  • eliciting feedback on how and why people might use sql.js when it's very large and apparently SQLite doesn't do a very good job of query optimization and/or physical representation (presumably due to constraints that are unimportant to me but not to a general embedded database user)

while

  • seeing to what extent SQL, or generally, relational algebra, is sufficient, suitable and convenient for expressing domain-specific logic that's not traditionally mapped to SQL; in this case, interactive data visualization
  • solving a specific dataviz challenge (seemingly basic, yet quite nontrivial), ie. the problem of maximally exploiting given rectangular area for a bubble chart such that all bubbles are rendered fully

etc., so in short, learning.

@monfera monfera closed this as completed Jan 25, 2018
@Taytay
Copy link
Contributor

Taytay commented Apr 26, 2019

@monfera : I'm looking for a benchmark, so I am curious if you were able to distill down these queries to something simple to compare? I ask because we are converting this to WASM in #255, which is faster.

@monfera
Copy link
Author

monfera commented Apr 26, 2019

@Taytay not really - I hooked up a few tables and a bunch of views atop of the tables and other views, updated some input table on every requestAnimationFrame and in turn, queried the top level view. It wasn't a lot of data but quite a few views in a relatively deep graph, joining small views/tables with one another, selection, aggregation etc. Lmk if I should I send the test page to your mailbox as seen on gh (or elsewhere) or dm me on twitter

@Taytay
Copy link
Contributor

Taytay commented May 10, 2019

@monfera : This issue is closed, but now that #255 is merged, SQL.js is compiled into WASM, which is faster, but not an Order of magnitude faster. :) Still, I'd be curious what you found in your tests with the new version.

In my testing, the more you can keep in the WASM code, the better. Calling across the boundary into WASM via multiple calls to prepare, bind, and execute are going to be slow. I was recently experimenting with simply batching inserts to see how fast I could get things, and even when I was using prepare and bind, it was slow if I inserted an individual row at once. It was at least 1 OOM faster if I inserted multiple rows with a single SQL statement. So, preventing calls across that boundary is essential.

I don't know if the current version of SQL.js is compiled to know that everything should be "in memory", which might be a faster way to run SQLite. I've got an experimental branch open that removes some features and made things at least twice as fast, but it's a bit out of date now : Taytay#1
Still, it shows that there are certainly more optimizations to come.

Also, you said that you can't use prepare because it is "select specific", but I think if you prepared your statement :

    update desiredWindowSize
      set width = ?,
          height = ?

and then ran bind to give it the exact parameters, that would work, right? I wasn't sure I followed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants