-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export improvements #400
Export improvements #400
Conversation
Regarding the test failure:
Looking through emscripten issues it looks like there have been various fun moments with getTempRet0/setTempRet0 over the years 😅 |
The issue55 test is the one in particular that tests this part of the code, I would like to make sure it passes before merging.
👍 This seems like the most promising and logical thing to me. This should allow us to build with |
Yeah, I’m trying to figure out how to make emscripten include env.getTempRet0 in the sql-asm.js target output. If I copy the implementation from one of the other files it passes. Also, VACUUM can take some time and possibly lock up UI or workers. Do you have any concerns with changing the implementation in such a way?
I’ll give this a try, too. Having way too much fun with things! |
@kripken Since this was/is your baby, do you have any opinions on exporting and in-memory vs filesystem? And if you have any getTempRet0 tricks up your sleeve... 😜 |
I think going directly to memory might be better than using a filesystem, yeah. Could be much faster to just malloc some room and use that. For
|
For what it's worth, I would love to avoid the overhead of the virtual FS and just tell SQLite that it's writing to memory. Thank you for exploring this @seidtgeist! |
@seidtgeist, did you do any experimentation with the memory-based option that would allow us to move away from a virtual FS? I No pressure! I just didn't want to duplicate work if I were to dive in. |
@Taytay I have not, but there’s a custom VFS in deno-sqlite that might interest you: https://github.com/dyedgreen/deno-sqlite/blob/master/build/src/vfs.c |
One drawback of that is export will use more RAM, potentially doubling peak WASM memory usage. An application could reach a point where the database can no longer be exported because there is not enough contiguous WASM memory for |
I agree. I have since learned more about sqlite and changed my opinion on this topic: There is great utility in having a filesystem (totally aware of the irony expressed) exposed to sqlite, such as attaching multiple databases, vacuuming into a new file and other useful features that rely on a filesystem being present. Areas worth investigating:
|
I recently built a SQLite VFS that uses IndexedDB as a block device. It doesn't support multiple connections so the locking calls are just stubs. It works, but I don't have good tests or performance measures for it so I would call it a proof of concept. The main obstacle is that the VFS model is synchronous while IndexedDB is not, which I bridge with Asyncify. This creates two issues:
Getting a synchronous Storage Foundation API would bypass both issues...but it isn't clear when that would arrive, if ever. Note that IndexedDB originally had a synchronous API that ultimately was removed. Also I don't know of any existing or proposed way to implement synchronous locks, which might mean asynchronicity would still be required to support multiple connections. |
That’s exciting! Would you be able to share it? It would help me with writing a memory VFS without needing to through the FS module.
Ok, I’ve been wondering about this. Since I don’t expect a sync filesystem API to arrive ever (persistence? sync? low level? lol), would it be more reasonable to keep files in memory and persist them to in a separate process? |
I don't quite understand the advantages of a memory VFS over the current in-memory filesystem, can you explain? In any case, I can share what I have; I'll try to DM you.
I wouldn't count a synchronous API in or out completely, at least for Worker contexts. There apparently is an actual implementation in the Chrome origin trial described here and mentioned in #302. I'm prepared for disappointment, but I do have hope for it. I would like to see SQL.js either define its own official asynchronous API or make it easier to develop and use alternative APIs (including asynchronous ones). That's what I was getting at in my #377 comment. |
Based on your previous comments I assume you mean emscripten’s MEMFS and not sqlite’s :memory: databases, so that database files can be easily accessed without needing to serialize or copy. I haven’t looked much into sqlite’s VFS interface yet, so I don’t know whether there’s additional information/parameters available to the VFS interface that might be useful in implementing something better. Looking at https://www.sqlite.org/src/doc/trunk/src/test_demovfs.c it doesn’t seem that way.
I’m interested in this area because I once considered streaming WAL pages to remote persistent storage.
Fair enough, but it’s not something I’ll expect to be available universally within the next three years or so.
I agree with your points. I’ve started my own emscripten-compiled version of the sqlite amalgamation to learn more about the C API and optimize it for my use case. Since I‘m not banking on widely available and reliable persistent storage any time soon I’m considering IDB for local persistence and S3 as a remote key-value store for reliable persistence. I find it interesting to think about guaranteeing data consistency on the application level and minimizing deltas. |
One of the concerns I have with the VFS shims is that it appears to use exceptions for some unexceptional things. When I debug SQL.js and tell it to breaking on all exceptions (not just uncaught), it's constantly breaking due to some sort of exception-based flow control. I think it's due to SQLite performing stat calls on non-existent lock files, but it's been a long time since I looked. |
You're right that SQL.js throws and catches a bunch of exceptions looking for files on start up, but I think that's actually all in the Emscripten filesystem. When it sets up its initial directories and devices, it checks to see if creating them is allowed here: mayCreate: function(dir, name) {
try {
var node = FS.lookupNode(dir, name);
return {{{ cDefine('EEXIST') }}};
} catch (e) {
}
return FS.nodePermissions(dir, 'wx');
}, And indeed the non-existence of a file is signaled with an exception. Amusingly, the error condition here occurs on the lack of an exception. But I don't think this is related to SQLite VFS, other that the standard VFS uses the Emscripten FS. I don't see any exceptions, caught or uncaught, using my own VFS implementations (other than from my bugs). |
Ah yes - you're right. I wasn't clear in my statement. The built in Emcripten posix-emulating file system was what concerned me. It sounds like your implementation is avoiding those issues and that sounds promising. |
Based on #302 I’ve been trying to improve export, and this has lead me on a path™. Have you considered using one of these options?
VACUUM INTO
a temporary file that’s read, unlinked and returnedIn this PR, I’m giving (1) a shot. I’m very interested to hear what you have to say. Ultimately, option (2) might be better suited for a generic
export
function since it doesn’t incur a VACUUM, and I will most likely give that a try as well.On a related note: Since the current filesystem is MEMFS, have you considered just using an in-memory database instead? Right now, sqlite believes it’s writing to a persistent filesystem, but MEMFS is backed by memory. Since everything’s happening in-memory anyway, why not use sqlite’s native memory-backed mechanics and drop the MEMFS indirection along the way?
I have no experience with IDBFS (#397), and boy would I prefer if we had something like NativeIO.
Some notes to self: