-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WASMFS chunked fetch backend #23021
base: main
Are you sure you want to change the base?
WASMFS chunked fetch backend #23021
Conversation
93de65a
to
ff1c2d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll make these style changes and stack saving. I think I may have messed up something with respect to closure compiler, but I'm not sure what exactly. |
248ca46
to
62987bb
Compare
src/library_fetchfs.js
Outdated
return withStackSave(() => { | ||
var manifest = 0; | ||
if (opts['manifest']) { | ||
manifest = wasmfs_fetch_create_manifest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should require a _
prefix, shouldn't it? New line 20 shows such a prefix being used for another function. Does this work without the prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Is there a good way to write unit tests for the JS side of interop? Say, a test that used JS to create a fetchfs filesystem and then run a C program that checks for he presence/contents of files in the filesystem? I’d be happy to add a test case if you can point me to an example of a test that isn’t just C code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can embed JS code into the C code using EM_ASM
etc. See e.g.
emscripten/test/wasmfs/wasmfs_getdents.c
Lines 100 to 108 in 5643c26
// The same, but via the JS API. | |
printf("------------- Reading from /dev Directory via JS -------------\n"); | |
EM_ASM({ | |
var entries = FS.readdir("/dev"); | |
for (var i = 0; i < entries.length; i++) { | |
console.log(entries[i]); | |
} | |
console.log(); | |
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that’s helpful, and would at least tell us if the createBackend code path works. I was wondering if we could have a test where JS code in a pre run hook can successfully set up a filesystem—is there something like em_asm that runs before main but after global initializers?
An important functionality is to be able to consume via bare HTTP chunked data files (for Chrome local caching to work it's important to do chunks under 50 Mb) - e.g. so that we could still use basic, non-range supporting servers. And to check that GH Pages (and maybe some other bare hostings, where we cannot do modification of nginx.conf etc) works for this scenario... Some of my experience: In other words - it's important to ensure that Chrome can cache these chunks. |
That’s interesting. Should it be an option in opts, or a fallback in case we get error 416 or no accept-ranges header from the server? |
I mean I have nothing against range requests for chunked access, my point is that there should also be support for chunked data files, e.g. so that we can serve off GitHub Pages cacheable <50Mb files |
I think the idea of downloading multiple files into ranges of one file sounds tricky, but I do think it could make sense to map chunks onto remote files. I’d like to land this first with just range requests and maybe down the line the manifest could be extended to support this use case. |
89e1b4f
to
87ced78
Compare
I have fixed some bugs, added tests for the JS API, added docs, and rebased. I think this is ready to merge now if CI is clean, though of course I'm happy to revisit if there are code quality issues. I'll be able to look at this again on Monday, so I might not be able to fix it right away if there are CI issues. But I will get to them! |
src/library_fetchfs.js
Outdated
if(opts.base_url === undefined) { | ||
opts.base_url = ""; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(opts.base_url === undefined) { | |
opts.base_url = ""; | |
} | |
opt.base_url ??= ''; |
src/library_wasmfs_fetch.js
Outdated
@@ -35,34 +31,74 @@ addToLibrary({ | |||
try { | |||
var u = new URL(fileUrl, self.location.origin); | |||
url = u.toString(); | |||
} catch (e) { | |||
} catch(_e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} catch(_e) { | |
} catch (_e) { |
|
||
const char* EMSCRIPTEN_KEEPALIVE _wasmfs_fetch_get_file_path(void* ptr) { | ||
const char* _wasmfs_fetch_get_file_path(void* ptr) { | ||
auto* file = reinterpret_cast<wasmfs::FetchFile*>(ptr); | ||
return file ? file->getPath().data() : nullptr; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, good catch!
src/library_wasmfs_fetch.js
Outdated
wasmFS$JSMemoryFiles[file] = new Uint8Array(buffer); | ||
} else { | ||
var chunkSize = __wasmfs_fetch_get_chunk_size(file); | ||
offset = offset || 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offset = offset || 0; | |
offset ??= 0; |
system/include/emscripten/wasmfs.h
Outdated
// For example, a manifest like `{'/test.txt':'resources/file.txt'}` | ||
// mounted at `/dat` (using either wasmfs_mount or | ||
// wasmfs_create_directory) will let you write `open("/dat/test.txt", | ||
// O_RDONLY)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the user not implement manifests using symlinks? That is, use redirection locally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is sometimes possible, but very tedious to arrange the directories and depends on the C code following symlinks properly, doing the right thing with stat calls, etc. I thought about it, but felt like adding the control over where downloaded files come from and go in the first place seemed better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tedious part could be fixed with a helper somehow, but good points about some C code not following symlinks.
Still, I hope we can find a way to separate these two functionalities, of the fetch backend and a "rename/restructure" routing layer. In particular because other backends may want similar things.
One of the core ideas in WasmFS is that it is composable. For example, we have an "ignore-case" backend that ignores lowercase/uppercase (wasmfs_create_icase_backend
) and which is layered on top of another backend. That approach lets any backend have the property of ignoring the case of filenames. Following that, what do you think about adding a "routing" backend which would receive a manifest like here, and basically map locations to other locations? Then both the fetch backend and other backends could use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the manifest maps file paths to relative URLs, I don't immediately see how it's composable (the manifest could in principle map to URLs containing query parameters, for example). Are you saying the idea is that the composable bit is mapping paths in one filesystem to paths in a different filesystem? It sounds (to me) like a lot of work to build implementations of getChild, getEntries, etc that respect those remappings because I'm not super familiar with the wasmfs code or its invariants and not much of a C++ programmer, but if the path forward there seems obvious to you I'm happy to collaborate on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea is a mapping layer. For example, a test in this PR contains this:
FS.mount(FETCHFS, {"manifest":{"/file":"test.txt", "/sub/file":"test.txt"}}, "/dat2");
If we had a MAPFS that does such a mapping, we could do
FS.mount(MAPFS, {"/file":"test.txt", "/sub/file":"test.txt"}, "/dat2", FETCHFS);
This mapping backend would see /dat2/file
and map that to the path test.txt
, and it would route that to the underlying (layered) FETCHFS.
(query parameters can work too, as part of the filename? but that part I'm not sure of - illegal characters might be an issue?)
It sounds (to me) like a lot of work to build implementations of getChild, getEntries, e
It's not trivial, but it shouldn't be much more than the ignore-case backend:
In fact I'd guess 80% of the code could be identical. The only change would be that instead of ignoring the case of a file, we route to a possibly different path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can sort of see how that makes sense, but I’m not sure which other backends would benefit since they would typically have more control over the layout of files.
I struggled in the implementation of the manifest and ended up creating every file record immediately on mount/root directory creation. I couldn’t figure out a way for a directory or file object to know its full path and to do this lazily. This makes sense because of things like symlinks, but I found implementing getChild/getEntries pretty tricky and let the memory directory implementation do the work.
If the manifest were a nested map (string -> filepath | manifest, natural for a json object), it’s clear to see how to implement getChild and friends, but constructing such a nested map in c is just as much work as calling createDirectory or createFile a bunch of times and seems about the same as using the memorydirectory implementation. In fact I guess a MAPFS would have to call those same functions on the target backend for the target paths, as the source path tree is being constructed, which I guess the case insensitive filesystem does too.
I can try to take a look at this over the next few days. One last question about your hypothetical example—does WASMFS FS.mount take a fourth parameter? According to lib/libwasmfs.js it does not.
This patch eliminates the need for asyncify and uses modern filesystem APIs instead of the deprecated, unmaintained BrowserFS. This is a WIP patch because it won't fully work until these two Emscripten PRs land and are released: emscripten-core/emscripten#23518 emscripten-core/emscripten#23021 The former fixes an offscreen canvas context recreation bug, and the latter adds an equivalent to BrowserFS's XHR filesystem (but without the hazardous running-XHR-on-the-main-thread problem). The biggest issue is that local storage of users who were using the old version of the webplayer will be gone when they switch to the new webplayer. I don't have a good story for converting the old BrowserFS IDBFS contents into the new OPFS filesystem (the move is worth doing because OPFS supports seeking and reading only bits of a file, and because BrowserFS is dead). I've kept around the old libretro webplayer under pkg/emscripten/libretro-classic, and with these make flags you can build a non-workerized RA that uses asyncify to sleep as before: make -f Makefile.emscripten libretro=$CORE HAVE_WORKER=0 HAVE_WASMFS=0 PTHREAD=0 HAVE_AL=1 I also moved the default directory for core content on emscripten to not be a subdirectory of the local filesystem mount, because it's confusing to have a subdirectory that's lazily fetched and not mirrored to the local storage. I think it won't impact existing users of the classic web player because they already have a retroarch.cfg in place.
I wonder if I should separate out the chunking part of this PR from the manifest part, since there’s some more design and implementation to do there and it could be achieved in the short term through e.g. mounting a bunch of single-file fetchfs backends with different base URLs. That’s less convenient but it’s easier to add more features later than change ones that were added before they were ready. |
f84d549
to
0770dc8
Compare
I've force-pushed a reduced version of this patch here, with the manifest stuff taken out. I'll introduce a new branch in a little while with the MAPFS idea. |
This patch eliminates the need for asyncify and uses modern filesystem APIs instead of the deprecated, unmaintained BrowserFS. This is a WIP patch because it won't fully work until these two Emscripten PRs land and are released: emscripten-core/emscripten#23518 emscripten-core/emscripten#23021 The former fixes an offscreen canvas context recreation bug, and the latter adds an equivalent to BrowserFS's XHR filesystem (but without the hazardous running-XHR-on-the-main-thread problem). The biggest issue is that local storage of users who were using the old version of the webplayer will be gone when they switch to the new webplayer. I don't have a good story for converting the old BrowserFS IDBFS contents into the new OPFS filesystem (the move is worth doing because OPFS supports seeking and reading only bits of a file, and because BrowserFS is dead). I've kept around the old libretro webplayer under pkg/emscripten/libretro-classic, and with these make flags you can build a non-workerized RA that uses asyncify to sleep as before: make -f Makefile.emscripten libretro=$CORE HAVE_WORKER=0 HAVE_WASMFS=0 PTHREAD=0 HAVE_AL=1 I also moved the default directory for core content on emscripten to not be a subdirectory of the local filesystem mount, because it's confusing to have a subdirectory that's lazily fetched and not mirrored to the local storage. I think it won't impact existing users of the classic web player because they already have a retroarch.cfg in place.
* workerized RA * Workerized (non-async) web player, using OPFS This patch eliminates the need for asyncify and uses modern filesystem APIs instead of the deprecated, unmaintained BrowserFS. This is a WIP patch because it won't fully work until these two Emscripten PRs land and are released: emscripten-core/emscripten#23518 emscripten-core/emscripten#23021 The former fixes an offscreen canvas context recreation bug, and the latter adds an equivalent to BrowserFS's XHR filesystem (but without the hazardous running-XHR-on-the-main-thread problem). The biggest issue is that local storage of users who were using the old version of the webplayer will be gone when they switch to the new webplayer. I don't have a good story for converting the old BrowserFS IDBFS contents into the new OPFS filesystem (the move is worth doing because OPFS supports seeking and reading only bits of a file, and because BrowserFS is dead). I've kept around the old libretro webplayer under pkg/emscripten/libretro-classic, and with these make flags you can build a non-workerized RA that uses asyncify to sleep as before: make -f Makefile.emscripten libretro=$CORE HAVE_WORKER=0 HAVE_WASMFS=0 PTHREAD=0 HAVE_AL=1 I also moved the default directory for core content on emscripten to not be a subdirectory of the local filesystem mount, because it's confusing to have a subdirectory that's lazily fetched and not mirrored to the local storage. I think it won't impact existing users of the classic web player because they already have a retroarch.cfg in place. * Get fetchfs working without manifest support * makefile fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this makes sense split out like this: even if we do decide that a manifest option only fits in the Fetch backend, it is good to land that in a separate PR as a later addition (but I am still optimistic about the layered approach, hopefully that works well).
This mostly looks good to me, just some minor comments.
src/lib/libfetchfs.js
Outdated
opts.base_url ??= ""; | ||
return withStackSave(() => { | ||
return _wasmfs_create_fetch_backend(stringToUTF8OnStack(opts.base_url), opts.chunkSize | 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opts.base_url ??= ""; | |
return withStackSave(() => { | |
return _wasmfs_create_fetch_backend(stringToUTF8OnStack(opts.base_url), opts.chunkSize | 0); | |
return withStackSave(() => { | |
return _wasmfs_create_fetch_backend(stringToUTF8OnStack(opts.base_url ?? ""), opts.chunkSize | 0); |
src/lib/libwasmfs_fetch.js
Outdated
} else { | ||
var chunkSize = __wasmfs_fetch_get_chunk_size(file); | ||
offset ??= 0; | ||
len = len || chunkSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len = len || chunkSize; | |
len ??= chunkSize; |
src/lib/libwasmfs_fetch.js
Outdated
fileInfo.headers.has("Content-Length") && | ||
fileInfo.headers.get("Accept-Ranges") == "bytes" && | ||
(parseInt(fileInfo.headers.get("Content-Length")) > chunkSize*2)) { | ||
wasmFS$JSMemoryRanges[file] = {size:parseInt(fileInfo.headers.get("Content-Length")), chunks:[], chunkSize:chunkSize}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasmFS$JSMemoryRanges[file] = {size:parseInt(fileInfo.headers.get("Content-Length")), chunks:[], chunkSize:chunkSize}; | |
wasmFS$JSMemoryRanges[file] = { | |
size: parseInt(fileInfo.headers.get("Content-Length")), | |
chunks: [], | |
chunkSize: chunkSize | |
}; |
this matches our general formatting a little better
src/lib/libwasmfs_fetch.js
Outdated
} | ||
var wholeFileData = new Uint8Array(await wholeFileReq.arrayBuffer()); | ||
var text = new TextDecoder().decode(wholeFileData); | ||
wasmFS$JSMemoryRanges[file] = {size:wholeFileData.byteLength, chunks:[wholeFileData], chunkSize:wholeFileData.byteLength}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please format this too
system/include/emscripten/wasmfs.h
Outdated
backend_t wasmfs_create_fetch_backend(const char* base_url __attribute__((nonnull))); | ||
// | ||
backend_t wasmfs_create_fetch_backend(const char* base_url __attribute__((nonnull)), | ||
uint32_t chunkSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document chunkSize.
std::shared_ptr<DataFile> createFile(mode_t mode) override { | ||
return std::make_shared<FetchFile>(baseUrl, mode, this, proxy); | ||
const std::string FetchBackend::getFileURL(const std::string& filePath) { | ||
if(filePath == "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(filePath == "") { | |
if (filePath == "") { |
test/wasmfs/wasmfs_fetch.c
Outdated
@@ -42,7 +43,7 @@ void check_file(int fd, const char* content) { | |||
void test_url_relative() { | |||
printf("Running %s...\n", __FUNCTION__); | |||
|
|||
backend_t backend2 = wasmfs_create_fetch_backend("test.txt"); | |||
backend_t backend2 = wasmfs_create_fetch_backend("test.txt",0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend_t backend2 = wasmfs_create_fetch_backend("test.txt",0); | |
backend_t backend2 = wasmfs_create_fetch_backend("test.txt", 0); |
below as well
Thanks, I’ll take a look at this Monday! |
0770dc8
to
8f14340
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read this in detail now. Mostly minor comments, and I won't have any more after these.
See also the CI errors, though (it isn't done but looks like other.test_closure_full_js_library_wasmfs
is failing).
The closure test failure was due to not passing a radix to parseInt, I've resolved that too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, late the party with mostly nits
src/lib/libfetchfs.js
Outdated
return _wasmfs_create_fetch_backend(stringToUTF8OnStack(opts.base_url)); | ||
} | ||
return withStackSave(() => { | ||
return _wasmfs_create_fetch_backend(stringToUTF8OnStack(opts.base_url ?? ""), opts.chunkSize | 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just use the simple () => foo
from here so you don't need the second return
or the curl braces.
src/lib/libwasmfs_fetch.js
Outdated
// This will always give us a chunk >= firstChunk since len > 0. | ||
var lastChunk = ((offset+len-1) / chunkSize) | 0; | ||
if (!(file in wasmFS$JSMemoryRanges)) { | ||
var fileInfo = await fetch(url,{method:"HEAD", headers:{"Range": "bytes=0-"}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after url,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use single quotes here and below.
src/lib/libwasmfs_fetch.js
Outdated
if (fileInfo.ok && | ||
fileInfo.headers.has("Content-Length") && | ||
fileInfo.headers.get("Accept-Ranges") == "bytes" && | ||
(parseInt(fileInfo.headers.get("Content-Length"), 10) > chunkSize*2)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these three line needs one more space of indent?
src/lib/libwasmfs_fetch.js
Outdated
throw response; | ||
} | ||
var bytes = new Uint8Array(await response['arrayBuffer']()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can do await response.bytes()
here? And I would hope that you don't need to quote the bytes
method name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is still not resolved?
I've addressed all of @sbc100 's comments. |
This override doesn't seem to be necessary. The mime type of `.js` files seems to be set just find by the default `send_head`. Split out from emscripten-core#23021
I think we can/should land the send_head simplification first/separately: #23609 |
OK, but send_head is needed for the range case so I'll be adding back the simplified version. |
This override doesn't seem to be necessary. The mime type of `.js` files seems to be set just find by the default `send_head`. Split out from #23021
Ah dang, I hit the wrong button. If you don't like merge commits let me know and I can try to undo it/rebase normally |
Whatever rebase or merge strategy you like is fine for PRs. The PR will get squashed and rebased at the end of the review either way. |
Is the PR basically ready to land now? Should we remove the WIP from the title? |
Indeed! It's ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Looks ready to land aside from the changelog comment.
(and unless @sbc100 has other comments)
Looks like |
@@ -94,7 +94,7 @@ addToLibrary({ | |||
if (!response.ok) { | |||
throw response; | |||
} | |||
var bytes = await response.bytes(); | |||
var bytes = await response['bytes'](); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK landing this this way. I will see if I can find a way to teach closure about this as a followup.
This extends the existing WASMFS fetch backend
in two ways:Supporting a manifest mapping of file paths to URLs, so that the resources on the server don't need to match the layout of the virtual filesystem.This passes all the existing fetchfs tests (and I added more for new features). I would especially appreciate review of the C++ code (I'm not much of a C++ person).
On the way here, I also had to make the test http server support range requests.