Streaming Network/File Support #1012

wallw-teal · 2020-09-01T15:32:40Z

OpenSphere currently loads each network resource with a single GET request that reads the entire response into memory. Files are likewise loaded entirely into memory, and because each file is stored under a single IndexedDB (IDB) key, file size is further capped at the maximum size of a single IDB value (~104MB).

Loading and parsing large files this way is problematic because it spikes memory usage. For configurations such as Electron (which uses file:// URLs rather than IDB storage), it is fairly trivial to crash the application. Instead, we should stream the file from the source.

Network

For network requests, including file://, we should be able to do the following steps:

1. Make a HEAD request to the URL.
2. Check the Content-Length response header. If the resource is small enough, we can just load it and run the legacy parsers.
3. If the response contains the Accept-Ranges: bytes header, we can stream it via range requests. fetch with ReadableStream on Response.body does not buy us much here, as the full response is still loaded into memory even if we parse each chunk as it is pushed through. The resulting API should still use ReadableStream. Note that this method of streaming is common in video players that support DASH (and maybe also HLS), so it may be possible to use or adapt some of the network logic from something like Google's Shaka Player (which would play nicely with the compiler).
4. For the initial type detection and sample parse for import, tee the stream.
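
The steps above can be sketched roughly as follows. This is only an illustration, not a proposed OpenSphere API: planLoad, rangeStream, RangeFetch, and the size thresholds are all hypothetical names and values. The fetch function is passed in so the logic can be exercised without a network; in a browser the global fetch could be supplied.

```typescript
// Hypothetical sketch: every name and threshold here is an assumption.
type ByteRange = [number, number]; // inclusive start/end, as in the Range header

type LoadPlan =
  | { kind: 'full' }                         // small enough: load and use legacy parsers
  | { kind: 'ranges'; ranges: ByteRange[] }  // stream via range requests
  | { kind: 'stream-body' };                 // no range support: fall back to Response.body

/** Decide how to load a resource from the headers of a HEAD response. */
function planLoad(
  contentLength: number | null,
  acceptRanges: string | null,
  smallLimit: number = 10 * 1024 * 1024, // assumed cutoff for "small enough"
  chunkSize: number = 4 * 1024 * 1024    // assumed size of each range request
): LoadPlan {
  if (contentLength !== null && contentLength <= smallLimit) {
    return { kind: 'full' };
  }
  const rangesOk = acceptRanges !== null && acceptRanges.trim().toLowerCase() === 'bytes';
  if (contentLength !== null && rangesOk) {
    const ranges: ByteRange[] = [];
    for (let start = 0; start < contentLength; start += chunkSize) {
      ranges.push([start, Math.min(start + chunkSize, contentLength) - 1]);
    }
    return { kind: 'ranges', ranges };
  }
  return { kind: 'stream-body' };
}

// Minimal shape of the fetch call used below; the global fetch satisfies it.
type RangeFetch = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

/** Expose a sequence of range requests as a ReadableStream, one chunk per pull. */
function rangeStream(
  url: string,
  ranges: ByteRange[],
  fetchFn: RangeFetch
): ReadableStream<Uint8Array> {
  let next = 0;
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      if (next >= ranges.length) {
        controller.close();
        return;
      }
      const [start, end] = ranges[next++];
      const res = await fetchFn(url, { headers: { Range: `bytes=${start}-${end}` } });
      if (res.status !== 206) {
        // Throwing from pull() puts the stream into an errored state.
        throw new Error(`Expected 206 Partial Content, got ${res.status}`);
      }
      controller.enqueue(new Uint8Array(await res.arrayBuffer()));
    },
  });
}
```

Because each range is only requested when the consumer pulls, at most one chunk is in flight per branch. For step 4, calling tee() on the returned stream yields two branches, one for type detection and one for the import, though a branch that is read slowly will buffer chunks the other has already consumed.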

We will need to move from full-format parsers (e.g. JSON.parse(response) and document parsing) to streaming parsers. It should be possible to do this in a piecemeal, backwards-compatible manner so that we don't break third-party parsers: if a parser supports streaming, stream to it; if not, spool up the whole source and pass it in, while being wary of file size so we don't crash. We already have streaming JSON/XML "parsers" used by file type detection (oboe and xml-lexer).
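
One way the backwards-compatible path could look is an adapter that gives every parser the same chunked interface. The LegacyParser and StreamingParser interfaces, toStreaming, and the spool limit below are hypothetical and do not reflect OpenSphere's actual parser API.

```typescript
// Hypothetical interfaces: an assumption for illustration only.
interface LegacyParser<T> {
  parse(source: string): T[];      // needs the whole source at once
}

interface StreamingParser<T> {
  write(chunk: string): T[];       // returns items completed by this chunk
  end(): T[];                      // returns any remaining items
}

function isStreaming<T>(
  parser: LegacyParser<T> | StreamingParser<T>
): parser is StreamingParser<T> {
  return typeof (parser as StreamingParser<T>).write === 'function';
}

/**
 * Wrap any parser so callers can always feed it chunks. Streaming parsers are
 * returned unchanged; legacy parsers are spooled up to maxSpoolChars and then
 * parsed in one call when the source ends.
 */
function toStreaming<T>(
  parser: LegacyParser<T> | StreamingParser<T>,
  maxSpoolChars: number = 50 * 1024 * 1024 // assumed safety limit
): StreamingParser<T> {
  if (isStreaming(parser)) {
    return parser;
  }
  const legacy: LegacyParser<T> = parser;
  let spooled = '';
  return {
    write(chunk: string): T[] {
      if (spooled.length + chunk.length > maxSpoolChars) {
        throw new Error('Source is too large for a non-streaming parser');
      }
      spooled += chunk;
      return [];
    },
    end(): T[] {
      const items = legacy.parse(spooled);
      spooled = '';
      return items;
    },
  };
}
```

With this shape, the import pipeline only ever calls write() and end(), and third-party parsers keep working until the source exceeds the spool limit, at which point the failure is an explicit error rather than a crash.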

Note: API requests such as WMS/WMTS/WFS may not support byte ranges and as such may benefit from fetch/ReadableStream over a plain XHR GET.

Note: This demo makes use of fetch/ReadableStream without spooling up all the bytes of the response (so that may be the way to go if that's possible).

Warning: the other thing to be careful of here is browser support for ReadableStream (which should be decent). However, some of the transform streams like TextDecoderStream aren't implemented in current Firefox, so polyfills for those will be needed.
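
Where TextDecoderStream is missing, the plain TextDecoder can cover the same need inside a reader loop, since its stream option holds back incomplete multi-byte sequences until the next chunk. The sketch below is not a spec-complete polyfill, and createChunkDecoder is a hypothetical helper name.

```typescript
// Hypothetical helper: chunk-by-chunk text decoding without TextDecoderStream.
function createChunkDecoder(encoding: string = 'utf-8') {
  const decoder = new TextDecoder(encoding);
  return {
    /** Decode one chunk; bytes of a split character are buffered internally. */
    decode(chunk: Uint8Array): string {
      return decoder.decode(chunk, { stream: true });
    },
    /** Call once after the last chunk to flush anything still buffered. */
    flush(): string {
      return decoder.decode();
    },
  };
}
```

In a reader loop, each Uint8Array from reader.read() would be passed to decode(), with flush() called once the stream reports done.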

File

For files loaded from disk (but not in Electron because that just resorts to file:// URLs), the native File should be streamable (if not with a ReadableStream implementation then with Blob.slice()). However, the biggest issue there is that when the application restarts, we no longer have access to that File instance without the user going to pick it from the file browser again. That's why we currently dump files into IDB.
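
The Blob.slice() fallback mentioned above could be wrapped in a ReadableStream so that files and network sources share one interface. This is a minimal sketch; blobToStream and the default chunk size are hypothetical, and it assumes Blob.arrayBuffer() is available in the target browsers.

```typescript
// Hypothetical helper: read a File/Blob in fixed-size slices on demand.
function blobToStream(
  blob: Blob,
  chunkSize: number = 1024 * 1024 // assumed slice size
): ReadableStream<Uint8Array> {
  let offset = 0;
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      if (offset >= blob.size) {
        controller.close();
        return;
      }
      // slice() clamps the end index to blob.size, so the last chunk may be shorter.
      const part = blob.slice(offset, offset + chunkSize);
      offset += chunkSize;
      controller.enqueue(new Uint8Array(await part.arrayBuffer()));
    },
  });
}
```

Since File extends Blob, a File from a file input or drop event can be passed directly. Only the slice currently being read is materialized in memory.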

Some strawman options. We are definitely open to suggestions here:

Stop using IDB file storage. Files become usable in the current session only. Offer to upload the files and use a URL if the user wants to keep them between sessions?
Expand IDB file storage to multiple keys (which moves the limit from the IDB single-value size to the total available IDB size).
A hybrid approach where we continue to store "smaller" files and stream in larger ones, but do not save the larger ones to storage.
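
As a rough illustration of the multi-key option, a file could be split with Blob.slice() and each slice written under its own key. Everything named here is hypothetical: ChunkStore is a minimal stand-in that an IDBObjectStore with out-of-line keys would satisfy, and the key format and chunk size are assumptions.

```typescript
// Hypothetical sketch of storing one file across several keys.
interface ChunkStore {
  // Matches the (value, key) form of IDBObjectStore.put for out-of-line keys.
  put(value: Blob, key: string): unknown;
}

/** Assumed key format: "<fileId>:<chunk index>". */
function chunkKey(fileId: string, index: number): string {
  return `${fileId}:${index}`;
}

/**
 * Write a file as a series of chunk records and return the keys in order.
 * The returned list would itself need to be persisted (for example as a small
 * manifest record) so the chunks can be found again in a later session.
 */
function storeFileInChunks(
  store: ChunkStore,
  fileId: string,
  file: Blob,
  chunkSize: number = 64 * 1024 * 1024 // assumed to stay under the single-value limit
): string[] {
  const keys: string[] = [];
  for (let start = 0, index = 0; start < file.size; start += chunkSize, index++) {
    const key = chunkKey(fileId, index);
    store.put(file.slice(start, start + chunkSize), key);
    keys.push(key);
  }
  return keys;
}
```

The loop is synchronous on purpose: all put() calls are issued in one go, which would keep them inside a single IDB transaction. For the hybrid option, the same function could simply be skipped when file.size exceeds whatever cutoff is chosen.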
