-
That would be pretty cool. For LanceDB, I think you would basically need a pure-JS implementation of Lance that can read the Lance files and the table format. Right now, all of that is implemented in a Rust library. Porting it to JS wouldn't be a small lift, but I don't think it would be impossible.
-
You might be able to compile a Rust core down to wasm. The Lance file format would be overkill, though: we have lots of code for statistics, compression, legacy versions of the format, etc., which would just bloat the wasm artifact. Arrow IPC would be a lighter-weight option, and there is already support for it in JS. So I think the main thing you'd want to extract into a wasm core is the lance-index code (plus lance-linalg, etc.) that actually performs the search.

Under 1 s will probably be impossible for very large datasets (hundreds of millions of rows or more). Your client starts with nothing cached in memory at all, so it has to load portions of the large index file. For IVF/PQ this means a few fairly large loads; for graph-based algorithms it means many small loads. A more realistic target is probably 30-60 seconds of loading time, followed by very fast searches.
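A rough sketch of why IVF maps well onto range requests: if each IVF partition is stored contiguously and the small centroid and offset tables are cached on the client, a query costs one range read per probed partition. The file layout and the names `build_ivf_file`, `ivf_search`, and `read_range` are hypothetical (this is not the Lance format); `read_range` stands in for an HTTP `Range: bytes=...` request, and PQ compression is left out, so vectors are stored as raw float32.

```python
import heapq
import struct
from typing import Callable, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf_file(vectors, centroids) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Assign vectors to their nearest centroid and store each partition contiguously.

    Hypothetical layout: each record is a uint32 row id followed by dim float32 values.
    Returns the file bytes and a table of (byte_offset, record_count) per partition.
    """
    record = struct.Struct(f"<I{len(centroids[0])}f")
    buckets: List[list] = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        p = min(range(len(centroids)), key=lambda c: squared_l2(vec, centroids[c]))
        buckets[p].append((vid, vec))
    blob = bytearray()
    offsets = []
    for bucket in buckets:
        offsets.append((len(blob), len(bucket)))
        for vid, vec in bucket:
            blob += record.pack(vid, *vec)
    return bytes(blob), offsets


def ivf_search(query, centroids, offsets, read_range: Callable[[int, int], bytes],
               k: int = 3, nprobe: int = 2) -> List[int]:
    record = struct.Struct(f"<I{len(centroids[0])}f")
    # Centroids and the offset table are small; assume they were fetched once and cached.
    probe = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))[:nprobe]
    candidates = []
    for p in probe:
        start, count = offsets[p]
        if count == 0:
            continue
        # One contiguous range read fetches the whole partition.
        for vid, *vec in record.iter_unpack(read_range(start, count * record.size)):
            candidates.append((squared_l2(query, vec), vid))
    # Exact distances on raw vectors; with PQ these would be approximated from compressed codes.
    return [vid for _, vid in heapq.nsmallest(k, candidates)]


vectors = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0], [9.2, 0.1]]
centroids = [[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]]
blob, offsets = build_ivf_file(vectors, centroids)
requests = []


def read_range(offset: int, length: int) -> bytes:
    requests.append((offset, length))  # each call would be one HTTP request
    return blob[offset:offset + length]


result = ivf_search([0.0, 0.05], centroids, offsets, read_range, k=2, nprobe=1)
print(result, len(requests))  # [0, 1] 1
```

Because the partitions to probe are known as soon as the query is compared to the centroids, these reads can be issued in parallel; the trade-off is that each read pulls a whole partition.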
-
Hi folks,
I'm a big fan of Lance and of LanceDB's separation-of-concerns approach in general. Did I understand correctly that LanceDB always requires a proper DB connection in the remote setup?
I'm exploring ways of using a static index directory or file that you can dump anywhere and that could be queried in <1s via range requests. I was wondering whether you had any ideas on whether this is possible with the Lance data format and, e.g., DuckDB or similar.
Hosting static files is cheap and often free (GitHub, Hugging Face). My idea is that you could have a super lean, e.g. JS-only, frontend retrieving data from a massive static index.
I wrote about this idea and a hacky but somewhat working research demo here: https://github.com/do-me/flatgeobuf-vectordb
I guess if one could somehow adapt ANN search (e.g. HNSW) to the columnar nature of the data format and to range requests, there might be a way.
Would love to hear your ideas and thoughts!
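To illustrate the "many small loads" point for graph indexes: in a simplified greedy walk over a single-layer proximity graph (not full HNSW, which adds hierarchical layers and a candidate beam), each visited node's vector and neighbor list must be fetched before the next hop can be chosen. The fixed-size record layout and the names `build_graph_file`, `greedy_search`, and `read_range` are hypothetical; `read_range` stands in for an HTTP range request.

```python
import struct
from typing import Callable, Dict, List, Sequence, Tuple

NO_NEIGHBOR = 0xFFFFFFFF  # padding value for unused neighbor slots


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def node_struct(dim: int, max_degree: int) -> struct.Struct:
    # Hypothetical fixed-size record: dim float32 coordinates, then max_degree uint32 neighbor ids.
    return struct.Struct(f"<{dim}f{max_degree}I")


def build_graph_file(vectors, neighbors, max_degree: int) -> bytes:
    rec = node_struct(len(vectors[0]), max_degree)
    blob = bytearray()
    for vec, nbrs in zip(vectors, neighbors):
        padded = list(nbrs)[:max_degree]
        padded += [NO_NEIGHBOR] * (max_degree - len(padded))
        blob += rec.pack(*vec, *padded)
    return bytes(blob)


def greedy_search(query, entry: int, dim: int, max_degree: int,
                  read_range: Callable[[int, int], bytes]) -> int:
    rec = node_struct(dim, max_degree)
    cache: Dict[int, Tuple[Sequence[float], List[int]]] = {}

    def load(node: int):
        # Node i lives at byte offset i * rec.size, so each new node costs one small range read.
        if node not in cache:
            fields = rec.unpack(read_range(node * rec.size, rec.size))
            cache[node] = (fields[:dim], [n for n in fields[dim:] if n != NO_NEIGHBOR])
        return cache[node]

    current = entry
    current_dist = squared_l2(query, load(current)[0])
    while True:
        # Hop to the closest neighbor; stop when no neighbor is closer (a local minimum).
        best, best_dist = current, current_dist
        for n in load(current)[1]:
            d = squared_l2(query, load(n)[0])
            if d < best_dist:
                best, best_dist = n, d
        if best == current:
            return current
        current, current_dist = best, best_dist


# A tiny chain graph: node i sits at (i, 0) and links to i - 1 and i + 1.
n = 8
vectors = [[float(i), 0.0] for i in range(n)]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
blob = build_graph_file(vectors, neighbors, max_degree=2)
requests = []


def read_range(offset: int, length: int) -> bytes:
    requests.append((offset, length))  # each call would be one HTTP request
    return blob[offset:offset + length]


nearest = greedy_search([6.2, 0.0], entry=0, dim=2, max_degree=2, read_range=read_range)
print(nearest, len(requests))  # 6 8
```

Each hop depends on data from the previous read, so these small requests are sequential round-trips; on a real network, per-request latency rather than bandwidth would likely dominate.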