-
I also wanted to note that many existing hybrid DB solutions don't support ACID and offer only limited support for migrations, backups, transactions, and rollbacks. That makes deploying those solutions tricky and leaves end users wanting more RDB-like features in hybrid DBs.
-
I'd consider opening a pull request to add vector support with an ANN algorithm. @tobiemh, how could we validate this feature before starting on any actual implementation?
-
Hi @tobiemh @vade, I've begun an initial implementation. I can't find the RFC process described anywhere; should I open an issue describing the design and implementation details/questions we should settle on? My suggestion is to first implement a working vector type with appropriate functions (distance functions, math, normalization, etc.) and afterwards focus on providing approximate indexing.
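To make the "vector type first, approximate indexing later" suggestion concrete, here is a minimal Python sketch of what such a type's distance, math, and normalization functions might compute. The `Vector` class and its method names are hypothetical and do not reflect SurrealDB's actual API or the implementation mentioned above; the sketch only pins down the semantics.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Vector:
    """Hypothetical fixed-dimension float vector type (illustration only)."""

    values: tuple[float, ...]

    @classmethod
    def of(cls, xs: Sequence[float]) -> Vector:
        return cls(tuple(float(x) for x in xs))

    def _check_dims(self, other: Vector) -> None:
        # Vector math is only defined for vectors of equal dimension.
        if len(self.values) != len(other.values):
            raise ValueError(f"dimension mismatch: {len(self.values)} vs {len(other.values)}")

    def dot(self, other: Vector) -> float:
        self._check_dims(other)
        return sum(a * b for a, b in zip(self.values, other.values))

    def norm(self) -> float:
        # Euclidean (L2) length of the vector.
        return math.sqrt(self.dot(self))

    def normalize(self) -> Vector:
        # Scale to unit length; a zero vector has no direction.
        n = self.norm()
        if n == 0.0:
            raise ValueError("cannot normalize a zero vector")
        return Vector(tuple(x / n for x in self.values))

    def euclidean(self, other: Vector) -> float:
        self._check_dims(other)
        return math.dist(self.values, other.values)

    def cosine_similarity(self, other: Vector) -> float:
        # dot(a, b) / (|a| * |b|), ranging from -1 to 1.
        denom = self.norm() * other.norm()
        if denom == 0.0:
            raise ValueError("cosine similarity is undefined for zero vectors")
        return self.dot(other) / denom
```

For example, `Vector.of([3, 4]).norm()` is `5.0`, and `Vector.of([1, 0]).cosine_similarity(Vector.of([0, 1]))` is `0.0`. Keeping the type and its functions independent of any index means an approximate index could be layered on later without changing what a distance query means.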
-
If you haven't come across it, this database is focused on vector storage; maybe there's room for inspiration? https://weaviate.com/ :)
-
Folks, it's me again. Vector embeddings and similarity search (via SVM) seem simple enough to pull off, and you are missing out on the LLM craze and the massive adoption of AI apps. Please think about implementing this so that we can have all our data live in one place. Supabase has pgvector; you do not offer anything along these lines! It's very important.
-
Brilliant work on the database and its ultra-comprehensive capabilities, m'friends! :) I'm with @mysticaltech on this one: AI is popping off in a big way and will keep growing rapidly. For that reason, I think supporting AI use cases is potentially a huge value proposition and a dramatic business win going forward. Many app developers will want basic similarity search in their apps, for example. As a developer/architect, being able to throw your embeddings in alongside everything else, in the same transactions, etc., is very nice. Being able to do it all in one integrated system that can run anywhere is very attractive, even without advanced ML features or many convenience wrappers. Just nailing the core functionality of high-performance indexed vector search natively probably gets most of the way there, since the community can add convenience layers on top. Not to mention, SurrealDB supports use cases, such as running fully embedded, that are literally impossible in almost any other DB, let alone the small subset of DBs that support vector search, so such support would make it stand out even further. It's a killer feature for me, for instance, since embeddability is essential for offline-first and/or cross-platform apps/PWAs, and nothing else offers this combination. So my take is that it would be a missed opportunity, and thus sad, if this kind of functionality weren't addressed, given that SurrealDB is otherwise such a complete solution.
-
I'm wondering what the folks in this thread think of the new features in SurrealDB 💭
-
Hi friends.
Firstly, SurrealDB looks really interesting, well put together, and well documented. It's clear a lot of effort has gone into it.
I wanted to see whether some machine learning features made sense within SurrealDB, as its existing suite of features makes it nearly perfect for ML semantic search, especially considering that geo-coordinates are included or planned.
Some features that would be really interesting:
High-dimensional vectors as a native type. Embeddings typically range from 128-dimensional float vectors to vectors with 10k+ dimensions. Being able to have multiple vector columns would be very helpful for enterprise products, as many DBs assume one vector per table, which is troublesome.
Approximate nearest neighbor search. An ANN engine such as HNSW would allow fast indexes to be built for approximate nearest-neighbor retrieval of items "near" an exemplar under a distance metric such as Euclidean distance or cosine similarity. This is useful for semantic search, image matching / similar-image finding, and much more.
Pre-filtering / hybrid search. The ability to pre-filter semantic/vector search using more traditional WHERE clauses. This is where things get interesting, allowing powerful queries that combine tabular data with semantic vector ranking. The leaders in hybrid vector DBs, such as Weaviate and Milvus, pre-filter before dispatching to the ANN engine to guarantee relevant results (post-filtering is tricky and has lots of pitfalls).
I'm aware this is a BROAD discussion and possibly out of scope, but I wanted to introduce the idea. Also note, I'm brand new to SurrealDB and have no clue how realistic the above requests are, but I will say this is where things are going!
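The difference between pre-filtering and post-filtering can be sketched in Python with an exact (brute-force) nearest-neighbor search. This deliberately swaps an ANN engine like HNSW for a linear scan to keep it short, and the record layout, the `search_prefiltered`/`search_postfiltered` names, and the sample rows are all hypothetical. Pre-filtering applies the WHERE-style predicate first and ranks only the survivors, so it returns up to k matching rows; post-filtering ranks everything first and filters the top k afterwards, which can silently return fewer than k results.

```python
from __future__ import annotations

import heapq
import math
from typing import Any, Callable, Dict, List

# Hypothetical row shape: {"id": str, "lang": str, "embedding": [float, ...]}
Record = Dict[str, Any]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top_k(rows: List[Record], query: List[float], k: int) -> List[Record]:
    # Exact scan: score every candidate and keep the k most similar, best first.
    return heapq.nlargest(k, rows, key=lambda r: cosine_similarity(r["embedding"], query))


def search_prefiltered(rows, query, k, where: Callable[[Record], bool]) -> List[Record]:
    # Filter first, then rank only the rows that satisfy the predicate.
    return top_k([r for r in rows if where(r)], query, k)


def search_postfiltered(rows, query, k, where: Callable[[Record], bool]) -> List[Record]:
    # Rank first, then drop non-matching rows from the top k.
    return [r for r in top_k(rows, query, k) if where(r)]


def is_english(row: Record) -> bool:
    return row["lang"] == "en"


rows = [
    {"id": "a", "lang": "en", "embedding": [1.0, 0.0]},
    {"id": "b", "lang": "fr", "embedding": [0.9, 0.1]},
    {"id": "c", "lang": "fr", "embedding": [0.8, 0.2]},
    {"id": "d", "lang": "en", "embedding": [0.0, 1.0]},
]
query = [1.0, 0.0]
```

With `k = 2`, the unfiltered top two are `a` and `b`, so post-filtering on `is_english` keeps only `a`, while pre-filtering returns both English rows, `a` and `d`. A real ANN index makes this harder, because filtering inside the graph traversal can hurt recall, which is part of why pre-filtering is the tricky but valuable feature.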
Thank you.