Adding support for FAISS #225
Comments
We plan to support FAISS, but nothing is concrete yet. We will keep this thread open as a feature request and may prioritize the feature based on community feedback. If you end up on this thread looking for FAISS support, please +1 it.
+1
+1
+1
+1
+1
As an update, we are working to add faiss support to the plugin. We recently received a contribution that adds the library and its HNSW implementation. Because we do not see an improvement with faiss's HNSW over nmslib's, we have decided to incorporate other faiss methods before releasing. We will build on that contribution in the faiss-support branch. We are looking into adding inverted file (IVF) indices, product quantization, and composite indices. Because these methods require training, the implementation is a little more complex. In the coming weeks, we will publish an RFC. In the meantime, please feel free to "+1" or mention a specific faiss feature you would like to have supported.
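To illustrate why inverted-file methods need a training step, here is a minimal pure-Python sketch. It is not faiss code and not the plugin's implementation: `ToyIVFIndex`, `nearest_centroid`, and the evenly spaced centroid initialization are hypothetical simplifications. The point is only that the centroids must be learned from sample vectors before anything can be added, which is the same constraint faiss signals through an index's `is_trained` flag.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(centroids, vector):
    """Position of the centroid closest to `vector`."""
    return min(range(len(centroids)), key=lambda i: squared_l2(centroids[i], vector))


class ToyIVFIndex:
    """Toy inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, nlist):
        self.nlist = nlist
        self.centroids = None                    # unknown until train() runs
        self.lists = [[] for _ in range(nlist)]  # one inverted list per centroid

    @property
    def is_trained(self):
        return self.centroids is not None

    def train(self, training_vectors, iterations=10):
        """Learn centroids with a few rounds of plain k-means."""
        if len(training_vectors) < self.nlist:
            raise ValueError("need at least nlist training vectors")
        # Simplification: deterministic, evenly spaced initial centroids.
        step = len(training_vectors) // self.nlist
        centroids = [list(training_vectors[i * step]) for i in range(self.nlist)]
        for _ in range(iterations):
            buckets = [[] for _ in range(self.nlist)]
            for v in training_vectors:
                buckets[nearest_centroid(centroids, v)].append(v)
            for i, bucket in enumerate(buckets):
                if bucket:  # an empty cluster keeps its previous centroid
                    centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
        self.centroids = centroids

    def add(self, doc_id, vector):
        if not self.is_trained:
            raise RuntimeError("index must be trained before vectors are added")
        self.lists[nearest_centroid(self.centroids, vector)].append((doc_id, vector))

    def search(self, query, k, nprobe=1):
        """Scan only the `nprobe` lists whose centroids are closest to the query."""
        if not self.is_trained:
            raise RuntimeError("index must be trained before it is searched")
        probe = sorted(range(self.nlist),
                       key=lambda i: squared_l2(self.centroids[i], query))[:nprobe]
        candidates = [(squared_l2(v, query), doc_id)
                      for i in probe for doc_id, v in self.lists[i]]
        candidates.sort()
        return [doc_id for _, doc_id in candidates[:k]]
```

An HNSW index has no equivalent step, which is why it can be built incrementally as vectors arrive.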
+1
+1. As ml-supervised-workflow shows, maybe we can use a similar workflow for faiss training.
@luyuncheng That is an Elastic commercial feature, so we cannot use it. I am exploring a couple of approaches to training. The first adds a training step to the SaveIndex JNI function that takes a subset of the vectors to be indexed and uses them for training. This approach has several flaws including
I am working on the mapping interface to support faiss's composite indices, so I implemented this approach in order to create trained faiss indices for testing the interface. As a second approach, I am going to explore adding a "train" API. Here, a user would create an Elasticsearch faiss index and a separate Elasticsearch index containing the training data. Calling the "train" API would create a faiss library index based on the configuration of the Elasticsearch faiss index, train it with data from the training index, and serialize it into an Elasticsearch system index. Then, when a user starts to ingest data, segment creation would copy the empty, trained index stored in the system index instead of creating a new, untrained index from faiss's index factory. This way, training incurs a one-time cost when the train API is called, which speeds up segment creation. Additionally, if all segments use the same trained model, it should be easier to perform segment merges without relying on storing the raw vectors in Lucene, though I have not explored this in detail yet. I would appreciate feedback on either of these approaches, and suggestions for other approaches worth considering.
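The train-once, copy-per-segment flow described above could be sketched roughly as follows. Everything here is a hypothetical stand-in: `MODEL_STORE` plays the role of the Elasticsearch system index, JSON plays the role of faiss serialization, and the "training" is a placeholder that simply picks evenly spaced vectors as centroids. The merge function reflects only the untested idea that segments sharing one trained model can be merged list by list.

```python
import json

# Hypothetical stand-in for the system index that would hold serialized,
# trained-but-empty library indices, keyed by index name.
MODEL_STORE = {}


def train_model(index_name, training_vectors, nlist):
    """Stand-in for the proposed "train" API: pay the training cost once."""
    if len(training_vectors) < nlist:
        raise ValueError("need at least nlist training vectors")
    # Placeholder training: evenly spaced vectors become the centroids.
    step = len(training_vectors) // nlist
    centroids = [list(training_vectors[i * step]) for i in range(nlist)]
    MODEL_STORE[index_name] = json.dumps({"nlist": nlist, "centroids": centroids})


def new_segment_index(index_name):
    """Segment creation: copy the stored trained template, no retraining."""
    if index_name not in MODEL_STORE:
        raise KeyError("no trained model stored for index " + repr(index_name))
    template = json.loads(MODEL_STORE[index_name])  # fresh copy on every call
    return {"centroids": template["centroids"],
            "lists": [[] for _ in range(template["nlist"])]}


def add_to_segment(segment, doc_id, vector):
    """Append the vector to the inverted list of its nearest centroid."""
    dists = [sum((x - y) ** 2 for x, y in zip(c, vector))
             for c in segment["centroids"]]
    segment["lists"][dists.index(min(dists))].append((doc_id, vector))


def merge_segments(a, b):
    """Merge two segments that share a trained model by concatenating lists."""
    if a["centroids"] != b["centroids"]:
        raise ValueError("segments were built from different trained models")
    return {"centroids": a["centroids"],
            "lists": [la + lb for la, lb in zip(a["lists"], b["lists"])]}
```

In this sketch the merge never consults the original documents, because list `i` means the same thing in every segment. Whether that carries over to real faiss indices is exactly the open question raised above.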
LGTM. I am wondering whether the training data would be stored in the same index or split into two indices.
@luyuncheng My thinking on having a separate index is that it will be easier to delete. In theory, you could use the same index with this approach. The train API will require an index and a field in order to gather the training data. The index could be the same as the one being trained, but the training data would live in a separate field.
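As a rough illustration of the "index plus field" input, the sketch below gathers training vectors from a list of documents modeled as plain dicts. The function name, the `max_vectors` cap, and the field names are all hypothetical; no actual train API is specified in this thread.

```python
def gather_training_vectors(documents, field, max_vectors=None):
    """Collect vectors stored under `field`, skipping documents without it."""
    vectors = []
    for doc in documents:
        vector = doc.get(field)
        if vector is None:
            continue  # this document carries no training data
        vectors.append(vector)
        if max_vectors is not None and len(vectors) >= max_vectors:
            break
    return vectors


# Same index, separate field: some documents hold searchable vectors,
# others hold vectors reserved for training.
docs = [
    {"embedding": [0.1, 0.2]},
    {"training_embedding": [0.3, 0.4]},
    {"training_embedding": [0.5, 0.6]},
]
training = gather_training_vectors(docs, "training_embedding")
```

With a dedicated training index, the same call would simply run over that index's documents, and deleting the training data afterwards would amount to dropping the index.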
Do you have any plans to support faiss in addition to nmslib in the future?
A few issues I have encountered while using nmslib are:
Since faiss has good solutions for these issues, I would be happy to have both libraries integrated into this plugin.
Attaching an ES issue thread that you might be interested in.
I have also created Java bindings for faiss which can be found here.
Hope it helps