
[Question]: filtering by metadata, what is first? #1689

Open
1 task done
valenradovich opened this issue Feb 4, 2025 · 7 comments

@valenradovich

valenradovich commented Feb 4, 2025

File Name

vertexAI and Vector Search

What happened?

This is just a question, but it is crucial for inference time and efficiency at my company.

We're using Vector Search with filtering to retrieve the right samples. For efficiency and inference speed, I would like to know how the retriever handles filtering under the hood.

Does it (1) run the similarity search first and then filter the results by the metadata we chose, or (2) filter by the metadata first and then run the similarity search only over the matching items?

What I mean is: in case (1), the search would not run directly over all the chunks associated with the metadata value I want (let's call it a user), so some of that user's chunks could be missed.
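To make the two options concrete, here is a toy brute-force sketch. It is not how Vector Search is implemented (that uses approximate search and is not publicly documented); the `user` field, the item layout, and the scores are invented for illustration. When a user's chunks are not among the globally most similar ones, option (1) can return nothing for that user, while option (2) returns that user's best matches.

```python
import heapq


def dot(a, b):
    """Similarity score: plain dot product between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def post_filter_search(query, items, k, user):
    """Option (1): global top-k by similarity, then keep only the user's items."""
    top_k = heapq.nlargest(k, items, key=lambda it: dot(query, it["vec"]))
    return [it["id"] for it in top_k if it["user"] == user]


def pre_filter_search(query, items, k, user):
    """Option (2): keep only the user's items, then take the top-k among them."""
    candidates = [it for it in items if it["user"] == user]
    top_k = heapq.nlargest(k, candidates, key=lambda it: dot(query, it["vec"]))
    return [it["id"] for it in top_k]


# Toy corpus (hypothetical): 100 chunks owned by "bob" that score high for
# this query, and 20 chunks owned by "alice" that score lower.
query = [1.0, 0.1]
items = [{"id": f"bob-{i}", "user": "bob", "vec": [1.0, i / 100]} for i in range(100)]
items += [{"id": f"alice-{i}", "user": "alice", "vec": [0.5, i / 100]} for i in range(20)]

post_ids = post_filter_search(query, items, k=10, user="alice")
pre_ids = pre_filter_search(query, items, k=10, user="alice")
print(len(post_ids), len(pre_ids))  # 0 10
```

So the order of operations affects which results come back, not only latency.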

Sorry if this question doesn't belong here! I've searched all over the internet and can't find an answer.

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct
@holtskinner
Collaborator

@kazunori279 Can you take a look at this?

@kazunori279
Contributor

Hi @valenradovich, the answer is (2): pre-filtering, using a very fast algorithm.

@valenradovich
Author

Hi @kazunori279,
Thank you for your answer. Is there a way I can demonstrate to my team that it really works this way? Since we couldn't find any documentation about it, we would like to understand exactly how it works.

Again, thank you very much, Kaz!

@kazunori279
Contributor

Do you mean how to use filtering with Vector Search? You can refer to:

The first page explains how to build an index that supports filtering, and the second page explains how to apply the filter at query time. Please let me know if you have any questions!

@valenradovich
Author

Great, I've seen that documentation, and we've done some testing with it. What the team wants to know is whether any documentation explicitly describes how the algorithm works and confirms that it filters first and then runs the similarity search. Do you know of anything like that?

Thank you! @kazunori279

@kazunori279
Contributor

@valenradovich Unfortunately, there's no documentation that explains this in detail, and I'm not sure I can disclose the details. However, the filtering is very fast (roughly O(1)), so please let us know if you see significant latency when you apply filters.

@valenradovich
Author

@kazunori279 I would like to find a way to show my team that vector retrieval really applies the metadata filter first and then the semantic search. Is there a way to do that? Thank you!
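One way to gather evidence, as a hedged sketch: pick a filter value that matches only a few items you know about (for example, a test user you index yourself), and run queries unrelated to those items. A plain "top-k, then filter" pipeline would usually return few or no results for such queries, while pre-filtering should return min(k, number of matching items), all carrying that value. The `search` callable, its signature, and `full_result_rate` are hypothetical names; you would wrap your own client code. This is evidence rather than proof: a post-filter that over-fetches heavily could also pass, and approximate search may occasionally return fewer results, so use several queries.

```python
from typing import Callable, Iterable, Sequence, Set

# Hypothetical wrapper around your Vector Search client:
# search(query_vector, k, allowed_token) -> list of neighbor IDs.
SearchFn = Callable[[Sequence[float], int, str], list]


def full_result_rate(
    search: SearchFn,
    queries: Iterable[Sequence[float]],
    k: int,
    rare_token: str,
    matching_ids: Set[str],
) -> float:
    """Fraction of queries whose filtered results are full and correctly filtered.

    A query counts as full when it returns exactly min(k, len(matching_ids))
    neighbors and every returned ID is one of the items carrying rare_token.
    A rate near 1.0 on queries unrelated to those items is consistent with
    pre-filtering; a plain "top-k, then filter" pipeline would usually score
    near 0.0.
    """
    queries = list(queries)
    if not queries:
        raise ValueError("need at least one query")
    expected = min(k, len(matching_ids))
    full = 0
    for q in queries:
        ids = search(q, k, rare_token)
        if len(ids) == expected and set(ids) <= matching_ids:
            full += 1
    return full / len(queries)
```

With the Vertex AI SDK, `search` could wrap `MatchingEngineIndexEndpoint.find_neighbors(...)` with a `filter=[Namespace(...)]` argument and return each neighbor's `id`; the namespace and token names depend on how you built your index, so check the SDK reference for the exact parameters.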


3 participants