Cosine Similarity Score > 1. #17569

JpChii · 2025-01-21T09:54:53Z

Querying a VectorStoreIndex with a VectorStoreQuery in DEFAULT mode is supposed to use cosine similarity, but the scores returned by the query are greater than 1. Is this cosine distance instead?

Index: FaissHNSW
Embedding Model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
EmbeddingType: HuggingFace

Code:

from llama_index.core.vector_stores.types import VectorStoreQuery

# Get the query embedding (embed_model and vector_store are configured as listed above)
query_embedding = embed_model.get_query_embedding("What did the author do growing up?")
vector_store_query = VectorStoreQuery(
    query_embedding=query_embedding,
    mode="default",
    similarity_top_k=5,
)
top_k_results = vector_store.query(vector_store_query)

VectorStoreQueryResult(nodes=None, similarities=[1.123214, 1.1505811, 1.1521511, 1.1726012, 1.1901894], ids=['8', '4', '12', '11', '10'])

Answered by dosubot[bot]

@JpChii · 2025-01-21T09:56:41Z

dosubot[bot]
bot Jan 21, 2025

Hello, @JpChii! I'm here to help you with any bugs, questions, or contributions. Let's get to the bottom of this together!

The cosine similarity score should theoretically range between -1 and 1. If you're seeing scores greater than 1, it might indicate an issue with the implementation or the way the scores are being calculated or interpreted.

In the context of VectorStoreQuery with DEFAULT mode, the cited source describes similarity scores calculated by Atlas Vector Search with a cosine similarity metric ^[1], which does not necessarily carry over to other backends. With a FaissHNSW index, the scores may be calculated differently, for example as distances under the index's own metric, or there may be a bug in the implementation.

Here are a few things you might want to check or consider:

Normalization: Ensure that the embeddings are normalized before calculating cosine similarity. This is crucial for cosine similarity to be correctly bounded between -1 and 1.
Implementation Details: Review the implementation details of the FaissHNSW index to ensure that it is configured correctly for cosine similarity. There might be specific parameters or settings that need to be adjusted.
Library or Version Issues: Check if there are any known issues with the version of the libraries you are using, particularly with Faiss or the embedding model.
Debugging: Add logging or debugging statements to trace how the similarity scores are being computed and verify each step of the process.
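
The normalization point can be checked without Faiss. The sketch below is plain Python with made-up 3-dimensional vectors (not real embeddings) and hypothetical helper names. It illustrates that for unit-length vectors the squared L2 distance equals 2 - 2*cos, so a distance-style score can legitimately exceed 1 even though cosine similarity itself stays within [-1, 1].

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors; always within [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2_distance(a, b):
    """Squared Euclidean distance; within [0, 4] when both inputs are unit vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors chosen for illustration only.
query = l2_normalize([1.0, 2.0, 3.0])
doc = l2_normalize([-2.0, 1.0, 0.5])

cos = cosine_similarity(query, doc)
dist = squared_l2_distance(query, doc)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert math.isclose(dist, 2.0 - 2.0 * cos)
print(f"cosine={cos:.4f}  squared_l2={dist:.4f}")
```

Running the same comparison on a few real embeddings from the model in question would show whether they are already unit-normalized and which of the two quantities the returned scores track.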

If none of these steps resolve the issue, it might be helpful to review the specific configuration and setup of your vector store and embedding model to ensure everything is aligned correctly.

1 reply

JpChii Jan 24, 2025
Author

Yep, the scores seem to be consistent with the FlatL2 Faiss index, so this might be a case of HNSW.
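
One way to read scores like the ones in the question: if they are L2-style distances rather than cosine similarities, they can be mapped back to a cosine-style value. This is a minimal sketch under two assumptions that are not confirmed in this thread: the embeddings are unit-normalized, and the index returns squared L2 distances. The helper name is hypothetical.

```python
def squared_l2_to_cosine(distance):
    """Map a squared L2 distance between unit vectors to cosine similarity.

    Only valid under the assumptions stated above (unit-length embeddings,
    squared L2 distance as the reported score).
    """
    return 1.0 - distance / 2.0


# Scores copied from the VectorStoreQueryResult in the question.
reported = [1.123214, 1.1505811, 1.1521511, 1.1726012, 1.1901894]
cosines = [squared_l2_to_cosine(d) for d in reported]

# Ascending distances map to descending similarities, so the ranking is unchanged.
assert cosines == sorted(cosines, reverse=True)

for d, c in zip(reported, cosines):
    print(f"distance={d:.4f} -> cosine~{c:.4f}")
```

The reported scores are sorted in ascending order, which fits a "smaller is closer" distance better than a "larger is closer" similarity, but that is an observation rather than proof.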
