HuggingFaceEndpointEmbeddings fails with faiss #29549

Open
5 tasks done
tocab opened this issue Feb 3, 2025 · 0 comments
Labels

  • 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
  • investigate (Flagged for investigation.)
  • Ɑ: vector store (Related to vector store module)


tocab commented Feb 3, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

This code fails:

from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings
from langchain_community.vectorstores import FAISS
import numpy as np

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    task="feature-extraction",
    huggingfacehub_api_token=inference_token,  # Hugging Face API token (str), defined elsewhere
)
vector = FAISS.from_texts(["Hello, how are you?", "I am fine, thank you."], hf_embeddings)

The most likely cause is that the returned embeddings cannot be converted to a NumPy array:

embeddings = hf_embeddings.embed_documents(["Hello, how are you?", "I am fine, thank you."])
np.array(embeddings)

Both snippets raise the same error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 1) + inhomogeneous part.
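
The detected shape `(2, 1)` suggests that each of the two texts comes back as a length-1 list wrapping a variable-length sequence, which would fit token-level output from `feature-extraction`. That is an assumption, not something confirmed against the endpoint. The sketch below reproduces the same failure with synthetic data and shows one possible workaround, mean pooling over the token axis. The data, the hidden size of 4, and the `mean_pool` helper are all hypothetical.

```python
import numpy as np

# Synthetic stand-in for the endpoint output (assumption): one entry per text,
# each a length-1 list that wraps a list of per-token vectors.
hidden = 4
fake_embeddings = [
    [np.ones((6, hidden)).tolist()],  # first text: 6 tokens
    [np.ones((7, hidden)).tolist()],  # second text: 7 tokens
]

# Same call that FAISS makes internally; the ragged token axis makes it fail.
try:
    np.array(fake_embeddings, dtype=np.float32)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)


def mean_pool(nested):
    """Hypothetical helper: average token vectors into one vector per text."""
    return [np.asarray(item[0], dtype=np.float32).mean(axis=0) for item in nested]


# After pooling, every text has a fixed-size vector, so stacking succeeds.
pooled = np.stack(mean_pool(fake_embeddings))
print(pooled.shape)
```

Whether mean pooling is appropriate depends on the model. A decoder-only model such as the one used here may not produce useful sentence embeddings with this approach.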

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 11
      6 hf_embeddings = HuggingFaceEndpointEmbeddings(
      7     model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
      8     task="feature-extraction",
      9     huggingfacehub_api_token=inference_token,
     10 )
---> 11 vector = FAISS.from_texts(["Hello, how are you?", "I am fine, thank you."], hf_embeddings)

/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/faiss.py:1044, in FAISS.from_texts(cls, texts, embedding, metadatas, ids, **kwargs)
   1025 """Construct FAISS wrapper from raw documents.
   1026 
   1027 This is a user friendly interface that:
   (...)
   1041         faiss = FAISS.from_texts(texts, embeddings)
   1042 """
   1043 embeddings = embedding.embed_documents(texts)
-> 1044 return cls.__from(
   1045     texts,
   1046     embeddings,
   1047     embedding,
   1048     metadatas=metadatas,
   1049     ids=ids,
   1050     **kwargs,
   1051 )

/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/faiss.py:1013, in FAISS.__from(cls, texts, embeddings, embedding, metadatas, ids, normalize_L2, distance_strategy, **kwargs)
   1003 index_to_docstore_id = kwargs.pop("index_to_docstore_id", {})
   1004 vecstore = cls(
   1005     embedding,
   1006     index,
   (...)
   1011     **kwargs,
   1012 )
-> 1013 vecstore.__add(texts, embeddings, metadatas=metadatas, ids=ids)
   1014 return vecstore

/.venv/lib/python3.12/site-packages/langchain_community/vectorstores/faiss.py:310, in FAISS.__add(self, texts, embeddings, metadatas, ids)
    308     raise ValueError("Duplicate ids found in the ids list.")
    309 # Add to the index.
--> 310 vector = np.array(embeddings, dtype=np.float32)
    311 if self._normalize_L2:
    312     faiss.normalize_L2(vector)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 1) + inhomogeneous part.
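
To check which nesting level is ragged before `np.array` is called at line 310, the list lengths at each depth can be collected. This is a minimal diagnostic sketch: `nested_lengths` is a hypothetical helper, and `sample` is synthetic data that mimics the suspected structure rather than real endpoint output.

```python
def nested_lengths(obj, depth=0, out=None):
    """Collect the set of list lengths seen at each nesting depth."""
    if out is None:
        out = {}
    if isinstance(obj, (list, tuple)):
        out.setdefault(depth, set()).add(len(obj))
        for child in obj:
            nested_lengths(child, depth + 1, out)
    return out


# Synthetic data (assumption): two texts, each wrapped in a length-1 list,
# with 6 and 7 token vectors of size 4 respectively.
sample = [
    [[[0.0] * 4 for _ in range(6)]],
    [[[0.0] * 4 for _ in range(7)]],
]

report = nested_lengths(sample)
for depth in sorted(report):
    # More than one length at a depth means that axis is ragged.
    print(depth, sorted(report[depth]))
```

Running the same helper on the actual result of `hf_embeddings.embed_documents(...)` would show whether the ragged level is the token axis.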

Description

I tried following the quickstart guide with the Hugging Face API. I don't know whether the same error occurs with the OpenAI embeddings endpoint, since I don't have API access there.

System Info

System Information
------------------
> OS:  Linux
> OS Version:  #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2
> Python Version:  3.12.7 (main, Oct 20 2024, 13:47:02) [GCC 11.4.0]

Package Information
-------------------
> langchain_core: 0.3.33
> langchain: 0.3.17
> langchain_community: 0.3.16
> langsmith: 0.3.4
> langchain_huggingface: 0.1.2
> langchain_text_splitters: 0.3.5

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp: 3.11.11
> async-timeout: Installed. No version info available.
> dataclasses-json: 0.6.7
> httpx: 0.27.2
> httpx-sse: 0.4.0
> huggingface-hub: 0.28.1
> jsonpatch: 1.33
> langsmith-pyo3: Installed. No version info available.
> numpy: 2.2.2
> orjson: 3.10.15
> packaging: 24.2
> pydantic: 2.10.6
> pydantic-settings: 2.7.1
> pytest: 8.3.4
> PyYAML: 6.0.2
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> rich: 13.9.4
> sentence-transformers: 3.4.1
> SQLAlchemy: 2.0.37
> tenacity: 9.0.0
> tokenizers: 0.21.0
> transformers: 4.48.2
> typing-extensions: 4.12.2
> zstandard: 0.23.0
@langcarl langcarl bot added the investigate Flagged for investigation. label Feb 3, 2025
@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Feb 3, 2025