
Incompatible with the official spark-pinecone connector #71

Open
mdagost opened this issue Jan 28, 2024 · 1 comment

Comments


mdagost commented Jan 28, 2024

As shown here, the indices array produced by pinecone-text is a 32-bit unsigned integer. However, the official spark-pinecone connector (see its README) expects sparse-vector indices to be Spark IntegerType, and Spark's integers are 32-bit signed. That means pinecone-text can produce indices that overflow Spark's integer type and are therefore incompatible with the spark-pinecone connector. I've verified this.
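The mismatch can be demonstrated with a quick stdlib-only sketch. The unsigned index below is chosen (hypothetically) so that its bit pattern reinterprets to the -74040069 value seen later in the Pinecone API error; the point is only that any uint32 index above 2**31 - 1 wraps to a negative number when stored as a signed int32:

```python
import struct

# pinecone-text hashes tokens to 32-bit *unsigned* indices (0 .. 2**32 - 1).
# Spark's IntegerType is a 32-bit *signed* int (-2**31 .. 2**31 - 1), so any
# index above 2**31 - 1 changes meaning when stored as IntegerType.

# Hypothetical example: a valid uint32 index that exceeds the int32 maximum.
unsigned_index = 4_220_927_227
assert unsigned_index > 2**31 - 1

# Reinterpret the same 32 bits as a signed int32, as an IntegerType column would:
(signed_index,) = struct.unpack("<i", struct.pack("<I", unsigned_index))
print(signed_index)  # -74040069, a negative index Pinecone will reject
```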

Any ideas on what to do here? A solution might be for spark-pinecone to change that schema from IntegerType to LongType, but since these are both official Pinecone projects, I figured y'all might have better success getting that change made.


mdagost commented Jan 30, 2024

Looks like the murmurhash function used here has a signed/unsigned flag. I had thought a potential solution would be to expose that as a configurable flag here, but it appears that Pinecone itself really does expect an unsigned integer:

HTTP response body: vectors[0].sparse_values.indices[5]: invalid value -74040069 for type TYPE_UINT32

So it looks like the fix has to be on the spark-pinecone side: change the schema from IntegerType to LongType.
