BM25 support with Typescript #219

Open
1 of 2 tasks
RobertHH-IS opened this issue May 23, 2024 · 16 comments
Labels
enhancement New feature or request status:backlog This issue has been added to our backlog

Comments

@RobertHH-IS

Is this your first time submitting a feature request?

  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing functionality

Describe the feature

Maybe I just cannot find it, but we need BM25 encoder support in TypeScript to support hybrid search.
Searching for BM25 in the issues and the code base did not return anything.

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

@RobertHH-IS RobertHH-IS added the enhancement New feature or request label May 23, 2024
@c464851257

We also need BM25 encoder support in TypeScript to support hybrid search. Could you please support it?

@anawishnoff anawishnoff added the status:needs-triage An issue that needs to be triaged by the Pinecone team label Aug 6, 2024
@max-tano

+1 please

@dylanjt

dylanjt commented Aug 19, 2024

+1

@harrisonlinowes

Yes please! This is exactly what I need. +5

@aulorbe
Collaborator

aulorbe commented Aug 22, 2024

For right now, we encourage you to use our Python library for generating the sparse vectors (note: it's in beta): https://github.com/pinecone-io/pinecone-text

(But noted re: a TS implementation!)

@anawishnoff anawishnoff added status:backlog This issue has been added to our backlog and removed status:needs-triage An issue that needs to be triaged by the Pinecone team labels Oct 1, 2024
@snlamm

snlamm commented Oct 8, 2024

+1

@ladrians

ladrians commented Nov 1, 2024

+1

@ladrians

ladrians commented Nov 9, 2024

+1

@aulorbe
Collaborator

aulorbe commented Nov 12, 2024

Hi, all! Audrey from Pinecone here.

Can you provide some clarification around this feature request and an example use case?

For instance, when you say BM25 encoder support, do you mean you want Pinecone indexes to support searching with BM25 vectors only, with BM25 vectors and dense vectors (which they currently do), or something altogether different, such as Pinecone generating BM25 sparse vectors through something like its embed service?

Thanks in advance!

@BenBrewerBowman

BenBrewerBowman commented Nov 13, 2024

The problem is that there isn't a way to do a hybrid upsert or search on Pinecone in NodeJS. This is primarily because the BM25 library is only available in Python. The real problem is converting sentences into sparse vectors. As a result, you basically have to set up a Python API endpoint to do the conversion (pass in text, get back sparse vectors) when you could just do it inline. This is a big blocker for me at the moment in setting up a hybrid solution in my 100% NodeJS environment.

I think what would help tremendously is showing an example of how to both populate a hybrid Pinecone index using JS and then also how to query against it in JS. If this isn't possible, I think making this possible is what everyone is asking for.

This exemplifies the problem here:
https://community.pinecone.io/t/sparse-vector-generation-using-node-package-wink-nlp/4986
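For readers who want to see what inline sparse-vector generation could look like in TypeScript, here is a minimal sketch of a BM25-style encoder. It is not part of any Pinecone SDK, and it is not guaranteed to produce the same vectors as the Python pinecone-text library. Everything in it is an assumption made for illustration: the names (`Bm25Encoder`, `tokenize`, `hashToken`), the naive tokenizer, the FNV-1a hash used to turn tokens into integer indices, the smoothed IDF variant, and the default `k1` and `b` values. Documents carry the normalized term-frequency part of BM25 and queries carry the IDF part, so their dot product behaves like a BM25 score.

```typescript
// Hypothetical helper types and names; not a Pinecone API.
interface SparseVector {
  indices: number[];
  values: number[];
}

// Deliberately naive tokenizer: lowercase, split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// FNV-1a 32-bit hash, used here as a stand-in for mapping tokens to indices.
function hashToken(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

// Count how often each hashed token occurs.
function termCounts(tokens: string[]): Map<number, number> {
  const counts = new Map<number, number>();
  tokens.forEach((t) => {
    const idx = hashToken(t);
    counts.set(idx, (counts.get(idx) ?? 0) + 1);
  });
  return counts;
}

class Bm25Encoder {
  private docFreq = new Map<number, number>();
  private numDocs = 0;
  private avgDocLen = 0;
  private k1: number;
  private b: number;

  constructor(k1: number = 1.2, b: number = 0.75) {
    this.k1 = k1;
    this.b = b;
  }

  // Learn document frequencies and the average document length from a corpus.
  fit(corpus: string[]): void {
    this.docFreq.clear();
    let totalLen = 0;
    corpus.forEach((doc) => {
      const tokens = tokenize(doc);
      totalLen += tokens.length;
      termCounts(tokens).forEach((_count, idx) => {
        this.docFreq.set(idx, (this.docFreq.get(idx) ?? 0) + 1);
      });
    });
    this.numDocs = corpus.length;
    this.avgDocLen = corpus.length > 0 ? totalLen / corpus.length : 0;
  }

  // Document side: length-normalized term frequency, tf / (tf + k1 * (1 - b + b * len / avgLen)).
  encodeDocument(text: string): SparseVector {
    const tokens = tokenize(text);
    const norm =
      this.k1 * (1 - this.b + this.b * (tokens.length / (this.avgDocLen || 1)));
    const indices: number[] = [];
    const values: number[] = [];
    termCounts(tokens).forEach((tf, idx) => {
      indices.push(idx);
      values.push(tf / (tf + norm));
    });
    return { indices, values };
  }

  // Query side: smoothed IDF per term, normalized so the weights sum to 1.
  encodeQuery(text: string): SparseVector {
    const indices: number[] = [];
    const values: number[] = [];
    termCounts(tokenize(text)).forEach((_count, idx) => {
      const df = this.docFreq.get(idx) ?? 0;
      indices.push(idx);
      values.push(Math.log((this.numDocs + 1) / (df + 0.5)));
    });
    const sum = values.reduce((acc, v) => acc + v, 0);
    return { indices, values: sum > 0 ? values.map((v) => v / sum) : values };
  }
}
```

Under these assumptions, you would call `fit` once on your corpus, use `encodeDocument` when building records to upsert, and use `encodeQuery` at search time. A production setup would likely need a better tokenizer (stemming, stop words) and a way to persist the fitted statistics.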

@ladrians

Same issue here. There should be a full hybrid option in the TS client.

@aulorbe
Collaborator

aulorbe commented Nov 13, 2024

Gotcha, thanks for the clarification, y'all. So the crux is in generating sparse vectors via the TS client, so you can upsert and search all in one go, got it.

Re: an example @BenBrewerBowman, there is not currently a way to do this while remaining in the TS client ecosystem (you can currently only use /embed in the TS client to generate dense vectors). However, all the necessary devs have been keeping up to date with this issue thread, and we are planning to provide an update and guidance soon.

@harrisonlinowes

Hi, I'd echo @BenBrewerBowman's comment. Right now my team is managing our own API endpoint, which uses the Python SDK to compute BM25 sparse vectors.

It would be great to see both a hosted BM25 (or equivalent sparse vector model) available through Pinecone, as well as BM25 support via the TS SDK.

@BenBrewerBowman

@aulorbe thanks for letting them know! Just to 100% clarify “so you can upsert and search all in one go”: the problem isn’t really about upserting and searching in one go. The problem is that neither is possible end to end in NodeJS, because Python is the only option for sparse vectorization. There is no BM25 NodeJS support for sparse vectorization at all.

This would be very exciting to have, so keep us updated on the timeline! Thanks 🙏

@aulorbe
Collaborator

aulorbe commented Nov 14, 2024

@BenBrewerBowman I'm a little confused -- right now, you can upsert sparse vectors (BM25-encoded or otherwise) and search them with all of our clients via hybrid search. Is that not what you are trying to do?

The thing users cannot currently do is generate those sparse vectors via the TS client.
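To make the distinction concrete, here is a rough sketch of what the client side handles once sparse vectors already exist. It only builds plain objects and does not call the Pinecone client. The field names `values`, `sparseValues`, `vector`, and `sparseVector` are assumptions about the TS client's record and query shapes and should be checked against the client documentation. The `alpha` weighting is a common convention for balancing dense and sparse scores in hybrid search, not something stated in this thread. The helper names `toHybridRecord` and `hybridScale` are hypothetical.

```typescript
// Hypothetical helper types and names; field names are assumptions, see above.
interface SparseVector {
  indices: number[];
  values: number[];
}

// Build one record carrying both a dense and a sparse vector for an upsert.
function toHybridRecord(id: string, dense: number[], sparse: SparseVector) {
  if (sparse.indices.length !== sparse.values.length) {
    throw new Error("sparse indices and values must have the same length");
  }
  return { id, values: dense, sparseValues: sparse };
}

// Weight a query's dense and sparse parts with a convex combination:
// alpha = 1 uses only the dense vector, alpha = 0 only the sparse vector.
function hybridScale(dense: number[], sparse: SparseVector, alpha: number) {
  if (alpha < 0 || alpha > 1) {
    throw new RangeError("alpha must be between 0 and 1");
  }
  return {
    vector: dense.map((v) => v * alpha),
    sparseVector: {
      indices: sparse.indices.slice(),
      values: sparse.values.map((v) => v * (1 - alpha)),
    },
  };
}
```

The objects returned here would then be passed to the client's upsert and query calls. The sparse vectors themselves still have to come from somewhere else, which is the gap this issue is about.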

@BenBrewerBowman

“The thing users cannot currently do is generate those sparse vectors via the TS client.”

Yes correct. Others are possible through the clients.


10 participants