BM25 support with Typescript #219

RobertHH-IS · 2024-05-23T10:17:36Z

Is this your first time submitting a feature request?

I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing functionality

Describe the feature

Maybe I just cannot find it, but we need BM25 encoder support in TypeScript to support hybrid search.
Searching for BM25 in the issues and code base did not return anything.

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

c464851257 · 2024-06-03T05:18:28Z

We also need BM25 encoder support in TypeScript to support hybrid search. Could you please support it?

max-tano · 2024-08-19T13:44:15Z

+1 please

dylanjt · 2024-08-19T15:59:07Z

+1

harrisonlinowes · 2024-08-22T16:08:11Z

Yes please! This is exactly what I need +5

aulorbe · 2024-08-22T16:18:58Z

For right now, we encourage you to use our Python library for generating the sparse vectors (note: it's in beta): https://github.com/pinecone-io/pinecone-text

(But noted re: a TS implementation!)

snlamm · 2024-10-08T11:40:34Z

+1

ladrians · 2024-11-01T09:41:24Z

+1

ladrians · 2024-11-09T11:23:36Z

+1

aulorbe · 2024-11-12T22:02:55Z

Hi, all! Audrey from Pinecone here.

Can you provide some clarification around this feature request and an example use case?

For instance, when you say BM25 encoder support, do you mean you want Pinecone indexes to support searching with BM25 vectors only, BM25 vectors and dense vectors (it currently does this), or something altogether different, such as Pinecone generating BM25 sparse vectors through something like its embed service?

Thanks in advance!

BenBrewerBowman · 2024-11-13T00:51:35Z

The problem is that there isn't a way to do a hybrid upsert or search on Pinecone in NodeJS. This is primarily because the BM25 library is only available in Python. The real problem is converting sentences to sparse vectors. As a result, you basically have to set up a Python API endpoint to do the conversion (pass in text, get back sparse vectors) when you could just do it inline. This is a big blocker for me at the moment for setting up a hybrid solution in my 100% NodeJS env.

I think what would help tremendously is showing an example of how to both populate a hybrid Pinecone index using JS and then also how to query against it in JS. If this isn't possible, I think making this possible is what everyone is asking for.

This exemplifies the problem here:
https://community.pinecone.io/t/sparse-vector-generation-using-node-package-wink-nlp/4986
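To make the text-to-sparse-vector step concrete, here is a minimal sketch of what an inline BM25-style encoder could look like in TypeScript. Everything in it is an assumption for illustration: the `Bm25Sketch` class, the ASCII-only tokenizer, and the FNV-1a hash used to turn tokens into indices are hypothetical stand-ins, not part of any Pinecone client, and the output is not guaranteed to be compatible with vectors produced by the Python pinecone-text library. It puts the term-frequency part of BM25 on the document side and a normalized IDF weight on the query side.

```typescript
// Sketch only: a tiny BM25-style sparse encoder. All names here are hypothetical.
type SparseValues = { indices: number[]; values: number[] };

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// FNV-1a 32-bit hash, used as a stand-in to map a token to a non-negative index.
function hashToken(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Count how often each hashed token occurs in a piece of text.
function termCounts(text: string): Map<number, number> {
  const counts = new Map<number, number>();
  for (const token of tokenize(text)) {
    const idx = hashToken(token);
    counts.set(idx, (counts.get(idx) ?? 0) + 1);
  }
  return counts;
}

class Bm25Sketch {
  private docFreq = new Map<number, number>();
  private nDocs = 0;
  private avgDocLen = 0;

  constructor(private k1 = 1.2, private b = 0.75) {}

  // Learn document frequencies and the average document length from a corpus.
  fit(corpus: string[]): void {
    let totalLen = 0;
    for (const doc of corpus) {
      termCounts(doc).forEach((tf, idx) => {
        this.docFreq.set(idx, (this.docFreq.get(idx) ?? 0) + 1);
        totalLen += tf;
      });
    }
    this.nDocs = corpus.length;
    this.avgDocLen = corpus.length > 0 ? totalLen / corpus.length : 0;
  }

  // Document side: length-normalized term frequency, tf / (k1 * (1 - b + b * dl / avgdl) + tf).
  encodeDocument(text: string): SparseValues {
    const counts = termCounts(text);
    let docLen = 0;
    counts.forEach((tf) => (docLen += tf));
    const norm = this.k1 * (1 - this.b + this.b * (docLen / (this.avgDocLen || 1)));
    const indices: number[] = [];
    const values: number[] = [];
    counts.forEach((tf, idx) => {
      indices.push(idx);
      values.push(tf / (norm + tf));
    });
    return { indices, values };
  }

  // Query side: idf = log((N + 1) / (df + 0.5)), normalized so the weights sum to 1.
  encodeQuery(text: string): SparseValues {
    const indices = Array.from(termCounts(text).keys());
    const idf = indices.map((idx) =>
      Math.log((this.nDocs + 1) / ((this.docFreq.get(idx) ?? 0) + 0.5))
    );
    const sum = idf.reduce((acc, v) => acc + v, 0);
    return { indices, values: idf.map((v) => (sum > 0 ? v / sum : 0)) };
  }
}
```

A caller would `fit` the encoder once on its own corpus, then call `encodeDocument` at upsert time and `encodeQuery` at search time. The corpus statistics would need to be persisted somewhere so that both paths use the same values.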

ladrians · 2024-11-13T09:09:04Z

Same issue here; there should be a full hybrid option from the TS client.

aulorbe · 2024-11-13T21:58:51Z

Gotcha, thanks for the clarification, y'all. So the crux is in generating sparse vectors via the TS client, so you can upsert and search all in one go, got it.

Re: an example @BenBrewerBowman, there is not currently a way to do this while remaining in the TS client ecosystem (you can currently only use /embed in the TS client to generate dense vectors). However, all the necessary devs have been keeping up to date with this issue thread, and we are planning to provide an update and guidance soon.

harrisonlinowes · 2024-11-13T22:32:41Z

Hi, I'd echo @BenBrewerBowman's comment. Right now my team is managing our own API endpoint, which uses the Python SDK to compute BM25 sparse vectors.

It would be great to see both a hosted BM25 (or equivalent sparse vector model) available through Pinecone, as well as BM25 support via the TS SDK.

BenBrewerBowman · 2024-11-13T23:38:07Z

@aulorbe thanks for letting them know! Just to 100% clarify “so you can upsert and search all in one go”: the problem isn't really about upserting and searching in one go. The problem is that neither one is possible in NodeJS, and the only option is to use Python for all sparse vectorization. There is no BM25 NodeJS support for sparse vectorization at all.

This would be very exciting to have, so keep us updated on the timeline! Thanks 🙏

aulorbe · 2024-11-14T18:05:35Z

@BenBrewerBowman I'm a little confused -- right now, you can upsert sparse vectors (BM25-encoded or otherwise) and search them with all of our clients via hybrid search. Is that not what you are trying to do?

The thing users cannot currently do is generate those sparse vectors via the TS client.
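To illustrate that split, here is a small sketch of how an already-computed sparse vector could be paired with a dense one for a hybrid upsert and a hybrid query. The types are declared locally so the snippet stands alone. The field names (`values` and `sparseValues` on a record, `vector` and `sparseVector` on a query) are assumed to mirror the TS client's payload shapes and should be checked against the client's own typings, and the helper names `toHybridRecord` and `toHybridQuery` are hypothetical.

```typescript
// Sketch only: local types that mirror the payload shapes assumed above.
type SparseValues = { indices: number[]; values: number[] };

type HybridRecord = {
  id: string;
  values: number[];
  sparseValues: SparseValues;
};

type HybridQuery = {
  topK: number;
  vector: number[];
  sparseVector: SparseValues;
};

// A sparse vector is a pair of parallel arrays, so their lengths must match.
function checkSparse(sparse: SparseValues): void {
  if (sparse.indices.length !== sparse.values.length) {
    throw new Error("sparse indices and values must have the same length");
  }
}

// Build one record carrying both the dense and the sparse representation.
function toHybridRecord(id: string, dense: number[], sparse: SparseValues): HybridRecord {
  checkSparse(sparse);
  return { id, values: dense, sparseValues: sparse };
}

// Build a query carrying both representations of the search text.
function toHybridQuery(dense: number[], sparse: SparseValues, topK: number): HybridQuery {
  checkSparse(sparse);
  return { topK, vector: dense, sparseVector: sparse };
}
```

With a real client, the record would be passed to the index's upsert call and the query object to its query call. The sparse vectors themselves still have to come from somewhere outside the TS client, which is the gap this issue is about.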

BenBrewerBowman · 2024-11-15T05:49:41Z

“The thing users cannot currently do is generate those sparse vectors via the TS client.”

Yes correct. Others are possible through the clients.

RobertHH-IS added the "enhancement" label (New feature or request) on May 23, 2024

anawishnoff added the "status:needs-triage" label (An issue that needs to be triaged by the Pinecone team) on Aug 6, 2024

anawishnoff added the "status:backlog" label (This issue has been added to our backlog) and removed the "status:needs-triage" label on Oct 1, 2024
