deprecate bm25 embedding model
Signed-off-by: liyun95 <[email protected]>
liyun95 committed Dec 6, 2024
1 parent 8ff775f commit eba0372
Showing 3 changed files with 2 additions and 52 deletions.
2 changes: 1 addition & 1 deletion site/en/about/overview.md
@@ -55,7 +55,7 @@ Milvus supports various types of search functions to meet the demands of differe
- [Filtering Search](single-vector-search.md#Filtered-search): Performs ANN search under specified filtering conditions.
- [Range Search](single-vector-search.md#Range-search): Finds vectors within a specified radius from your query vector.
- [Hybrid Search](multi-vector-search.md): Conducts ANN search based on multiple vector fields.
-- Keyword Search: Keyword search based on BM25.
+- [Full Text Search](full-text-search.md): Full text search based on BM25.
- [Reranking](reranking.md): Adjusts the order of search results based on additional criteria or a secondary algorithm, refining the initial ANN search results.
- [Fetch](get-and-scalar-query.md#Get-Entities-by-ID): Retrieves data by their primary keys.
- [Query](get-and-scalar-query.md#Use-Basic-Operators): Retrieves data using specific expressions.
46 changes: 1 addition & 45 deletions site/en/embeddings/embeddings.md
@@ -23,7 +23,6 @@ To create embeddings in action, refer to [Using PyMilvus's Model To Generate Tex
| ------------------------------------------------------------------------------------- | ------- | -------------------- |
| [openai](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/OpenAIEmbeddingFunction/OpenAIEmbeddingFunction.md) | Dense | API |
| [sentence-transformer](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/SentenceTransformerEmbeddingFunction/SentenceTransformerEmbeddingFunction.md) | Dense | Open-sourced |
-| [bm25](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/BM25EmbeddingFunction/BM25EmbeddingFunction.md) | Sparse | Open-sourced |
| [Splade](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/SpladeEmbeddingFunction/SpladeEmbeddingFunction.md) | Sparse | Open-sourced |
| [bge-m3](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/BGEM3EmbeddingFunction/BGEM3EmbeddingFunction.md) | Hybrid | Open-sourced |
| [voyageai](https://milvus.io/api-reference/pymilvus/v2.4.x/EmbeddingModels/VoyageEmbeddingFunction/VoyageEmbeddingFunction.md) | Dense | API |
@@ -42,7 +41,7 @@ To use embedding functions with Milvus, first install the PyMilvus client librar
pip install "pymilvus[model]"
```

-The `model` subpackage supports various embedding models, from [OpenAI](https://milvus.io/docs/embed-with-openai.md), [Sentence Transformers](https://milvus.io/docs/embed-with-sentence-transform.md), [BGE M3](https://milvus.io/docs/embed-with-bgm-m3.md), [BM25](https://milvus.io/docs/embed-with-bm25.md), to [SPLADE](https://milvus.io/docs/embed-with-splade.md) pretrained models. For simpilicity, this example uses the `DefaultEmbeddingFunction` which is __all-MiniLM-L6-v2__ sentence transformer model, the model is about 70MB and it will be downloaded during first use:
+The `model` subpackage supports various embedding models, from [OpenAI](https://milvus.io/docs/embed-with-openai.md), [Sentence Transformers](https://milvus.io/docs/embed-with-sentence-transform.md), [BGE M3](https://milvus.io/docs/embed-with-bgm-m3.md), to [SPLADE](https://milvus.io/docs/embed-with-splade.md) pretrained models. For simpilicity, this example uses the `DefaultEmbeddingFunction` which is __all-MiniLM-L6-v2__ sentence transformer model, the model is about 70MB and it will be downloaded during first use:

```python
from pymilvus import model
@@ -121,46 +120,3 @@ bge_m3_ef = BGEM3EmbeddingFunction(use_fp16=False, device="cpu")
docs_embeddings = bge_m3_ef(docs)
query_embeddings = bge_m3_ef([query])
```
-
-## Example 3: Generate sparse vectors using BM25 model
-
-BM25 is a well-known method that uses word occurrence frequencies to determine the relevance between queries and documents. In this example, we will show how to use `BM25EmbeddingFunction` to generate sparse embeddings for both queries and documents.
-
-First, import the __BM25EmbeddingFunction__ class.
-
-```xml
-from pymilvus.model.sparse import BM25EmbeddingFunction
-```
-
-In BM25, it's important to calculate the statistics in your documents to obtain the IDF (Inverse Document Frequency), which can represent the pattern in your documents. The IDF is a measure of how much information a word provides, that is, whether it's common or rare across all documents.
-
-```python
-# 1. prepare a small corpus to search
-docs = [
-"Artificial intelligence was founded as an academic discipline in 1956.",
-"Alan Turing was the first person to conduct substantial research in AI.",
-"Born in Maida Vale, London, Turing was raised in southern England.",
-]
-query = "Where was Turing born?"
-bm25_ef = BM25EmbeddingFunction()
-
-# 2. fit the corpus to get BM25 model parameters on your documents.
-bm25_ef.fit(docs)
-
-# 3. store the fitted parameters to disk to expedite future processing.
-bm25_ef.save("bm25_params.json")
-
-# 4. load the saved params
-new_bm25_ef = BM25EmbeddingFunction()
-new_bm25_ef.load("bm25_params.json")
-
-docs_embeddings = new_bm25_ef.encode_documents(docs)
-query_embeddings = new_bm25_ef.encode_queries([query])
-print("Dim:", new_bm25_ef.dim, list(docs_embeddings)[0].shape)
-```
-
-The expected output is similar to the following:
-
-```python
-Dim: 21 (1, 21)
-```
6 changes: 0 additions & 6 deletions site/en/menuStructure/en.json
@@ -777,12 +777,6 @@
"order": 3,
"children": []
},
-{
-"label": "BM25",
-"id": "embed-with-bm25.md",
-"order": 4,
-"children": []
-},
{
"label": "SPLADE",
"id": "embed-with-splade.md",