This repository has been archived by the owner on May 10, 2024. It is now read-only.

Integrations #194

Open
wants to merge 11 commits into
base: main
from
2 changes: 2 additions & 0 deletions docs/embeddings.md
Original file line number Diff line number Diff line change
@@ -23,6 +23,8 @@ Chroma provides lightweight wrappers around popular embedding providers, making

We welcome pull requests to add new Embedding Functions to the community.

[➕ Add New](/embeddings/add)

***

## Default: all-MiniLM-L6-v2
41 changes: 41 additions & 0 deletions docs/embeddings/add.md
@@ -0,0 +1,41 @@
---
title: ➕ Add
---

New embeddings integrations help Chroma users build better applications, faster!

### Process

To land a new embedding integration, add it to both the Python and JS clients:
- [Python embedding functions](https://github.com/chroma-core/chroma/blob/12785d71ea476ef3cbd28b419e7807bf7f2129d3/chromadb/utils/embedding_functions.py#L4)
- [JS embedding functions](https://github.com/chroma-core/chroma/tree/main/clients/js/src/embeddings)

In some circumstances we will accept an embedding function for only one of the two clients, but we strongly prefer to keep them at parity.
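
As a rough illustration of what an integration has to provide, the sketch below assumes the contract suggested by the direct-call examples in these docs: a callable that takes a list of strings and returns one list of floats per string. The class name `ToyHashEmbeddingFunction` and its `dim` parameter are hypothetical, and the hashing step is only a stand-in for a real provider call. A real integration should follow the interface in the linked `embedding_functions.py`.

```python
import hashlib
from typing import List


class ToyHashEmbeddingFunction:
    """Hypothetical stand-in for a provider-backed embedding function."""

    def __init__(self, dim: int = 8) -> None:
        # dim is an illustrative parameter, not part of any Chroma API.
        # SHA-256 yields 32 bytes, so dim must be at most 32 here.
        self.dim = dim

    def __call__(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            # A real integration would call the provider's API here;
            # this derives deterministic pseudo-embeddings from a hash.
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([byte / 255.0 for byte in digest[: self.dim]])
        return vectors


embedder = ToyHashEmbeddingFunction(dim=8)
embeddings = embedder(["document1", "document2"])
```

The point of the sketch is the shape of the input and output, which is what the Python and JS implementations need to agree on.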

The process is:
1. Open a new PR in the [Chroma Repo](https://github.com/chroma-core/chroma).
2. Open a new PR in the [Docs Repo](https://github.com/chroma-core/docs).
3. Chroma will approve and land the new embedding functions, then cut a new Python/JS release.
4. The docs PR will then be merged and deployed.

### Embedding Page

We strongly encourage having the following sections:
- Overview
- Info table on available models and their details
- Simple getting-started usage for Python and JS
- Advanced usage section with examples and links to other resources

The [OpenAI](/embeddings/openai) integration is a great reference!

## Integration Overview and Sidebar

You will also want to add your embedding functions to the embeddings overview page and sidebar.

### Overview page

Add your embedding function to the list here: https://github.com/chroma-core/docs/blob/main/docs/embeddings/index.md

### Sidebar

Add your embedding function to the sidebar: https://github.com/chroma-core/docs/blob/main/sidebars.js
96 changes: 60 additions & 36 deletions docs/embeddings/cohere.md
@@ -3,56 +3,88 @@

# Cohere

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Chroma also provides a convenient wrapper around Cohere's embedding API. This embedding function runs remotely on Cohere’s servers, and requires an API key. You can get an API key by signing up for an account at [Cohere](https://dashboard.cohere.ai/welcome/register).

<div class="select-language">Select a language</div>
<div class="data_table"></div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>
| Models | Input | Dimensionality | Context Size|
|--|--|--|--|
|`embed-english-v3.0` | English | 1024 | 512 (recommended) |
|`embed-multilingual-v3.0` | [Full List](https://docs.cohere.com/docs/supported-languages) | 1024 | 512 (recommended) |
|`embed-english-light-v3.0` | English | 384 | 512 (recommended) |
|`embed-multilingual-light-v3.0` | [Full List](https://docs.cohere.com/docs/supported-languages) | 384 | 512 (recommended) |
|`embed-english-v2.0` | English | 4096 | 512 (recommended) |
|`embed-english-light-v2.0` | English | 1024 | 512 (recommended) |
|`embed-multilingual-v2.0` | [Full List](https://docs.cohere.com/docs/supported-languages) | 768 | 512 (recommended) |
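
The dimensionality column matters when you store or compare vectors produced outside Chroma. Below is a minimal sketch, assuming you want to sanity-check returned vectors against the table above. The `EXPECTED_DIMENSIONS` mapping and the `check_dimensions` helper are illustrative names, not part of Chroma or Cohere's SDK, and the vectors are placeholders rather than real API output.

```python
from typing import Dict, List

# Values copied from the model table above
EXPECTED_DIMENSIONS: Dict[str, int] = {
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "embed-english-light-v3.0": 384,
    "embed-multilingual-light-v3.0": 384,
    "embed-english-v2.0": 4096,
    "embed-english-light-v2.0": 1024,
    "embed-multilingual-v2.0": 768,
}


def check_dimensions(model_name: str, embeddings: List[List[float]]) -> bool:
    """Return True if every vector has the length listed for model_name."""
    expected = EXPECTED_DIMENSIONS[model_name]
    return all(len(vector) == expected for vector in embeddings)


# Placeholder vectors standing in for real API output
fake_embeddings = [[0.0] * 384, [0.0] * 384]
print(check_dimensions("embed-english-light-v3.0", fake_embeddings))
```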

Chroma also provides a convenient wrapper around Cohere's embedding API. This embedding function runs remotely on Cohere’s servers, and requires an API key. You can get an API key by signing up for an account at [Cohere](https://dashboard.cohere.ai/welcome/register).

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">
## Basic Usage

This embedding function relies on the `cohere` Python package, which you can install with `pip install cohere`.
### Python

```bash
pip install cohere
```

```python
cohere_ef = embedding_functions.CohereEmbeddingFunction(api_key="YOUR_API_KEY", model_name="large")
cohere_ef(texts=["document1","document2"])

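import chromadb
from chromadb.utils import embedding_functions

# in-memory client; assumes no existing collection named "cohere_ef"
client = chromadb.Client()

embedder = embedding_functions.CohereEmbeddingFunction(
    api_key="YOUR_API_KEY")

collection = client.create_collection(
    name="cohere_ef",
    embedding_function=embedder)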
```

</TabItem>
<TabItem value="js" label="JavaScript">
### JavaScript

```bash
yarn add cohere-ai
```

```javascript
const {CohereEmbeddingFunction} = require('chromadb');
const embedder = new CohereEmbeddingFunction("apiKey")
import { ChromaClient, CohereEmbeddingFunction } from 'chromadb'

// use directly
const embeddings = embedder.generate(["document1","document2"])
const client = new ChromaClient()
const embedder = new CohereEmbeddingFunction({
  apiKey: "YOUR_API_KEY"
})

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
const collection = await client.createCollection({
  name: "cohere_ef",
  embeddingFunction: embedder
})
```

</TabItem>
## Advanced Usage

</Tabs>
### Call directly

By passing the embedding function to a Collection, Chroma handles the embedding of documents and queries for you. However, in some cases you may want to generate the embeddings yourself, outside of Chroma, and handle them directly.

#### Python

```python
embeddings = embedder(["document1","document2"])
# [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```

#### JavaScript

```javascript
const embeddings = await embedder.generate(["document1","document2"])
// [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```
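
Once you hold the raw vectors, you can work with them however you like. Here is a minimal sketch of one common use, comparing two embeddings by cosine similarity in plain Python. The vectors are made-up three-dimensional placeholders rather than real embedder output, and `cosine_similarity` is an illustrative helper, not a Chroma function.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for the output of embedder([...])
embeddings = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
score = cosine_similarity(embeddings[0], embeddings[1])
```

The helper does not guard against zero-length vectors, which would need handling before using it on arbitrary input.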

### Using a different model

You can pass in an optional `model_name` argument to choose which Cohere embeddings model to use. By default, Chroma uses the `large` model. The available models are listed under the `Get embeddings` section of [Cohere's API reference](https://docs.cohere.ai/reference/embed).


### Multilingual model example

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">
#### Python

```python
cohere_ef = embedding_functions.CohereEmbeddingFunction(
@@ -69,11 +101,10 @@ cohere_ef(texts=multilingual_texts)

```

</TabItem>
<TabItem value="js" label="JavaScript">
#### JavaScript

```javascript
const {CohereEmbeddingFunction} = require('chromadb');
import { CohereEmbeddingFunction } from 'chromadb'
const embedder = new CohereEmbeddingFunction("apiKey")

multilingual_texts = [ 'Hello from Cohere!', 'مرحبًا من كوهير!',
@@ -86,11 +117,4 @@ const embeddings = embedder.generate(multilingual_texts)

```


</TabItem>

</Tabs>



For more information on the multilingual model, see [Cohere's documentation](https://docs.cohere.ai/docs/multilingual-language-models).
89 changes: 73 additions & 16 deletions docs/embeddings/google-gemini.md
@@ -3,25 +3,24 @@

# Google Generative AI

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Chroma also provides a convenient wrapper around Google's embedding API. This embedding function runs remotely on Google servers, and requires an API key. You can get an API key by signing up for an account at [Google MakerSuite](https://makersuite.google.com/).

<div class="select-language">Select a language</div>
<div class="data_table"></div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>
| Models | Input | Dimensionality |
|--|--|--|
|`models/embedding-001` | English | 768 |

Chroma provides a convenient wrapper around Google's Generative AI embedding API. This embedding function runs remotely on Google's servers, and requires an API key.
## Basic Usage

You can get an API key by signing up for an account at [Google MakerSuite](https://makersuite.google.com/).

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">
### Python

This embedding function relies on the `google-generativeai` Python package, which you can install with `pip install google-generativeai`.

```bash
pip install google-generativeai
```

```python
# import
import chromadb
Expand All @@ -40,11 +39,14 @@ You can view a more [complete example](https://github.com/chroma-core/chroma/tre

For more info - please visit the [official Google python docs](https://ai.google.dev/tutorials/python_quickstart).

</TabItem>
<TabItem value="js" label="JavaScript">
### JavaScript

This embedding function relies on the `@google/generative-ai` npm package, which you can install with `yarn add @google/generative-ai`.

```bash
yarn add @google/generative-ai
```

```javascript
import { ChromaClient, GoogleGenerativeAiEmbeddingFunction } from 'chromadb'
const embedder = new GoogleGenerativeAiEmbeddingFunction({googleApiKey: "<YOUR API KEY>"})
Expand All @@ -61,7 +63,62 @@ You can view a more [complete example using Node](https://github.com/chroma-core

For more info - please visit the [official Google JS docs](https://ai.google.dev/tutorials/node_quickstart).

</TabItem>
## Advanced Usage

### Call directly

By passing the embedding function to a Collection, Chroma handles the embedding of documents and queries for you. However, in some cases you may want to generate the embeddings yourself, outside of Chroma, and handle them directly.

#### Python

```python
embeddings = embedder(["document1","document2"])
# [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```

#### JavaScript

```javascript
const embeddings = await embedder.generate(["document1","document2"])
// [[0.04565250128507614, 0.01611952856183052...], [0.030171213671565056, 0.007690359838306904...]]
```



### Task Type

Google's Embedding endpoint also accepts a `task_type`/`taskType` parameter. This may boost performance for your specific usage.

<div class="data_table"></div>

| Task Type| Description|
|--|--|
|RETRIEVAL_QUERY | Specifies the given text is a query in a search/retrieval setting.|
|RETRIEVAL_DOCUMENT |Specifies the given text is a document in a search/retrieval setting. Using this task type requires a title.|
|SEMANTIC_SIMILARITY| Specifies the given text will be used for Semantic Textual Similarity (STS).|
|CLASSIFICATION |Specifies that the embeddings will be used for classification.|
|CLUSTERING |Specifies that the embeddings will be used for clustering.|

Here is a Python demonstration of embedding documents with `RETRIEVAL_DOCUMENT` and then querying them with `RETRIEVAL_QUERY`.

```python
# import
import chromadb
from chromadb.utils import embedding_functions

# in-memory client for this demonstration
client = chromadb.Client()

google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY", task_type='RETRIEVAL_DOCUMENT')

# pass the embedding function to the collection so it is used when adding documents
collection = client.create_collection(name="name", embedding_function=google_ef)

# add your documents
collection.add(...)

# create a new EF for Query and re-get your collection
google_ef2 = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY", task_type='RETRIEVAL_QUERY')
collection = client.get_collection(name="name", embedding_function=google_ef2)

</Tabs>
# query your documents
collection.query(...)

```