Ember offers GPU- and ANE-accelerated embedding models with a convenient server!
Ember works by converting sentence-transformers models to Core ML, then launching a local server you can query to retrieve document embeddings. You can select from a few recommended models, or use any model available on the Hugging Face Hub.
Sentence-transformers models generate representations (called embeddings) from documents, which you can use for tasks such as semantic search, similarity, retrieval, or clustering. For more information, please refer to the documentation.
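As a quick illustration of the "similarity" use case, here is a toy example (independent of Ember itself) that compares two embedding vectors with cosine similarity; the vectors are made up for the sake of the example:

```python
import numpy as np

# Toy stand-ins for two document embeddings; real sentence embeddings
# typically have a few hundred dimensions.
a = np.array([0.1, 0.8, 0.3])
b = np.array([0.2, 0.7, 0.4])

# Cosine similarity: values closer to 1 mean the documents are semantically closer.
similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {similarity:.3f}")
```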
Follow the dropdown menu instructions with `ember create` to get started:

```bash
ember create
```
The `create` command displays a dropdown menu, allowing you to select one of a few popular sentence-transformers models. There's also a `Custom Model` option where you can choose any sentence-transformers model from the Hugging Face Hub. The selected model will be automatically downloaded and converted to Core ML for you.
You could select `intfloat/multilingual-e5-small` as the model to create, for example.
You should now have a Core ML model. You can serve it from a local server using:
```bash
ember serve
```
The server provides an endpoint for any of the models you converted with `ember create`. To query the model we just converted in the previous example, you can use a POST request like the following:
```bash
curl http://localhost:11434/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-small",
    "documents": [
      { "content": "Hello, world!" },
      { "content": "Open source for the win 🤗!" }
    ],
    "options": {
      "keep_alive": 1
    }
  }'
```
This example will return embeddings for the documents you supplied, which in this case are the sentences "Hello, world!" and "Open source for the win 🤗!". The model process will stay running until the `keep_alive` duration is reached (1 minute in this case), so subsequent requests will be processed quickly.
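If you prefer Python over curl, the same request can be sent with the `requests` library. This is just a translation of the curl call above; the endpoint, payload, and `keep_alive` option are taken from that example, and the response is printed as-is since its exact schema is not shown here:

```python
import requests

payload = {
    "model": "intfloat/multilingual-e5-small",
    "documents": [
        {"content": "Hello, world!"},
        {"content": "Open source for the win 🤗!"},
    ],
    # Keep the model loaded for 1 minute after the last request.
    "options": {"keep_alive": 1},
}

response = requests.post("http://localhost:11434/api/embed", json=payload)
response.raise_for_status()
print(response.json())
```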
- `similarity.py`: simple similarity computation between a query and a set of documents (a rough sketch of the idea follows below).
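A minimal sketch of what such a similarity script could look like, assuming the server returns one vector per document under an `embeddings` key (check the actual response for the exact field name), is:

```python
import numpy as np
import requests

EMBED_URL = "http://localhost:11434/api/embed"
MODEL = "intfloat/multilingual-e5-small"


def embed(texts):
    """Fetch embeddings for a list of strings from the local server.

    Assumes the response holds one vector per document under an
    "embeddings" key; adjust if the actual schema differs.
    """
    payload = {"model": MODEL, "documents": [{"content": t} for t in texts]}
    response = requests.post(EMBED_URL, json=payload)
    response.raise_for_status()
    return np.array(response.json()["embeddings"], dtype=np.float32)


# E5 models expect "query:" / "passage:" prefixes on the input text.
query = "query: how do I serve embedding models locally?"
docs = [
    "passage: Ember serves Core ML embedding models over a local HTTP API.",
    "passage: Bananas are a good source of potassium.",
]

vectors = embed([query] + docs)
query_vec, doc_vecs = vectors[0], vectors[1:]

# Rank documents by cosine similarity to the query.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```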