
Ember - ANE accelerated embedding models

Ember offers GPU and ANE accelerated embedding models with a convenient server!

Ember works by converting sentence-transformers models to Core ML, then launching a local server you can query to retrieve document embeddings. You can select from a few recommended models, or choose any model available on Hugging Face.

Sentence-transformers models generate representations (called embeddings) from documents, which you can use for tasks such as semantic search, similarity, retrieval, or clustering. For more information, please refer to the documentation.
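
For a sense of what an embedding is, here is a minimal sketch using the sentence-transformers library directly in Python (no Ember or Core ML involved); Ember converts and serves these same models for you, so you normally would not need to run this yourself:

from sentence_transformers import SentenceTransformer, util

# Plain sentence-transformers usage, only to illustrate the concept:
# each document is turned into a fixed-size vector.
model = SentenceTransformer("intfloat/multilingual-e5-small")
embeddings = model.encode(["Hello, world!", "Open source for the win 🤗!"])
print(embeddings.shape)  # one vector per document, e.g. (2, 384) for this model

# Similar documents have a high cosine similarity between their vectors.
print(util.cos_sim(embeddings[0], embeddings[1]))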

Getting Started

To get started, run ember create and follow the dropdown menu instructions:

ember create

ember create displays a dropdown menu that lets you select one of a few popular sentence-transformers models. There's also a Custom Model option where you can pick any sentence-transformers model from Hugging Face. The selected model will be automatically downloaded and converted to Core ML for you.

For example, you could select intfloat/multilingual-e5-small as the model to convert.

You should now have a Core ML model. You can serve it on a local server using:

ember serve

The server provides an endpoint for any of the models you converted with ember create. To query the model we just converted in the previous example, you can use a POST request like the following:

curl http://localhost:11434/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-small",
    "documents": [
      { "content": "Hello, world!" },
      { "content": "Open source for the win 🤗!" }
    ],
    "options": {
      "keep_alive": 1
    }
  }'

This example returns embeddings for the documents you supplied, which in this case are the sentences Hello, world! and Open source for the win 🤗!. The model process will stay running until the keep_alive duration is reached (1 minute in this case), so subsequent requests will be processed quickly.
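
If you prefer to query the endpoint from Python, a minimal client sketch using the requests library could look like the following. The README does not document the exact response format, so reading an "embeddings" field from the JSON response is an assumption; check your server's actual output:

import requests

# Same payload as the curl example above.
payload = {
    "model": "intfloat/multilingual-e5-small",
    "documents": [
        {"content": "Hello, world!"},
        {"content": "Open source for the win 🤗!"},
    ],
    "options": {"keep_alive": 1},
}

response = requests.post("http://localhost:11434/api/embed", json=payload)
response.raise_for_status()

# Assumption: the server returns JSON with an "embeddings" list,
# one vector per submitted document.
embeddings = response.json()["embeddings"]
print(len(embeddings), "embeddings of dimension", len(embeddings[0]))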

Examples

  • similarity.py: Simple similarity computation between a query and a set of documents (see the sketch below).
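
The actual similarity.py is not reproduced here; as a rough illustration, once you have a query embedding and document embeddings (for example retrieved from the /api/embed endpoint as in the Python sketch above), ranking documents by cosine similarity could look like this. The variable names and placeholder vectors below are hypothetical:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors.
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for embeddings returned by the server.
query_embedding = [0.1, 0.3, 0.5]
doc_embeddings = [[0.2, 0.1, 0.4], [0.9, 0.0, 0.1]]

scores = [cosine_similarity(query_embedding, d) for d in doc_embeddings]
ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
for index, score in ranked:
    print(f"document {index}: similarity {score:.3f}")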
