Home
Welcome to SimCSE Wiki!
The Python package SimCSE is a sentence embedding tool that lets you encode sentences into dense representations, build indexes for large corpora, and search an index for semantically similar sentences. It is built on our state-of-the-art sentence embedding model, SimCSE: Simple Contrastive Learning of Sentence Embeddings. This Wiki shows you how to use the package; navigate it using the sidebar. This page covers the basic usage of the package.
First, install the simcse package from PyPI:
pip install simcse
Or install it directly from our source code:
python setup.py install
Note that if you want to enable GPU encoding, you should install a version of PyTorch built with CUDA support. See the official PyTorch website for instructions.
After installing the package, you can load our model with just two lines of code:
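To confirm which device PyTorch would use before loading a model, a quick check along these lines can help. This is a minimal sketch: the helper name describe_torch_device is hypothetical and not part of simcse, and it only reports what PyTorch itself sees.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report the device PyTorch would use, without failing if torch is absent."""
    # find_spec returns None when the package cannot be found.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    # True only for a CUDA-enabled build that can see a usable GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_torch_device())
```

If this prints cpu on a machine with an Nvidia GPU, the installed PyTorch build most likely lacks CUDA support.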
from simcse import SimCSE
model = SimCSE("princeton-nlp/sup-simcse-bert-base-uncased")
See the model list for all available models.
Then you can use our model to encode sentences into embeddings:
embeddings = model.encode("A woman is reading.")
Compute the cosine similarities between two groups of sentences:
sentences_a = ['A woman is reading.', 'A man is playing a guitar.']
sentences_b = ['He plays guitar.', 'A woman is making a photo.']
similarities = model.similarity(sentences_a, sentences_b)
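To make the pairwise comparison concrete, here is a minimal pure-Python sketch of a cosine similarity matrix. It uses toy 2-D vectors in place of real sentence embeddings, and the helper names cosine_similarity and similarity_matrix are hypothetical. The layout (one row per item in the first group, one column per item in the second) is an assumption about how such a result is typically arranged, not a statement about the package internals.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(
    group_a: Sequence[Sequence[float]], group_b: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Entry [i][j] compares the i-th vector of group_a with the j-th of group_b."""
    return [[cosine_similarity(u, v) for v in group_b] for u in group_a]


# Toy vectors standing in for sentence embeddings (illustrative values only).
group_a = [[1.0, 0.0], [0.0, 2.0]]
group_b = [[3.0, 0.0], [1.0, 1.0]]
matrix = similarity_matrix(group_a, group_b)
for row in matrix:
    print([round(x, 4) for x in row])
```

Cosine similarity ignores vector length, so [1.0, 0.0] and [3.0, 0.0] score 1.0 even though their magnitudes differ.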
Or build an index for a group of sentences and search among them:
sentences = ['A woman is reading.', 'A man is playing a guitar.']
model.build_index(sentences)
results = model.search("He plays guitar.")
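Conceptually, searching an index means scoring the query against every stored sentence and keeping the best matches. The sketch below illustrates that idea with a brute-force scan over toy 2-D vectors. The function brute_force_search, its threshold and top_k parameters, and their default values are all assumptions chosen for illustration; they do not describe how the package implements its search.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def brute_force_search(
    query_vec: Sequence[float],
    index: Sequence[Tuple[str, Sequence[float]]],
    threshold: float = 0.6,  # illustrative default, not taken from the package
    top_k: int = 5,          # illustrative default, not taken from the package
) -> List[Tuple[str, float]]:
    """Return up to top_k (sentence, score) pairs with score >= threshold, best first."""
    scored = [(sentence, _cosine(query_vec, vec)) for sentence, vec in index]
    hits = [pair for pair in scored if pair[1] >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]


# Toy index: each sentence is paired with a made-up embedding.
toy_index = [
    ("A woman is reading.", [1.0, 0.0]),
    ("A man is playing a guitar.", [0.0, 1.0]),
]
# Made-up embedding for the query "He plays guitar."
results = brute_force_search([0.1, 0.9], toy_index)
print(results)
```

A brute-force scan like this costs one comparison per indexed sentence, which is why a dedicated similarity search library becomes attractive for large corpora.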
We also support faiss, an efficient similarity search library. Just install the package following its official installation instructions, and simcse will automatically use faiss for efficient search.
WARNING: We have found that faiss does not support Nvidia Ampere GPUs (3090 and A100) well. If you have one of these GPUs, you should switch to a different GPU or install the CPU version of the faiss package.