Problems with using start_multi_process_pool() #2955

safwaqf · 2024-09-24T07:12:48Z

Why does the length of the sentence list not match the length of the embedding list when I start a process pool with start_multi_process_pool() and then use it from Python multithreading?
For example:
batchNum：1 queLen: 100, embLen: 98
batchNum：2 queLen: 100, embLen: 102
batchNum：3 queLen: 100, embLen: 102
batchNum：4 queLen: 100, embLen: 98
These are the sentence list length (queLen) and the embedding list length (embLen) for four batches. Why did the first batch come back with two fewer embeddings than sentences, with those two apparently ending up in the second batch? Similarly, the third batch came back with two extra embeddings, which appear to be the two missing from the fourth batch.

tomaarsen · 2024-09-25T10:04:49Z

Hello!

Do you start the Python multithreading yourself? That shouldn't be needed.
There's normally just 1 input queue, and each process will continuously pop from that shared queue until it's empty. These processes will then also push to 1 shared output queue. The results from this queue are sorted afterwards to ensure that we have the same order as the inputs, but we still have just 1 output queue.

So, the usage is:

from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-mpnet-base-v2")
    sentences = ["The weather is so nice!", "It's so sunny outside.", "He's driving to the movie theater.", "She's going to the cinema."] * 1000

    pool = model.start_multi_process_pool()
    embeddings = model.encode_multi_process(sentences, pool)
    model.stop_multi_process_pool(pool)

    print(embeddings.shape)
    # => (4000, 768)

if __name__ == "__main__":
    main()

https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html?highlight=multi_process#sentence_transformers.SentenceTransformer.encode_multi_process
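To illustrate the single input queue and single output queue described above, here is a minimal standard-library sketch. It is not the sentence-transformers implementation: threads stand in for the pool processes, the "embedding" is just the sentence length, and the names worker, encode_with_pool and run_demo are hypothetical.

```python
import queue
import threading


def worker(input_queue, output_queue):
    # Stand-in for one pool process: pop chunks until a None sentinel arrives.
    while True:
        item = input_queue.get()
        if item is None:
            break
        chunk_id, sentences = item
        # Fake "embedding" (assumption for illustration): the sentence length.
        output_queue.put((chunk_id, [len(s) for s in sentences]))


def encode_with_pool(sentences, input_queue, output_queue, chunk_size=3):
    # Tag every chunk with its position so the input order can be restored.
    n_chunks = 0
    for start in range(0, len(sentences), chunk_size):
        input_queue.put((n_chunks, sentences[start:start + chunk_size]))
        n_chunks += 1
    # Read exactly n_chunks results from the shared output queue, then sort
    # them by chunk id, because workers may finish in any order.
    results = sorted(
        (output_queue.get() for _ in range(n_chunks)), key=lambda r: r[0]
    )
    return [emb for _, chunk in results for emb in chunk]


def run_demo():
    input_queue, output_queue = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(input_queue, output_queue))
        for _ in range(4)
    ]
    for w in workers:
        w.start()

    sentences = ["a" * (i % 7 + 1) for i in range(100)]
    embeddings = encode_with_pool(sentences, input_queue, output_queue)

    # One sentinel per worker shuts the stand-in pool down.
    for _ in workers:
        input_queue.put(None)
    for w in workers:
        w.join()
    return sentences, embeddings


if __name__ == "__main__":
    sentences, embeddings = run_demo()
    print(len(sentences), len(embeddings))
```

In this sketch a caller only counts how many results it reads from the shared output queue. If several threads called encode_with_pool on the same queues at the same time, one caller could pick up chunks that belong to another, which might explain mismatched lengths like the ones reported above. Serializing the calls (for example with a threading.Lock) or calling from a single thread would avoid that in this simplified model.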

Tom Aarsen
