Why does the length of the sentence list not match the length of the embedding list when I use start_multi_process_pool() to start the process pool and then start Python multithreading?
For example:
batchNum:1 queLen: 100, embLen: 98
batchNum:2 queLen: 100, embLen: 102
batchNum:3 queLen: 100, embLen: 102
batchNum:4 queLen: 100, embLen: 98
As you can see, I print the sentence list length and the embedding list length for four batches. Why did my first batch come back two embeddings short, with those two embeddings ending up in the second batch? Similarly, the third batch came back with two extra embeddings, which appear to be missing from the fourth batch.
Do you start the Python multithreading yourself? That shouldn't be needed.
There's normally just 1 input queue, and each process continuously pops from that shared queue until it's empty. The processes then push their results to 1 shared output queue. The output is sorted afterwards so that it has the same order as the inputs, but there is still just 1 output queue.
So, the usage is:
from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-mpnet-base-v2")
    sentences = ["The weather is so nice!", "It's so sunny outside.", "He's driving to the movie theater.", "She's going to the cinema."] * 1000
    pool = model.start_multi_process_pool()
    embeddings = model.encode_multi_process(sentences, pool)
    model.stop_multi_process_pool(pool)
    print(embeddings.shape)
    # => (4000, 768)

if __name__ == "__main__":
    main()
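To make the single-input-queue, single-output-queue pattern described above more concrete, here is a minimal standard-library sketch. It is not the sentence-transformers implementation: threads stand in for the worker processes, string lengths stand in for embeddings, and the names worker and encode_with_shared_queues are hypothetical.

```python
import queue
import threading


def worker(input_queue, output_queue):
    # Each worker pops chunks from the shared input queue until it sees a stop signal.
    while True:
        item = input_queue.get()
        if item is None:
            break
        chunk_id, chunk = item
        # Stand-in for model.encode: the "embedding" is just the string length.
        output_queue.put((chunk_id, [len(s) for s in chunk]))


def encode_with_shared_queues(sentences, chunk_size=3, num_workers=2):
    # Hypothetical helper: one shared input queue and one shared output queue.
    input_queue = queue.Queue()
    output_queue = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(input_queue, output_queue))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    # Split the input into chunks, each tagged with its position.
    num_chunks = 0
    for start in range(0, len(sentences), chunk_size):
        input_queue.put((num_chunks, sentences[start:start + chunk_size]))
        num_chunks += 1

    # Read exactly as many results as chunks were submitted, then sort by chunk id
    # to restore the input order.
    results = sorted(
        (output_queue.get() for _ in range(num_chunks)), key=lambda pair: pair[0]
    )

    # One stop signal per worker, then wait for them to finish.
    for _ in workers:
        input_queue.put(None)
    for w in workers:
        w.join()

    return [emb for _, chunk in results for emb in chunk]


if __name__ == "__main__":
    print(encode_with_shared_queues(["a", "bb", "ccc", "dddd", "eeeee", "ffffff", "ggggggg"]))
```

In this sketch the caller simply reads a fixed number of results from the output queue. If two threads ran this read loop against the same pair of queues at the same time, either one could pick up chunks that belong to the other, which would be consistent with counts such as 98 and 102. That is a plausible explanation under the assumptions above, not a confirmed diagnosis.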