
Is bert_encode() thread-safe for online embedding? #11

Open
WayneCao opened this issue Feb 18, 2024 · 5 comments

Comments

@WayneCao

No description provided.

@WayneCao
Author

I found that different invocations share the same memory buffer in bert_context, so it may not be thread-safe in an online-embedding setting.
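
One minimal mitigation is to serialize calls into the shared context with a mutex. A sketch, assuming a bert_encode signature along these lines (check bert.h for the real one):

```cpp
#include <mutex>

struct bert_ctx;  // opaque context from bert.cpp

// Assumed signature for illustration; the actual declaration is in bert.h.
void bert_encode(bert_ctx * ctx, int n_threads, const char * text, float * embeddings);

static std::mutex g_ctx_mutex;

// Serialize all callers onto the shared context, since concurrent
// invocations would otherwise race on its internal scratch buffers.
void bert_encode_locked(bert_ctx * ctx, int n_threads, const char * text, float * embeddings) {
    std::lock_guard<std::mutex> lock(g_ctx_mutex);
    bert_encode(ctx, n_threads, text, embeddings);
}
```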

@iamlemec
Owner

Yup, that seems right. Good news is that we got merged into llama.cpp, which has multi-threading support. Check it out over there!

@WayneCao
Author

Can you help explain the implementation mechanism?

@iamlemec
Owner

Sure! The major difference from this one is the way that batching works. Here we have explicit batch sizes for each sequence, so we need to pad them to alignment. In the llama.cpp implementation, batches are essentially lists of (sequence_id, position, token_id) tuples, so you can put multiple sequences in one batch without padding, which can be a big win when sequence lengths are uneven. The bulk of the new code there is in llama.cpp:build_bert() if you want to go into more detail.
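
To make the layout concrete, here is a rough sketch of packing two uneven-length sequences into one batch (llama_batch field names as of early 2024; check llama.h for your version):

```cpp
#include "llama.h"

// Append one sequence's tokens to a batch. Each slot carries its own
// (token, position, sequence id), so no padding is needed between sequences.
static void add_sequence(llama_batch & batch, const llama_token * toks, int n, llama_seq_id seq) {
    for (int i = 0; i < n; i++) {
        const int j = batch.n_tokens++;
        batch.token   [j]    = toks[i];
        batch.pos     [j]    = i;     // position within this sequence
        batch.n_seq_id[j]    = 1;
        batch.seq_id  [j][0] = seq;   // which sequence this token belongs to
        batch.logits  [j]    = false; // per-token logits not needed for embeddings
    }
}

// Usage: two sequences of different lengths share one batch.
//   llama_batch batch = llama_batch_init(512, 0, 2);
//   add_sequence(batch, toks_a,  7, 0);
//   add_sequence(batch, toks_b, 23, 1);
//   llama_decode(ctx, batch);
//   llama_batch_free(batch);
```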

Is that what you were looking for? Happy to provide more specifics.

@WayneCao
Author

Thank you so much! It seems this only supports multi-threading within a single batch?
Let me briefly state my question: I want to wrap llama.cpp in an online embedding service. When concurrent client requests come in, llama.cpp:build_bert() does not appear to be thread-safe across invocations. I haven't figured out how to guarantee memory safety in llama_context across different invocations, and I couldn't find any read-write lock around build_bert.
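
For reference, there is no lock inside llama.cpp itself: a llama_context is not safe to share across threads, but several contexts can be created from one shared llama_model. A common pattern for a service is one context per worker (or a mutex around a single context). A sketch, with API names as of early 2024 and hedged where they drift between versions:

```cpp
#include "llama.h"
#include <thread>
#include <vector>

int main() {
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);

    const int n_workers = 4;
    std::vector<std::thread> workers;
    for (int w = 0; w < n_workers; w++) {
        workers.emplace_back([model]() {
            // Each worker owns its own context; only the model is shared.
            llama_context_params cparams = llama_context_default_params();
            cparams.embeddings = true;  // enable embedding output ('embedding' in older versions)
            llama_context * ctx = llama_new_context_with_model(model, cparams);
            // ... per-worker loop: pop a request, tokenize, fill a llama_batch,
            // llama_decode(ctx, batch), then read the pooled embedding
            // (e.g. llama_get_embeddings_seq(ctx, seq_id) in recent versions) ...
            llama_free(ctx);
        });
    }
    for (auto & t : workers) t.join();

    llama_free_model(model);
    return 0;
}
```

The weights in llama_model are read-only (and typically mmapped), so sharing them across workers is cheap; the mutable compute buffers live in each llama_context, which is what removes the race.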
