[Bug]: Parallel Embedding is not working on Windows Servers #414
Comments
hi @abdelkareemkobo, `parallel=4` does not spread the data across all available GPUs by default; you need to initialize your model with:

```python
LateInteractionTextEmbedding(
    model_name=model_name,
    cuda=args.use_cuda,
    device_ids=device_ids,
    lazy_load=lazy_load,
)
```
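A minimal end-to-end sketch of that setup (the model name and document list are illustrative, and `parallel` matching `len(device_ids)` follows the advice later in this thread):

```python
from fastembed import LateInteractionTextEmbedding

device_ids = [0, 1, 2, 3]

# lazy_load=True defers model loading so that each worker process
# can load its own copy of the model on its assigned GPU.
model = LateInteractionTextEmbedding(
    model_name="jinaai/jina-colbert-v2",
    cuda=True,
    device_ids=device_ids,
    lazy_load=True,
)

documents = ["example document"] * 1024  # illustrative corpus

# One worker process per listed device.
embeddings = list(model.embed(documents, parallel=len(device_ids)))
```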
Thanks @joein, but I'm running into the same issue on Ubuntu: I'm not able to select the CUDA devices used for indexing, and it is not clear from the docs how to index on multiple GPUs on the same machine. Here is a snippet to reproduce, using Python 3.12:

```python
import os
import time
from dataclasses import dataclass
from typing import Any

from datasets import load_dataset
from fastembed import TextEmbedding
from qdrant_client import QdrantClient


@dataclass
class CollectionItem:
    text: str
    metadata: dict[str, Any] = None

    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {'text': self.text}


@dataclass
class CollectionItemPool:
    items: list[CollectionItem]
    docs: list[str] = None
    metadata: list[dict] = None

    def __post_init__(self):
        if self.docs is None:
            self.docs = [i.text for i in self.items]
        if self.metadata is None:
            self.metadata = [i.metadata for i in self.items]


def prepare_dataset(limit: int = None) -> CollectionItemPool:
    en_ds = load_dataset("allenai/c4", "en", split='train', streaming=True)
    if limit is not None:
        assert isinstance(limit, int), (
            f'`limit` has to be an integer, got {type(limit)}')
    ds = en_ds
    # ds = en_ds.select(range(limit))
    items: list[CollectionItem] = []
    for idx, ds_item in enumerate(ds):
        if idx == limit:
            break
        item = CollectionItem(text=ds_item['text'])
        items.append(item)
    return CollectionItemPool(items=items)


if __name__ == '__main__':
    # setting cuda devices
    # os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

    # Initialize the client
    client = QdrantClient(":memory:")  # or QdrantClient(path="path/to/db")

    embedding_model_gpu = TextEmbedding(
        model_name="intfloat/multilingual-e5-large",
        providers=["CUDAExecutionProvider"],
        device_ids=[1, 2, 3],
        cuda=True,
        lazy_load=True,
        # model_name="BAAI/bge-big-en-v1.5", providers=["CUDAExecutionProvider"]
    )
    print('Base Class')
    print(embedding_model_gpu.__class__.__bases__)
    # print(embedding_model_gpu.model.model.get_providers())
    print('Done loading embedding model on GPU')

    print('Loading Dataset')
    items_pool = prepare_dataset(limit=1024)
    print('Done loading dataset')

    start_idx_time = time.time()
    print('Start Indexing ..')
    # every embedding is a numpy array object
    embeds = embedding_model_gpu.embed(items_pool.docs, batch_size=256)
    end_idx_time = time.time()
    for embed in embeds:
        print(type(embed))
        print(embed.shape)
        # print(embed)  # numpy array
        break
    print(f'End Indexing in {end_idx_time - start_idx_time:.4f}')
```

Running this fails with:

```
Failed to allocate memory for requested buffer of size 17179869184
```
@Abdullahaml1 Please ensure that the `parallel` argument in `.embed` is equal to `len(device_ids)`. In your example it's 3.
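For the Ubuntu snippet above, that means passing `parallel=3`, since `device_ids=[1, 2, 3]` was used at init. A minimal sketch of the corrected call, reusing that script's variable names:

```python
# embed() returns a generator; the work is distributed across the
# worker processes as the generator is consumed.
embeds = embedding_model_gpu.embed(
    items_pool.docs,
    batch_size=256,
    parallel=3,  # == len(device_ids) from the TextEmbedding init
)
```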
What happened?
I am trying to encode my dataset with multiple CUDA GPUs, but only one GPU is being used.
What is the expected behaviour?
All 4 specified GPUs should be used.
A minimal reproducible example
```python
embedding_model = LateInteractionTextEmbedding(
    "jinaai/jina-colbert-v2",
    cuda=True,
    device_ids=[0, 1, 2, 3],
)
descriptions_embeddings = list(embedding_model.embed(documents, parallel=4))
```
What Python version are you on? e.g. python --version
python3.11
FastEmbed version
v0.4.2
What OS are you seeing the problem on?
No response
Relevant stack traces and/or logs
No response