I have a question, but first, thank you for sharing this amazing project; it's great for beginners.
I have been using the chatbot, but when I run the code with CUDA on the 7B model it is extremely slow, really bad.
However, when I run on the CPU it works much better, almost in real time. The automapping feature is also very slow.
PC:
CPU: Intel Core i7, 6th gen, 8 cores
Memory: 32GB
GPU: Nvidia RTX 2070, 8GB
Could someone explain why the GPU performs worse than the CPU and RAM?
Is it better to just get more memory so I can run the 14B model, or should I get one of those Nvidia cards like a Tesla with 24GB?
My testing tells me it's better to get more memory instead of a very expensive GPU.
Thanks
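As a rough sanity check on the 7B-vs-14B memory question above, weight memory scales with parameter count times bytes per parameter. A minimal back-of-the-envelope sketch (weight_gb is a hypothetical helper, and this counts weights only; the KV cache and activations add more on top):

# Rough weight-memory estimate: 1e9 params * bytes/param ~= 1 GB.
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

for size in (7, 14):
    for label, bpp in (("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
        print(f"{size}B @ {label}: ~{weight_gb(size, bpp):.1f} GB")

By this estimate a 14B model needs roughly 28 GB in fp16 but only about 7 GB at 4-bit, so quantization matters as much as the RAM-versus-GPU choice.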
This doesn't make sense to me, as I have seen the opposite: the CPU is super slow and only the GPU can truly speed things up. llama.cpp, however, is the faster CPU option, but my query times when talking to my own embedded data are still 1-3 minutes per query.
I have a question on memory (and I also observe very slow responses when using a GPU). On my Dell Precision 7780 workstation laptop with the 7B model, I see the Nvidia GPU memory go up to about 3GB out of the available 6GB. I assume this is due to 4-bit quantization? BUT the CPU memory shoots up to almost 95% of my 32GB, so it appears the model is being shipped to both the GPU and the CPU at the same time? Looking at the code, I see these lines in model.py:
for layer in tqdm(self.layers, desc="flayers", leave=True):
    if use_gpu:
        move_parameters_to_gpu(layer)
    h = layer(h, start_pos, freqs_cis, mask)
    if use_gpu:
        move_parameters_to_cpu(layer)
Why would it move the parameters back to the CPU inside that if use_gpu block?
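That loop is a per-layer offloading pattern: each layer's weights are copied into VRAM just before use and evicted back to host RAM right after, so the GPU only ever holds about one layer at a time (consistent with the ~3GB of VRAM observed) while the full model stays resident in system RAM (the ~95% of 32GB). The price is that the whole model's weights cross the PCIe bus on every forward pass, which can easily make a CUDA run slower than a pure CPU run. A minimal sketch of the idea in PyTorch, assuming each layer accepts a single tensor (the helper names mirror model.py, but the bodies here are assumptions, not the project's actual implementations):

import torch
from torch import nn

def move_parameters_to_gpu(layer: nn.Module) -> None:
    # Copy this layer's weights from host RAM into VRAM (a PCIe transfer).
    layer.to("cuda", non_blocking=True)

def move_parameters_to_cpu(layer: nn.Module) -> None:
    # Evict the weights back to host RAM so the next layer fits in VRAM.
    layer.to("cpu")

@torch.no_grad()
def forward_offloaded(layers: nn.ModuleList, h: torch.Tensor) -> torch.Tensor:
    h = h.to("cuda")  # activations stay on the GPU; only weights shuttle
    for layer in layers:
        move_parameters_to_gpu(layer)   # upload roughly one layer of weights
        h = layer(h)
        move_parameters_to_cpu(layer)   # free VRAM before the next layer
    return h.cpu()

So moving parameters back to the CPU is what keeps a model that doesn't fit in 6-8GB of VRAM runnable at all; the trade-off is transfer-bound speed, not a bug.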