
GPU vs CPU which one is best? #33

Open
masterchop opened this issue Jul 20, 2023 · 2 comments

@masterchop

masterchop commented Jul 20, 2023

I have a question, but first, thank you for sharing this amazing project; it's great for beginners.
I have been using the chat bot, but when I run the code with CUDA on the 7B model it is extremely slow, I mean really, really bad.
When I use the CPU instead, it works much better, almost in real time. The automapping feature is also very slow.
PC:
CPU: Intel Core i7, 6th gen, 8 cores
Memory: 32 GB
GPU: Nvidia RTX 2070, 8 GB VRAM

Could someone explain to me why the GPU performs worse than the CPU and system memory?
Is it better to just get more memory to be able to run the 14B model, or should I get one of those Nvidia cards like a Tesla with 24 GB?
My testing tells me it's better to get more memory instead of a very expensive GPU.

Thanks

@nafets33

This doesn't make sense to me, as I have seen the opposite: the CPU is super slow and only the GPU can truly speed things up. llama.cpp is the faster CPU option, but my query times talking to my own embedded data are still 1-3 minutes per query.

@buckleybrian

buckleybrian commented Nov 20, 2023

I have a question on memory (and I also observe very slow responses using a GPU). On my Dell Precision 7780 workstation laptop with the 7B model, I see the Nvidia GPU memory go up to about 3 GB of the available 6 GB. I assume this is due to 4-bit quantization? BUT, the CPU memory shoots up to almost 95% of my 32 GB, so it appears it is shipping to both the GPU and CPU at the same time. Looking at the code, I see these lines in model.py:

    for layer in tqdm(self.layers, desc="flayers", leave=True):
        if use_gpu:
            move_parameters_to_gpu(layer)
        h = layer(h, start_pos, freqs_cis, mask)
        if use_gpu:
            move_parameters_to_cpu(layer)

Why would it move the parameters back to the CPU inside that `if use_gpu` block?
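For what it's worth, that loop looks like per-layer offloading: the whole model doesn't fit in 6-8 GB of VRAM, so each layer's weights are copied host-to-GPU right before use and evicted afterwards, meaning every token streams the entire model over PCIe. A rough back-of-the-envelope sketch of why that makes the GPU path slower than the CPU (the numbers are assumptions for illustration, not measurements: ~3.5 GB for a 4-bit-quantized 7B model, ~12 GB/s effective PCIe 3.0 x16 bandwidth):

```python
# Estimate the per-token cost of streaming all layer weights
# from host RAM to the GPU on every forward pass.
# Both constants below are assumptions, not measured values.

MODEL_BYTES = 3.5e9   # ~7B parameters at 4-bit quantization
PCIE_BW = 12e9        # effective PCIe 3.0 x16 bandwidth, bytes/s

def transfer_seconds_per_token(model_bytes=MODEL_BYTES, bw=PCIE_BW):
    """Seconds spent just copying weights host->GPU for one token."""
    return model_bytes / bw

if __name__ == "__main__":
    overhead = transfer_seconds_per_token()
    print(f"~{overhead:.2f} s/token spent on PCIe transfers alone")
```

Even with generous bandwidth assumptions, that is roughly 0.3 s of pure copy time per token before the GPU does any math, while the CPU reads the same weights from RAM at much higher effective bandwidth with no copies at all. That would also explain why system RAM stays nearly full: the full model has to live on the host side so layers can be re-shipped each pass.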
