
feat: sync llama.cpp #79

Merged · 7 commits merged into mybigday:main on Nov 2, 2024

Conversation

@a-ghorbani (Contributor) commented Oct 30, 2024

Sync llama.cpp with the b3972 release.

This will close #77.

@a-ghorbani (Contributor, Author)

I’m almost done, just running the final tests.
This is mostly based on @Vali-98's work: https://github.com/Vali-98/cui-llama.rn

@Vali-98, if you have a moment, a quick review would be much appreciated to catch anything I might have missed!

@Vali-98 (Contributor) commented Oct 30, 2024

It looks okay at a glance, though I can't confirm the iOS code.

Just to confirm: I applied the changes for cui-llama.rn from my own fork of llama.cpp (https://github.com/Vali-98/llama.cpp/tree/cui-llama.rn). @a-ghorbani, did you swap to this fork, or did you translate my changes into the relevant patch files?

The current method of patching changes into llama.rn is the main reason I wanted my own fork instead, as applying the patch files tends to fail spectacularly when syncing llama.cpp. After 2-3 syncs, I decided it was a major maintainability issue. I assume this is also the reason the CI is failing.

Ideally, release builds would include prebuilt binaries instead of redundantly building rn-llama.cpp each time.

@a-ghorbani (Contributor, Author)

> It looks okay at a glance, though I can't confirm the iOS code.

Cool! For the iOS part, I took some ideas from the Android-related changes.

> Just to confirm: I applied the changes for cui-llama.rn from my own fork of llama.cpp (https://github.com/Vali-98/llama.cpp/tree/cui-llama.rn). @a-ghorbani, did you swap to this fork, or did you translate my changes into the relevant patch files?

I translated the changes into the relevant patch files to stay consistent with the approach used in this repo.

> The current method of patching changes into llama.rn is the main reason I wanted my own fork instead, as applying the patch files tends to fail spectacularly when syncing llama.cpp. After 2-3 syncs, I decided it was a major maintainability issue. I assume this is also the reason the CI is failing.

Yep, maintaining these patches is a challenge. Usually I revert the changes and recreate the patches from scratch.
(The CI failure was actually due to the submodule update having an extra --remote flag, which fetched the latest llama.cpp rather than the commit pinned here; it should be fixed now.)

@Vali-98 (Contributor) commented Oct 31, 2024

You might want to mention that this adds the vocab_only field, as it isn't actually part of gpt_params in llama.cpp.
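
For readers unfamiliar with the flag, here is a minimal, hypothetical sketch of how a bindings-level vocab_only option can be forwarded to llama.cpp. The InitParams struct and load_model function are placeholder names, not llama.rn's actual code; the only details taken from llama.cpp itself are that llama_model_params exposes a vocab_only member and that llama_load_model_from_file accepts those params.

```cpp
// Hypothetical sketch: InitParams and load_model are illustrative names, not
// llama.rn's real API. It only shows how a bindings-level vocab_only flag can
// be forwarded to llama.cpp, whose llama_model_params struct does expose
// vocab_only ("only load the vocabulary, no weights").
#include <string>
#include "llama.h"

struct InitParams {           // hypothetical bindings-side parameters
    std::string model_path;
    bool vocab_only = false;  // load the tokenizer/vocab only, skip the weights
};

static llama_model * load_model(const InitParams & ip) {
    llama_model_params mp = llama_model_default_params();
    mp.vocab_only = ip.vocab_only;  // not in gpt_params, but in llama_model_params
    return llama_load_model_from_file(ip.model_path.c_str(), mp);
}
```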

a-ghorbani marked this pull request as ready for review on October 31, 2024 at 08:16
@jhen0409 (Member) commented Nov 2, 2024

This is awesome, thanks for the contribution!

It looks like n_gpu_layers = 0 for disabling the GPU is broken (see ggerganov/llama.cpp#10089); I'll check that out before merging this.
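
For context on that setting, a minimal sketch (assuming the public llama.cpp C API of that period, not this PR's code) of what n_gpu_layers = 0 is meant to do, i.e. run entirely on the CPU; ggerganov/llama.cpp#10089 tracked a regression in exactly this path.

```cpp
// Illustrative only: n_gpu_layers controls how many layers are offloaded to
// the GPU backend; 0 is the conventional "CPU only" setting whose breakage
// is tracked upstream in ggerganov/llama.cpp#10089.
#include "llama.h"

static llama_model * load_cpu_only(const char * model_path) {
    llama_model_params mp = llama_model_default_params();
    mp.n_gpu_layers = 0;  // offload nothing; keep every layer on the CPU
    return llama_load_model_from_file(model_path, mp);
}
```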

jhen0409 merged commit 1ca3044 into mybigday:main on Nov 2, 2024 · 4 checks passed
Successfully merging this pull request may close: RWKV support (sync llama.cpp)