Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: update use llama-server instead #34

Merged
merged 2 commits into from
May 6, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 18 additions & 8 deletions docs/guides/langchain.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,23 +42,33 @@ spec:
template:
spec:
containers:
- name: "ai-model-7b"
image: modelzai/llm-llama-7b:latest # GPU requirements: A100(40GB)
# alternatively use, modelzai/llm-bloomz-560m:latest (even works on CPU)
- name: "ai-model"
image: swarnimarun/llama-server:latest-cuda
# GPU requirements: T4(16GB)
# For CPU : "swarnimarun/llama-server:latest" - 16GB
args:
- "-m"
- "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF"
- "-g"
- "33"
- "-q"
- "q8"
accelerator:
interface: "CUDA"
minVersion:
major: 7
resources:
limits:
cpu: "1"
memory: "16Gi" # a decent amount of RAM is required for loading the model as well
memory: "2Gi" # loading of model maybe slow or buggy for large models with low RAM
# for faster initial loading of large models increase to at least 8GB of RAM
# if you want to use CPU inference, use at least 16GB of RAM for 7B models
```

- Port forward the deployment service. If you don't have a proper ingress setup for your cluster.

```bash
kubectl port-forward service/aideployment 8000:8000
kubectl port-forward service/aideployment 80:8000
```

- Now locally, install the required libraries.
Expand All @@ -73,11 +83,11 @@ pip install langchain openai
import os
import openai

# note: we port-forwarded the service to 8000
openai.api_base="http://localhost:8000"
# note: we port-forwarded the service to 80 aka http
openai.api_base="http://localhost"
# if you have ingress setup then use your domain name
# you can also modify the port to use http(s) port itself
# openai.api_base="https://<DOMAIN-NAME>.tld:8000"
# openai.api_base="https://<DOMAIN-NAME>.tld"
openai.api_key = "any"
```

Expand Down
Loading