
Commit

remove couple of gerunds + adjust code sections (copy tags) (#330)
* added markdown artifacts for the telegram bot livelab

* implemented the adjustments requested by the livelab team

* additional improvements suggested by Anoosha

* adjustment requested by Rahul Tasker

* minor improvement on the bot config step

* QUARTERLY QA ID 11418

* QUARTERLY QA ID 11418

* added springai artifacts for the oracle vector search livelab

* added adjustments after QA checks

* markdown adjustments

* markdown adjustments + database step

* markdown final adjustments

* final batch of improvements and adjustments

* diagram adjustment

* remove couple of gerunds + adjust code sections (copy tags)
juarezjuniorgithub authored Aug 23, 2024
1 parent f5fbe62 commit f006567
Showing 1 changed file with 26 additions and 2 deletions.
28 changes: 26 additions & 2 deletions springai-vector/model/model.md
@@ -16,14 +16,14 @@ Mac:

In this lab, you will:

- Look at deploying Cohere AI Command-R models with Ollama and Oracle Cloud Infrastructure (OCI).
- Deploy Cohere AI Command-R models with Ollama and Oracle Cloud Infrastructure (OCI).
- Run a basic test of your model's endpoint for Command-R.

### Prerequisites

* This lab requires the completion of the **Setup Dev Environment** tutorial.

## Task 1. Using Cohere AI's Command-R model to support chat and embeddings with private LLMs
## Task 1. Use Cohere AI's Command-R model to support chat and embeddings with private LLMs

Cohere Command-R is a family of LLMs optimized for conversational interaction and long context tasks. Command R delivers high precision on retrieval augmented generation (RAG) with low latency and high throughput. You can get more details about the Command-R models at the [Command-R product page](https://cohere.com/command), and the full technical details are available at the [Model Details](https://docs.cohere.com/docs/command-r) section of its technical documentation.

@@ -62,82 +62,102 @@ Cohere Command-R is a family of LLMs optimized for conversational interaction an
7. At the end of the creation process, obtain the **Public IPv4 address** and, with your private key (the one you generated or uploaded during creation), connect:

```
<copy>
ssh -i ./<your_private>.key opc@[GPU_SERVER_IP]
</copy>
```

8. Install and configure docker to use GPUs:

```
<copy>
sudo /usr/libexec/oci-growfs
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y dnf-utils zip unzip
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf remove -y runc
sudo dnf install -y docker-ce --nobest
sudo useradd docker_user
</copy>
```
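The commands above install Docker but don't necessarily start the daemon. If `docker` commands fail later with a "cannot connect to the Docker daemon" error, enabling the service is one way to bring it up (the `docker` unit name is an assumption for a standard docker-ce install):

```
<copy>
sudo systemctl enable --now docker
sudo docker --version
</copy>
```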

9. Make sure that your operating system user has permission to run Docker containers. To do this, run the following command:

```
<copy>
sudo visudo
</copy>
```

And add this line at the end:

```
<copy>
docker_user ALL=(ALL) NOPASSWD: /usr/bin/docker
</copy>
```
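To confirm the sudoers entry took effect, you can list the privileges granted to the new user (a quick check, assuming the line above was saved in `visudo`):

```
<copy>
sudo -l -U docker_user
</copy>
```

The output should include the `NOPASSWD: /usr/bin/docker` rule.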

10. For convenience, switch to the new user:

```
<copy>
sudo su - docker_user
</copy>
```

11. Finally, add an alias so that every `docker` command in your shell runs with admin privileges. Depending on your OS, edit `.bash_profile` (macOS) or `.bashrc` (Linux) and insert this command at the end of the file:

```
<copy>
alias docker="sudo /usr/bin/docker"
exit
</copy>
```

12. Finalize the installation by executing:

```
<copy>
sudo yum install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
</copy>
```
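To verify that containers can actually see the GPU, a common smoke test is to run `nvidia-smi` inside a CUDA base image (the image tag below is an illustrative example, not one prescribed by this lab):

```
<copy>
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
</copy>
```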

13. If you're on Ubuntu instead, run:

```
<copy>
sudo apt-get install nvidia-container-toolkit=1.14.3-1 \
nvidia-container-toolkit-base=1.14.3-1 \
libnvidia-container-tools=1.14.3-1 \
libnvidia-container1=1.14.3-1
sudo apt-get install -y nvidia-docker2
</copy>
```

14. Reboot, reconnect to the VM, and switch to the Docker user again:

```
<copy>
sudo reboot now
# after restart, run:
sudo su - docker_user
</copy>
```

15. Run `docker` to check that everything is OK.

16. Run a Docker container with the Ollama server and pull the models for embeddings/completion:

```
<copy>
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama serve
docker exec -it ollama ollama pull command-r
docker exec -it ollama ollama pull llama3
docker logs -f --tail 10 ollama
</copy>
```
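Once the pulls finish, you can confirm that both models are available on the server with the standard Ollama CLI:

```
<copy>
docker exec -it ollama ollama list
</copy>
```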

Both models, for embeddings and for completion, run under the same server; the specific model to use is selected by naming it in the REST request.
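As a sketch of how one server addresses either model, the `model` field in the request body selects it. The endpoints and payload shape below follow Ollama's standard REST API rather than anything specific to this lab:

```
<copy>
# completion with Command-R
curl http://[GPU_SERVER_IP]:11434/api/generate -d '{
  "model": "command-r",
  "prompt": "Hello"
}'

# embeddings from the same server, using a different model
curl http://[GPU_SERVER_IP]:11434/api/embeddings -d '{
  "model": "llama3",
  "prompt": "Hello"
}'
</copy>
```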
@@ -173,19 +193,23 @@ Your configured ingress rule:
6. Configure the environment variables below directly, or update the `env.sh` file and run `source ./env.sh`:

```
<copy>
export OLLAMA_URL=http://[GPU_SERVER_IP]:11434
export OLLAMA_EMBEDDINGS=command-r
export OLLAMA_MODEL=command-r
</copy>
```


7. Test from a shell by running:

```
<copy>
curl ${OLLAMA_URL}/api/generate -d '{
"model": "command-r",
"prompt":"Who is Ayrton Senna?"
}'
</copy>
```
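By default Ollama streams the response. If you'd rather receive a single JSON object, the API also accepts a `stream` flag (a standard Ollama request parameter, shown here as an optional variant):

```
<copy>
curl ${OLLAMA_URL}/api/generate -d '{
  "model": "command-r",
  "prompt": "Who is Ayrton Senna?",
  "stream": false
}'
</copy>
```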

You'll receive the response as a stream of sequential chunks, so content is delivered incrementally instead of making API users wait for the whole response to be generated before it's displayed.

