---
title: Docker
description: Install Cortex using Docker.
---

:::warning
🚧 **Cortex.cpp is currently in development.** The documentation describes the intended functionality, which may not yet be fully implemented.
:::

# Setting Up Cortex with Docker

This guide walks you through setting up and running Cortex with Docker.

## Prerequisites

- Docker or Docker Desktop
- `nvidia-container-toolkit` (for GPU support)
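
For GPU mode, it is worth confirming up front that Docker can see your GPU. A minimal sanity check, assuming the NVIDIA driver and `nvidia-container-toolkit` are installed (the CUDA image tag below is illustrative):

```bash
# If the toolkit is wired up correctly, this prints the same GPU table as on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```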

## Setup Instructions

1. **Clone the Cortex Repository**
   ```bash
   git clone https://github.com/janhq/cortex.cpp.git
   cd cortex.cpp
   git submodule update --init
   ```

2. **Build the Docker Image**
   - To build with the latest versions of `cortex.cpp` and `cortex.llamacpp`:
     ```bash
     docker build -t cortex --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -f docker/Dockerfile .
     ```
   - To pin a specific `cortex.llamacpp` version:
     ```bash
     docker build --build-arg CORTEX_LLAMACPP_VERSION=0.1.34 --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -t cortex -f docker/Dockerfile .
     ```

3. **Run the Docker Container** (a quick sanity check is shown after this list)
   - Create a Docker volume to persist models and data:
     ```bash
     docker volume create cortex_data
     ```
   - Run in **CPU mode**:
     ```bash
     docker run -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```
   - Run in **GPU mode** (requires `nvidia-container-toolkit`):
     ```bash
     docker run --gpus all -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```

4. **Check the Logs (Optional)**
   ```bash
   docker logs cortex
   ```

5. **Access the Cortex API Documentation**
   - Open [http://localhost:39281](http://localhost:39281) in your browser.

6. **Enter the Container and Try the Cortex CLI**
   ```bash
   docker exec -it cortex bash
   cortex --help
   ```
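
If any of the steps above misbehave, first confirm that the container is up and the API port is reachable from the host:

```bash
# The container should be listed as "Up" with port 39281 mapped
docker ps --filter name=cortex --format '{{.Names}}\t{{.Status}}\t{{.Ports}}'

# Any HTTP status code printed here means the server is answering
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:39281
```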

## Usage

With the container running, you can interact with Cortex through its HTTP API. The examples below assume the container is up and that `curl` is installed on the host machine.

### 1. List Available Engines

```bash
curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"
```

- **Example Response**
  ```json
  {
    "data": [
      {
        "description": "This extension enables chat completion API calls using the Onnx engine",
        "format": "ONNX",
        "name": "onnxruntime",
        "status": "Incompatible"
      },
      {
        "description": "This extension enables chat completion API calls using the LlamaCPP engine",
        "format": "GGUF",
        "name": "llama-cpp",
        "status": "Ready",
        "variant": "linux-amd64-avx2",
        "version": "0.1.37"
      }
    ],
    "object": "list",
    "result": "OK"
  }
  ```
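
When scripting against this endpoint, you may only care about engines that are ready to serve. A small filter using `jq` (assumed to be installed on the host):

```bash
# Print the names of engines whose status is "Ready"
curl -s http://localhost:39281/v1/engines | jq -r '.data[] | select(.status == "Ready") | .name'
```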

### 2. Pull Models from Hugging Face

- Open a terminal and run `websocat ws://localhost:39281/events` to stream download events (`websocat` is installable with `cargo install websocat`, among other options).
- In another terminal, pull models using the commands below.

```bash
# Pull a model from Cortex's Hugging Face hub
curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
```

```bash
# Pull a model directly from a URL
curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/blob/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf"}'
```
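
Once a pull completes (the `/events` stream reports progress), you can confirm the model landed. A quick check, assuming the server exposes an OpenAI-style model listing at `/v1/models`, mirroring the endpoints above:

```bash
# List models known to the server
curl --request GET --url http://localhost:39281/v1/models --header "Content-Type: application/json"
```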

### 3. Start a Model and Send an Inference Request

- **Start the model:**
  ```bash
  curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```

- **Send an inference request:**
  ```bash
  curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
    "frequency_penalty": 0.2,
    "max_tokens": 4096,
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "presence_penalty": 0.6,
    "stop": ["End"],
    "stream": true,
    "temperature": 0.8,
    "top_p": 0.95
  }'
  ```
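
With `"stream": true`, the response arrives as a stream of chunks, which is awkward to parse in shell scripts. A non-streaming variant that extracts just the reply, assuming the response follows the standard OpenAI chat-completions shape:

```bash
# Non-streaming request; print only the assistant's reply
curl -s --request POST --url http://localhost:39281/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "stream": false
  }' | jq -r '.choices[0].message.content'
```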

### 4. Stop a Model

- To stop a running model, use:
  ```bash
  curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```
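
When you are finished, you can tear down the Docker resources as well. Note that removing the volume also deletes any downloaded models:

```bash
# Stop and remove the container
docker stop cortex && docker rm cortex

# Optional: remove the data volume (this deletes downloaded models)
docker volume rm cortex_data
```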