Commit b5c1105

docs: docker installation instruction
docs/docs/installation/docker.mdx

Lines changed: 138 additions & 3 deletions

---
title: Docker
description: Install Cortex using Docker.
---

:::warning
🚧 **Cortex.cpp is currently in development.** The documentation describes the intended functionality, which may not yet be fully implemented.
:::

# Setting Up Cortex with Docker

This guide walks you through setting up and running Cortex with Docker.

## Prerequisites

- Docker or Docker Desktop
- `nvidia-container-toolkit` (for GPU support)
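
If you plan to run in GPU mode, the NVIDIA Container Toolkit has to be installed on the host first. The following is a rough sketch for Ubuntu/Debian, adapted from NVIDIA's install guide; check the official instructions for your distribution:

```bash
# Add NVIDIA's package repository, then install the toolkit.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
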
## Setup Instructions

1. **Clone the Cortex Repository**
   ```bash
   git clone https://github.com/janhq/cortex.cpp.git
   cd cortex.cpp
   git submodule update --init
   ```

2. **Build the Docker Image**
   - To use the latest versions of `cortex.cpp` and `cortex.llamacpp`:
     ```bash
     docker build -t cortex --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -f docker/Dockerfile .
     ```
   - To pin a specific `cortex.llamacpp` version:
     ```bash
     docker build --build-arg CORTEX_LLAMACPP_VERSION=0.1.34 --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -t cortex -f docker/Dockerfile .
     ```

3. **Run the Docker Container**
   - Create a Docker volume to store models and data:
     ```bash
     docker volume create cortex_data
     ```
   - Run in **CPU mode**:
     ```bash
     docker run -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```
   - Run in **GPU mode** (requires the NVIDIA Container Toolkit):
     ```bash
     docker run --gpus all -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```

4. **Check Logs (Optional)**
   ```bash
   docker logs cortex
   ```

5. **Access the Cortex Documentation API**
   - Open [http://localhost:39281](http://localhost:39281) in your browser.

6. **Access the Container and Try Cortex CLI**
   ```bash
   docker exec -it cortex bash
   cortex --help
   ```

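Once these steps are complete, a quick sanity check from the host confirms the container is up and the API is reachable. This is a minimal sketch assuming the container name (`cortex`) and port (`39281`) used above:

```bash
# Confirm the container is running.
docker ps --filter "name=cortex"

# Hit the engines endpoint documented below; a JSON response means the server is reachable.
curl http://localhost:39281/v1/engines
```
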
## Usage

With the container running, you can use the following commands to interact with Cortex over its HTTP API. Make sure `curl` is installed on your machine.

### 1. List Available Engines

```bash
curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"
```

- **Example Response**
  ```json
  {
    "data": [
      {
        "description": "This extension enables chat completion API calls using the Onnx engine",
        "format": "ONNX",
        "name": "onnxruntime",
        "status": "Incompatible"
      },
      {
        "description": "This extension enables chat completion API calls using the LlamaCPP engine",
        "format": "GGUF",
        "name": "llama-cpp",
        "status": "Ready",
        "variant": "linux-amd64-avx2",
        "version": "0.1.37"
      }
    ],
    "object": "list",
    "result": "OK"
  }
  ```

### 2. Pull Models from Hugging Face

- Open a terminal and run `websocat ws://localhost:39281/events` to capture events.
- In another terminal, pull models using the commands below.
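
`websocat` is a third-party WebSocket client written in Rust; if it is not already on your machine, it can typically be installed with Cargo (other install options are listed in its README):

```bash
cargo install websocat
```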

```bash
# Pull a model from Cortex's Hugging Face hub
curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
```

```bash
# Pull a model directly from a URL
curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/blob/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf"}'
```
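
To confirm a pull completed, you can list the models the server knows about. This sketch assumes the standard model-listing endpoint alongside the endpoints used elsewhere in this guide:

```bash
curl --request GET --url http://localhost:39281/v1/models --header "Content-Type: application/json"
```
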
### 3. Start a Model and Send an Inference Request

- **Start the model:**
  ```bash
  curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```

- **Send an inference request:**
  ```bash
  curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
    "frequency_penalty": 0.2,
    "max_tokens": 4096,
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "presence_penalty": 0.6,
    "stop": ["End"],
    "stream": true,
    "temperature": 0.8,
    "top_p": 0.95
  }'
  ```
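
Because `"stream": true` is set, the response arrives incrementally as tokens are generated. If you would rather receive one complete JSON response, the same request with streaming disabled should work, assuming the endpoint honors the flag the way OpenAI-compatible servers typically do:

```bash
curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
  "messages": [{"content": "Tell me a joke", "role": "user"}],
  "model": "tinyllama:gguf",
  "stream": false
}'
```
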
### 4. Stop a Model

- To stop a running model, use:
  ```bash
  curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```
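
When you are done, the container itself can be stopped and removed with standard Docker commands; the `cortex_data` volume persists unless you remove it explicitly:

```bash
docker stop cortex
docker rm cortex
# Optional: also delete downloaded models and data.
docker volume rm cortex_data
```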
