---
title: Docker
description: Install Cortex using Docker.
---

:::warning
🚧 **Cortex.cpp is currently in development.** The documentation describes the intended functionality, which may not yet be fully implemented.
:::

# Setting Up Cortex with Docker

This guide walks you through setting up and running Cortex using Docker.

## Prerequisites

- Docker or Docker Desktop
- `nvidia-container-toolkit` (for GPU support)
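
To confirm these prerequisites are in place, you can run a quick sanity check on the host (the `nvidia-smi` step only applies if you plan to use GPU mode):

```bash
# Verify Docker is installed and the daemon is reachable
docker --version
docker info > /dev/null && echo "Docker daemon is running"

# For GPU mode, confirm the NVIDIA driver is visible on the host
nvidia-smi
```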

## Setup Instructions

1. **Clone the Cortex Repository**
   ```bash
   git clone https://github.com/janhq/cortex.cpp.git
   cd cortex.cpp
   git submodule update --init
   ```

2. **Build the Docker Image**
   - To use the latest versions of `cortex.cpp` and `cortex.llamacpp`:
     ```bash
     docker build -t cortex --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -f docker/Dockerfile .
     ```
   - To specify versions:
     ```bash
     docker build --build-arg CORTEX_LLAMACPP_VERSION=0.1.34 --build-arg CORTEX_CPP_VERSION=$(git rev-parse HEAD) -t cortex -f docker/Dockerfile .
     ```

3. **Run the Docker Container**
   - Create a Docker volume to store models and data:
     ```bash
     docker volume create cortex_data
     ```
   - Run in **CPU mode**:
     ```bash
     docker run -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```
   - Run in **GPU mode** (requires `nvidia-container-toolkit`):
     ```bash
     docker run --gpus all -it -d --name cortex -v cortex_data:/root/cortexcpp -p 39281:39281 cortex
     ```

4. **Check Logs (Optional)**
   ```bash
   docker logs cortex
   ```

5. **Access the Cortex Documentation API**
   - Open [http://localhost:39281](http://localhost:39281) in your browser.

6. **Access the Container and Try the Cortex CLI**
   ```bash
   docker exec -it cortex bash
   cortex --help
   ```

## Usage

With Docker running, you can use the following commands to interact with Cortex. Ensure the container is running and `curl` is installed on your machine.
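
To confirm the container is up before sending requests, a standard Docker query works (this assumes the container name `cortex` used above):

```bash
# Shows the running container; an empty list means it is not running
docker ps --filter "name=cortex"
```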

### 1. List Available Engines

```bash
curl --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json"
```

- **Example Response**
  ```json
  {
    "data": [
      {
        "description": "This extension enables chat completion API calls using the Onnx engine",
        "format": "ONNX",
        "name": "onnxruntime",
        "status": "Incompatible"
      },
      {
        "description": "This extension enables chat completion API calls using the LlamaCPP engine",
        "format": "GGUF",
        "name": "llama-cpp",
        "status": "Ready",
        "variant": "linux-amd64-avx2",
        "version": "0.1.37"
      }
    ],
    "object": "list",
    "result": "OK"
  }
  ```
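
If you have `jq` installed (not required by Cortex, just a convenience for reading the JSON), you can filter the response down to engines that are ready to use:

```bash
# List only the engines whose status is "Ready"
curl --silent --request GET --url http://localhost:39281/v1/engines --header "Content-Type: application/json" | jq '.data[] | select(.status == "Ready")'
```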

### 2. Pull Models from Hugging Face

- Open a terminal and run `websocat ws://localhost:39281/events` to capture download events (this requires the `websocat` tool to be installed).
- In another terminal, pull models using the commands below.

  ```bash
  # Pull a model from Cortex's Hugging Face hub
  curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```

  ```bash
  # Pull a model directly from a URL
  curl --request POST --url http://localhost:39281/v1/models/pull --header 'Content-Type: application/json' --data '{"model": "https://huggingface.co/afrideva/zephyr-smol_llama-100m-sft-full-GGUF/blob/main/zephyr-smol_llama-100m-sft-full.q2_k.gguf"}'
  ```
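
Once a pull finishes, you can check which models Cortex has registered locally. This sketch assumes the server exposes a model-listing route at `GET /v1/models`, mirroring the OpenAI-style endpoints used above:

```bash
# List models currently known to the Cortex server
curl --request GET --url http://localhost:39281/v1/models --header "Content-Type: application/json"
```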

### 3. Start a Model and Send an Inference Request

- **Start the model:**
  ```bash
  curl --request POST --url http://localhost:39281/v1/models/start --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```

- **Send an inference request:**
  ```bash
  curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
    "frequency_penalty": 0.2,
    "max_tokens": 4096,
    "messages": [{"content": "Tell me a joke", "role": "user"}],
    "model": "tinyllama:gguf",
    "presence_penalty": 0.6,
    "stop": ["End"],
    "stream": true,
    "temperature": 0.8,
    "top_p": 0.95
  }'
  ```
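
Because `"stream": true` is set, the reply arrives as a stream of chunks. If you prefer a single JSON response, the sketch below sends the same request without streaming; it assumes the sampling parameters above are optional and fall back to server defaults:

```bash
# Non-streaming variant: returns one complete JSON response
curl --request POST --url http://localhost:39281/v1/chat/completions --header 'Content-Type: application/json' --data '{
  "model": "tinyllama:gguf",
  "messages": [{"content": "Tell me a joke", "role": "user"}],
  "stream": false
}'
```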

### 4. Stop a Model

- To stop a running model, use:
  ```bash
  curl --request POST --url http://localhost:39281/v1/models/stop --header 'Content-Type: application/json' --data '{"model": "tinyllama:gguf"}'
  ```