Skip to content

Commit

Permalink
Merge branch 'dev' of github.com:janhq/cortex.cpp into dev
Browse files Browse the repository at this point in the history
  • Loading branch information
hientominh committed Nov 4, 2024
2 parents f6978cd + 76d653f commit 5fde673
Show file tree
Hide file tree
Showing 43 changed files with 1,811 additions and 562 deletions.
3 changes: 3 additions & 0 deletions docs/docs/architecture/cortex-db.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
title: cortex.db
---
File renamed without changes.
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ The main directory that stores all Cortex-related files, located in the user's h
#### `models/`
Contains the AI models used by Cortex for processing and generating responses.
:::info
For more information regarding the `model.list` and `model.yaml`, please see [here](/docs/model-yaml).
For more information regarding the `model.list` and `model.yaml`, please see [here](/docs/capabilities/models/model-yaml).
:::
#### `logs/`
Stores log files that are essential for troubleshooting and monitoring the performance of the Cortex.cpp API server and CLI.
Expand Down
3 changes: 3 additions & 0 deletions docs/docs/assistants/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
title: Assistants
---
3 changes: 3 additions & 0 deletions docs/docs/assistants/tools/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
title: Tools
---
Original file line number Diff line number Diff line change
@@ -1,16 +1,11 @@
---
title: API
title: API Server
description: Cortex Server Overview.
slug: "server"
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

Cortex has an [API server](https://cortex.so/api-reference) that runs at `localhost:39281`.


Expand Down
48 changes: 0 additions & 48 deletions docs/docs/basic-usage/command-line.md

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,9 +1,18 @@
---
title: cortex.js
description: How to integrate cortex.js with a Typescript application.
slug: "ts-library"
description: How to use the Cortex.js Library
---

[Cortex.js](https://github.com/janhq/cortex.js) is a Typescript client library that can be used to interact with the Cortex API.

This is still a work in progress, and we will let the community know once a stable version is available.

:::warning
🚧 Cortex.js is currently under development, and this page is a stub for future development.
:::


<!--
:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::
Expand Down Expand Up @@ -61,4 +70,4 @@ async function inference() {
}
inference();
```
``` -->
Original file line number Diff line number Diff line change
@@ -1,9 +1,15 @@
---
title: cortex.py
description: How to integrate cortex.py with a Python application.
slug: "py-library"
---


:::warning
🚧 Cortex.py is currently under development, and this page is a stub for future development.
:::


<!--
:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::
Expand Down Expand Up @@ -51,4 +57,4 @@ completion = client.chat.completions.create(
],
)
print(completion.choices[0].message.content)
```
``` -->
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: Overview
description: Overview.
description: Cortex Overview
slug: "basic-usage"
---

Expand Down
54 changes: 0 additions & 54 deletions docs/docs/built-in-models.mdx

This file was deleted.

3 changes: 3 additions & 0 deletions docs/docs/capabilities/audio-generation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
7 changes: 7 additions & 0 deletions docs/docs/capabilities/embeddings.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
title: Embeddings
---

:::info
🚧 Cortex is currently under development, and this page is a stub for future development.
:::
39 changes: 39 additions & 0 deletions docs/docs/capabilities/hardware/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
---
title: Hardware Awareness
draft: True
---

# Hardware Awareness

Cortex is designed to be hardware aware, meaning it can detect your hardware configuration and automatically set parameters to optimize compatibility and performance, and avoid hardware-related errors.

## Hardware Optimization

Cortex's Hardware awareness allows it to do the following:

- Context Length Optimization: Cortex maximizes the context length allowed by your hardware, ensuring that you can work with larger datasets and more complex models without performance degradation.
- Engine Optimization: we detect your CPU and GPU, and maintain a list of optimized engines for each hardware configuration, e.g. taking advantage of AVX-2 and AVX-512 instructions on CPUs.

## Hardware Awareness

- Preventing hardware-related error
- Error Handling for Insufficient VRAM: When loading a second model, Cortex provides useful error messages if there is insufficient VRAM memory. This proactive approach helps prevent out-of-memory errors and guides users on how to resolve the issue.

### Model Compatibility

- Model Compatibility Detection: Cortex automatically detects your hardware configuration to determine the compatibility of different models. This ensures that the models you use are optimized for your specific hardware setup.
- This is for the Hub, and for existing Models

## Hardware Management

### Activating Specific GPUs

Cortex gives you the ability to activating specific GPUs for inference, giving you fine-grained control over hardware resources. This is especially useful for multi-GPU systems.
- Activate GPUs: Cortex can activate and utilize GPUs to accelerate processing, ensuring that computationally intensive tasks are handled efficiently.
You also have the option to deactivate all GPUs, to run inference on only CPU and RAM.

### Hardware Monitoring

- Monitoring System Usage
- Monitor VRAM Usage: Cortex keeps track of VRAM usage to prevent out-of-memory (OOM) errors. It ensures that VRAM is used efficiently and provides warnings when resources are running low.
- Monitor System Resource Usage: Cortex continuously monitors the usage of system resources, including CPU, RAM, and GPUs. This helps in maintaining optimal performance and identifying potential bottlenecks.
3 changes: 3 additions & 0 deletions docs/docs/capabilities/image-generation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ Cortex.cpp supports three model formats:
- TensorRT-LLM

:::info
For details on each format, see the [Model Formats](/docs/model-yaml#model-formats) page.
For details on each format, see the [Model Formats](/docs/capabilities/models/model-yaml#model-formats) page.
:::

## Built-in Models
Expand All @@ -38,5 +38,5 @@ You can see our full list of Built-in Models [here](/models).
:::

## Next steps
- Cortex requires a `model.yaml` file to run a model. Find out more [here](/docs/model-yaml).
- Cortex requires a `model.yaml` file to run a model. Find out more [here](/docs/capabilities/models/model-yaml).
- Cortex supports multiple model hubs hosting built-in models. See details [here](/docs/model-sources).
Original file line number Diff line number Diff line change
Expand Up @@ -6,24 +6,14 @@ description: The model.yaml
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";


:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

Cortex.cpp uses a `model.yaml` file to specify the configuration for running a model. Models can be downloaded from the Cortex Model Hub or Hugging Face repositories. Once downloaded, the model data is parsed and stored in the `models` folder.

## `model.list`
The `model.list` file acts as a registry for all model files used by Cortex.cpp. It keeps track of every downloaded and imported model by listing their details in a structured format. Each time a model is downloaded or imported, Cortex.cpp will automatically append an entry to `model.list` with the following format:
```
# Downloaded model
<model-id> <author_repo-id> <branch-name> <path-to-model.yaml> <model-alias>
# Imported model
<model-id> local imported <path-to-model-id.yaml> <model-alias>
## Structure of `model.yaml`

```
## `model.yaml` High Level Structure
Here is an example of `model.yaml` format:
```yaml
# BEGIN GENERAL METADATA
Expand Down Expand Up @@ -71,7 +61,7 @@ ngl: 33 # Undefined = loaded from model

The `model.yaml` is composed of three high-level sections:

### Cortex Meta
### Model Metadata
```yaml
model: gemma-2-9b-it-Q8_0
name: Llama 3.1
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,12 @@ description: Model Presets
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

<!--
## Model Presets

Model presets are saved `model.yaml` files that serve as templates for pre-configured model settings. These presets are designed to ensure optimal performance with the specified engine.
These presets are not restricted to specific models. You can apply the presets to any model or any engine runtime.

:::info
Model presets override the values of the `model.yaml`. If presets are available, Cortex uses them. Otherwise, it defaults to `model.yaml` values.
:::
::: -->
3 changes: 3 additions & 0 deletions docs/docs/capabilities/moderation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
3 changes: 3 additions & 0 deletions docs/docs/capabilities/reasoning.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
3 changes: 3 additions & 0 deletions docs/docs/capabilities/speech-to-text.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
7 changes: 7 additions & 0 deletions docs/docs/capabilities/text-generation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
title: Text Generation
---

:::info
🚧 Cortex is currently under development, and this page is a stub for future development.
:::
3 changes: 3 additions & 0 deletions docs/docs/capabilities/text-to-speech.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
3 changes: 3 additions & 0 deletions docs/docs/capabilities/vision.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
---
unlisted: true
---
3 changes: 1 addition & 2 deletions docs/docs/chat-completions.mdx
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
---
title: Chat Completions
description: Chat Completions Feature.
slug: "text-generation"
description: Chat Completions Feature
---

import Tabs from "@theme/Tabs";
Expand Down
Original file line number Diff line number Diff line change
@@ -1,8 +1,13 @@
---
title: Integrate Remote Engine
description: How to integrate remote engine into Cortex.
title: Building Engine Extensions
description: Cortex supports Engine Extensions to integrate both :ocal inference engines, and Remote APIs.
---

:::info
🚧 Cortex is currently under development, and this page is a stub for future development.
:::

<!--
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

Expand Down Expand Up @@ -81,4 +86,4 @@ The `transformResponse` method is used to transform the data received from the e
**Example: Anthropic Engine**

In the Anthropic Engine, the `transformResponse` method handles both stream and non-stream responses. It processes the response data and converts it into a standardized format.

-->
Loading

0 comments on commit 5fde673

Please sign in to comment.