epic: Jan integrates Cortex.cpp #3825

Closed
1 of 2 tasks
Tracked by #3824
0xSage opened this issue Oct 17, 2024 · 1 comment
Assignees
Labels
category: providers (Local & remote inference providers), P1: important (Important feature / fix), type: epic (A major feature or initiative)
Milestone

Comments

0xSage (Contributor) commented Oct 17, 2024

Goal

Tasklist

Previous Issues

@0xSage 0xSage added type: epic A major feature or initiative category: local providers labels Oct 17, 2024
@0xSage 0xSage added category: providers Local & remote inference providers and removed category: local providers labels Oct 17, 2024
@dan-homebrew dan-homebrew changed the title epic: Jan Integrates Cortex.cpp as Provider epic: Provider Extension - Cortex.cpp Oct 21, 2024
louis-jan (Contributor) commented Oct 24, 2024

Implementation Specs

Migration Path (see the sketch after this list):

  1. App 0.5.8 opens.
  2. Return the model list from cache (for users coming from 0.5.7) -> functions normally.
  3. Scan JSON models (legacy logic - fresh install or older versions) -> functions normally.
  4. In the background, the app attempts to import models into cortex.cpp and merges them with the legacy downloaded models (covering models that failed to import).
  5. The app combines the models returned by cortex.cpp with the legacy JSON models. On duplicate IDs, the cortex.cpp models are prioritized (these are the models that were imported successfully).
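
A minimal sketch of the merge in step 5, assuming a hypothetical ModelEntry record keyed by id (the shape and field names are illustrative, not Jan's actual types); the cortex.cpp entry wins on an ID collision:

```typescript
// Hypothetical shape of a model entry; field names are illustrative only.
interface ModelEntry {
  id: string
  source: 'cortex' | 'legacy-json'
  path: string
}

// Combine models reported by cortex.cpp with legacy JSON models.
// On duplicate IDs the cortex.cpp entry is kept (it was imported successfully).
function mergeModelLists(cortexModels: ModelEntry[], legacyModels: ModelEntry[]): ModelEntry[] {
  const byId = new Map<string, ModelEntry>()
  // Insert legacy models first ...
  for (const model of legacyModels) byId.set(model.id, model)
  // ... then let cortex.cpp models overwrite entries with the same ID.
  for (const model of cortexModels) byId.set(model.id, model)
  return [...byId.values()]
}
```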

Changes

Naming convention

  • inference-nitro-extension is renamed to inference-cortex-extension.
  • cortex.cpp binaries keep the same names as the engine releases.
  • Pre-package everything, including CUDA dependencies (dll, so), so users don't have to install them separately.
  • Support noavx-cuda binaries as a fallback (see the sketch below).
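
A rough sketch of how that fallback could be chosen at runtime. The binary names and the CPU/GPU feature checks are assumptions for illustration, not the shipped selection logic:

```typescript
// Hypothetical engine-binary picker: prefer an AVX2 + CUDA build, fall back to
// noavx-cuda when the CPU lacks AVX2, and to a CPU-only build when there is no GPU.
// Binary names are illustrative and only assumed to mirror engine release names.
function pickEngineBinary(hasAvx2: boolean, hasCuda: boolean): string {
  if (hasCuda) {
    return hasAvx2 ? 'cortex.llamacpp-avx2-cuda' : 'cortex.llamacpp-noavx-cuda'
  }
  return hasAvx2 ? 'cortex.llamacpp-avx2' : 'cortex.llamacpp-noavx'
}
```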

Simplified

  • Deprecate ModelFile. It's no longer relevant: providers now define models, so each provider should manage how to run them itself.
  • Remove the install-CUDA-toolkit UX; everything should be ready right after installation.

Downloader

The app proxies downloads to cortex.cpp or to its own downloader, depending on cortex.cpp's model support capability. A sketch of that routing follows.
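
A minimal sketch of the routing decision, assuming a hypothetical supportsModel capability check and pull/download helpers (these names are placeholders, not actual cortex.cpp or Jan APIs):

```typescript
// Hypothetical downloader facade: route through cortex.cpp when it can handle the
// model, otherwise fall back to the app's legacy downloader (e.g. tensorrt-llm, clip).
async function downloadModel(
  modelId: string,
  cortex: { supportsModel(id: string): Promise<boolean>; pull(id: string): Promise<void> },
  appDownloader: { download(id: string): Promise<void> },
): Promise<void> {
  if (await cortex.supportsModel(modelId)) {
    await cortex.pull(modelId) // cortex.cpp-supported models
  } else {
    await appDownloader.download(modelId) // legacy path
  }
}
```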

Model Hub

  • The app allows extensions to register models available for download in memory. After downloading, the models have their YAML or JSON metadata persisted along with the model files.
  • The app prioritizes model hub decoration (the previous JSON metadata) over cortex.cpp metadata (such as name, size, tags); see the sketch below.
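
A sketch of the decoration merge, assuming plain metadata objects; hub-provided fields win whenever both sides define them (the fields shown are just the ones mentioned above):

```typescript
// Hypothetical metadata shape, limited to the fields called out in the spec.
interface ModelMetadata {
  name?: string
  size?: number
  tags?: string[]
}

// Prefer hub decoration (previous JSON metadata) over what cortex.cpp reports;
// cortex.cpp values only fill in fields the hub does not define.
function decorateModel(hub: ModelMetadata, cortex: ModelMetadata): ModelMetadata {
  return { ...cortex, ...hub }
}
```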

Observability

  • cortex-extension should watch the cortex.cpp server upon launch, ensuring that the cortex process runs alongside the application.
  • All requests are queued and run once the server comes online, so the UX remains the same and no race between asynchronous requests and server startup is introduced. E.g. a model import or start should not fail because the server is not online in time. A sketch of such a queue follows this list.
  • There is no longer an attempt to kill the cortex process on every model start; it is just a model stop and start, so it does not block other API requests.
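
A minimal sketch of the queue-until-healthy idea. The /healthz path, the polling interval, and the timeout are assumptions; the actual cortex.cpp health route and the extension's queueing code may differ:

```typescript
// Queue cortex.cpp API calls until the server reports healthy, so requests issued
// right after launch (e.g. model import or model start) do not fail on a cold server.
class CortexRequestQueue {
  private pending: Promise<unknown> = Promise.resolve()

  constructor(private baseUrl: string) {}

  // Poll a health endpoint (path assumed) until it answers, or give up after a timeout.
  private async waitUntilOnline(timeoutMs = 30_000): Promise<void> {
    const deadline = Date.now() + timeoutMs
    while (Date.now() < deadline) {
      try {
        const res = await fetch(`${this.baseUrl}/healthz`)
        if (res.ok) return
      } catch {
        // Server not up yet; fall through and retry.
      }
      await new Promise((resolve) => setTimeout(resolve, 500))
    }
    throw new Error('cortex.cpp server did not come online in time')
  }

  // Serialize requests behind the health check so callers keep the same UX.
  enqueue<T>(request: () => Promise<T>): Promise<T> {
    const run = this.pending.then(() => this.waitUntilOnline()).then(request)
    this.pending = run.catch(() => undefined) // keep the chain alive after failures
    return run
  }
}
```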

Goals

  • When updating from an older version to this version, models are imported and run normally. Models that are not imported can still run, since we attempt a preflight check before running.
  • Users can download models with the app proxying to cortex.cpp or using the app's own downloader, depending on cortex.cpp's model support capability.

Subtasks

  • Pull latest cortex.cpp and engines to package
  • Keep binaries with the same name as their release name
  • Support noavx-cuda binaries as a fallback
  • Rename nitro-extension to cortex-extension
  • Deprecate ModelFile
  • Pre-package Cuda dependencies
  • Registered models should be persisted in memory.
  • App manages model downloads using the cortex.cpp downloader and its legacy downloader (tensorrt-llm and clip models)
  • App gets persisted models from cortex.cpp and scans JSON models itself (legacy)
  • App prioritizes model decoration metadata from the Hub over cortex.cpp
  • cortex-extension watches cortex.cpp server on launch
  • Model-extension queues cortex.cpp API requests with health checks so requests do not fail due to server uptime
  • cortex.cpp supports legacy model load parameters (NGL, context-length, cache enable, clip mmproj)
  • App shares CUDA dependencies with extensions (asar.unpackaged) to improve load time and reduce app size when multiple extensions share the same artifacts (tensorrt-llm and llama-cpp)

@janhq/jan @janhq/cortex

@louis-jan louis-jan added this to the v0.5.8 milestone Oct 25, 2024
@dan-homebrew dan-homebrew changed the title epic: Provider Extension - Cortex.cpp epic: Jan integrates Cortex.cpp Oct 29, 2024
@imtuyethan imtuyethan added the P1: important Important feature / fix label Oct 31, 2024