planning: Remote API Extensions for Jan & Cortex #3786

dan-homebrew · 2024-10-13T14:02:33Z

Goal

Note: This Epic has changed multiple times, as our architecture has also changed
A lot of the early comments are referring to a different context
e.g. "Provider Abstraction" in Jan

Cortex is now an API Platform and needs to route route /chat/completion requests to Remote APIs
- This is intended to allow us to support Groq, Martian, OpenRouter etc
Remote API Extensions will need to support
- Getting Remote API's model list
- Enabling certain default models (e.g. we may not want to show every nightly model in Remote API's Model List)
- Remote APIs may have specific model.yaml templates (e.g. context length)
- Routing of /chat/completion
- Extension should cover both UI layer, as well as "Backend" (we may need to modify Cortex to accept a Remote param)
- Handling API Key Management
We may need an incremental path to Remote API Extensions
- Cortex.cpp does not support Extensions for now
- We may need to have Remote API Extensions define a specific payload, that Cortex /chat/completions then routes conditionally

Tasklist

planning: Jan's path to cortex.cpp? #3690
Planning: Migration path for Jan's current Remote Extensions, and API Keys
Planning: Extension APIs that Remote API Extensions will call (e.g. on Jan or Cortex)
Should Remote API Extensions be a separate Github repo?
Support for Prompt Caching:
- idea: Add Claude Prompt Caching Support to Jan #3715

Remote APIs to Support

Popular

Deprioritized

The text was updated successfully, but these errors were encountered:

dan-homebrew · 2024-10-14T06:34:37Z

Goal: Clear Eng Spec for Providers

Scope

"Provider" Scope
- Remote (Groq, NIM, etc)
- Local = Hardware + APIs + Model Management (Ollama, Cortex)
- It is possible that we don't need a differentiation between Remote and Local
- Choosing a better name, vs. "OAIEngine")
Provider Interface + Abstraction
- Providers registers certain things (e.g. UI, Models), can be called by other extensions
- Registers Settings Page
- Registers Models Category + List
Each Provider Extension should be a separate repo?
- I would like this -> add others to help maintain

Jan Providers

Local Provider

Currently, the local extension still has to manage processes itself, which involves utilizing third-party frameworks such as Node.js (child_process) for building functions.

What if we build Jan on mobile we have to cover extensions as well. It would be better to move these parts to Core module and frontend will just need to use it’s API.

Local Provider will need to execute a command to run its program. Therefore, the command and arguments will be defined, while the rest will be delegated to the super class.

Lifecycle:

A Local Provider is intended to run engines as an API Server (potentially using HTTP, socket, or gRPC).
Local Provider executes a command through CoreAPI (reducing the main process implementation from extensions, easy to port to other platforms such as mobile)
Main Process core module will run a watchdog and maintain the process
Since then, the app has been able to make requests and proxy through the Local Provider extension.
App terminates -> watchdog terminates the process.

Examples

class CortexProvider extends LocalProvider {
  async onLoad() {
     // The Run is implemented from the core module
     // then the spawn process will be maintained by the watchdog
     this.run("cortex", [ "start", "--port", "39291" ], { cwd: "./", env: { } })
  } 

  async loadModel() {
    // Can be a http request, socket or grpc
    this.post("/v1/model/start", { mode: "llama3.2" })
  }
}

https://drive.google.com/file/d/1lITgfqviqA5b0-etSGtU5wI8BS7_TXza/view?usp=sharing

Remove Provider

The same as discussions: Remote API Extension #3505
Remote extensions should work with autopopulating models, e.g. /models list.
We could not build hundreds model.json files manually.
The current extension framework is actually designed to handle this, it's just an implementation issue from extensions, which can be improved.
There was a hacky UI implementation where we pre-populated models, then disabled all of them until the API key was set. That should be a part of the extension, not the Jan app.

Extension builder still ships default available models.

 // Before
 override async onLoad(): Promise<void> {
   super.onLoad()
   // Register Settings (API Key, Endpoints)
   this.registerSettings(SETTINGS)
 	
   // Pre-populate models - persist model.json files
   // MODELS are model.json files that come with the extension.
   this.registerModels(MODELS)
 }

 // After
 override async onLoad(): Promise<void> {
   super.onLoad()
   // Register Settings (API Key, Endpoints)
   this.registerSettings(SETTINGS)
 	
   // Fetch models from provider models endpoint - just a simple fetch
   // Default to `/models`
   get('/models')
     .then((models) => {
         // Model builder will construct model template (aka preset)
 	// This operation builds Model DTOs that works with the app.
 	this.registerModels(this.modelBuilder.build(models))
     })
 }

Remote Provider Extension

Draw.io

https://drive.google.com/file/d/1pl9WjCzKl519keva85aHqUhx2u0onVf4/view?usp=sharing

Supported parameters?

Each provider works with different parameters, but they all share the same basic function with the current ones defined.
We've already supported transformPayload and transformResponse to adapt to these cases.

So users still see parameters consistent from model to model, but the magic happens behind the scenes, where the transformations are simplified under the hood.

/**
* transformPayload Example
* Tranform the payload before sending it to the inference endpoint.
* The new preview models such as o1-mini and o1-preview replaced max_tokens by max_completion_tokens parameter.
* Others do not.
*/
transformPayload = (payload: OpenAIPayloadType): OpenAIPayloadType => {
  // Transform the payload for preview models
  if (this.previewModels.includes(payload.model)) {
    const { max_tokens, ...params } = payload
    return { ...params, max_completion_tokens: max_tokens }
  }
  // Pass through for officialw models
  return payload
}

Decoration?
- We've currently hard-coded many provider metadata from Jan, which could cause issues with future installed extensions.
- The decoration should be done from the Extension Manifest (package.json).
- https://code.visualstudio.com/api/references/extension-manifest
```
{
  "name": "openai-extension",
  "displayName": "OpenAI Extension Provider",
  "icon": "https://openai.com/logo.png"
}
```
Just remove the hacky parts

Model Dropdown: It checks if the engine is nitro or others, filtering for local versus cloud sections. New local engines will be treated as remote engines (e.g. cortex.cpp). -> Filter by Extension type (class name or type, e.g. LocalOAIEngine vs RemoteOAIEngine).
All models from the cloud provider are disabled by default if no API key is set. What if I use a self-hosted endpoint without API key restrictions? Models available or not should be determined from the extensions, when there are no credentials to meet the requirements, it will result in an empty section, indicating no available models. When users input the API-Key from extension settings page, it will fetch model list automatically and cache. Users can also refresh the models list from there (should not fetch so many times, we are building a local-first application)
Application settings can be a bit confusing, with Model Providers and Core Extensions listed separately. Where do other extensions fit in?

Extension settings do not have a community or "others" section

Provider Interface and abstraction

Providers are scoped at engine operations, such as running engines, loading models...
- registerModels(models)
- run(commands, arguments, options)
- loadModel(model)
- unloadModel(model)
Core functions can be extended through extensions, not confined by providers, such as Hardware and UI.
- systemStatus()
- registerSettings()
- registerRibbon()
- registerView()

Registered models will be stored in an in-memory store, accessible from other extensions(ModelManager.instance().models). The same as settings. App and extensions can perform chat/completions requests with just model name, which means the registered model should be unique across extensions.

The core module also exposes extensive APIs, such as systemStatus so other extensions can access, there should be just one implementation of the logic supplied by extensions. Otherwise, it will merely be utilized within the extension, first come, first served.

The UI of the model should be aligned with the model object, minimize decorations (e.g. model icon), and avoid introducing various types of model DTOs.

Each Provider Extension should be a separate repo?

Extensions installation is a straightforward process that requires minimal effort.

There is no official way to install extensions from a GitHub repository URL. Users typically don't know how to package and install software from sources.
There should be a shortcut from the settings page that allows users to input the URL, pop up the extension repository details, and then install from there.
It would be helpful to provide a list of community extensions, allowing users to easily find the right extension for their specific use case without having to search.

dan-homebrew · 2024-10-15T16:09:57Z

@louis-jan We can start working on this refactor, and make adjustments on the edges. Thank you for the clear spec!

louis-jan · 2024-10-31T08:18:58Z

Alternative Path

Shifting to cortex.cpp - No more Jan provider extension

It was a blocker where each provider introduced its own API schema. For example, Anthropic does not include system messages inside the message array. Support for a new remote provider requires a new extension to be installed.
Each remote provider registers their own pre-defined models, which seems inefficient.

Thoughts?

Should those extensions be able to be combined into one with very minimal functioning for proxying, so users can just add their own provider? Ref feat: Support Qwen 2.5 models#23 (comment) for the idea.
How can we shift those to Cortex so it can share the same function with the BE and make Jan thinner and easier to deliver?

Ideas

Build on top of a popular config template so that users can share the config between Jan x Cortex and other ecosystems. Litellm for example has a configuration API that is really well-defined, so it's better to learn and build a function on top of that schema so that it can be used widely. Users can also edit the yaml file without requiring Jan up to be used (cortex support?)
Templating payload transform so that users or maintainers can easily add or update a new provider support (in seconds?)
By those two above, we can easily shift all the remote providers support to cortex easily.

See the diagram for the ideas' visualization (The red lines means optional APIs, which are not required, we don't want to introduce a complicated API set)

An example of a provider payload transformation template

Results

A dynamic yet simple remote provider support mechanism.
Lightweight Jan
A well-known configuration schema is used with provider model templating using Jinja so it can be configured easily from Jan but still scale the functionality from Cortex without requiring code additions.

dan-homebrew · 2024-10-31T14:01:15Z

@louis-jan @nguyenhoangthuan99 A random thought: is the correct long-term decision to build Remote APIs on the correct abstractions, i.e. Engines and Models? After reading the Jinja proposal above, I am worried that it is a hack that will simply introduce more complexity in the long term.

I am more in favor of aligning the Engines abstraction to match @louis-jan's earlier Provider abstraction. I am not as deep in this as you guys, but wanted to brainstorm a few ideas out loud:

From my naive perspective, we can have Engines that represent Remote APIs:

Transforming `/chat/completion` request into Remote API's format

@louis-jan has already articulated the transformPayload and transformResponse above
Engines representing Remote APIs transform the request and response from the API
- They handle API-specific param changes, e.g. turning on Claude's prompt caching
- They handle model-specific param changes, e.g. o1's model param changes

Getting Remote API's model list

/engines/models can return an engine's related models (e.g. to show in Jan's model dropdown)
For Remote API endpoints, this would be a remote call
We would still need to filter "major" models (e.g. not show every GPT-4 nightly model)

The Engine Interface we have designed can extend to handle this:

https://github.com/janhq/cortex.cpp/blob/dev/engine/cortex-common/EngineI.h

This would be a pain to do in C++, but I think with with a clear Interface spec and LLMs it may be doable. We will likely need to sacrifice a few Remote API extensions, and only focus on the main two (e.g. OpenAI, Anthropic).

More importantly, I think it's more important we pick the correct architecture, which Cortex Python will follow in the future.

nguyenhoangthuan99 · 2024-11-01T05:04:09Z

Remote Engines Specification

Remote Engine Architecture

Remote APIs are implemented as Engine abstractions, requiring a separate repository for remote API integration
Each remote engine manages specific remote model types (OpenAI, Anthropic, etc.)
- Each remote model corresponds to an engine type (e.g., OpenAI engine, Anthropic engine)
Remote engines must implement all EngineI interfaces for their respective models, e.g. OpenAI and Claude are 2 different engines but will be shipped in 1 binary? or we can separate it to 2 different binaries.
- Implementation follows the pattern established in llamacpp

Configuration Structure

Engine Settings

Each remote engine maintains its own settings.json. Example OpenAI configuration:

[
  {
    "key": "openai-api-key",
    "title": "API Key",
    "description": "The OpenAI API uses API keys for authentication. Visit your [API Keys](https://platform.openai.com/account/api-keys) page to retrieve the API key you'll use in your requests.",
    "controllerType": "input",
    "controllerProps": {
      "placeholder": "Insert API Key",
      "value": "",
      "type": "password",
      "inputActions": ["unobscure", "copy"]
    },
    "extensionName": "@janhq/inference-openai-extension"
  },
  {
    "key": "chat-completions-endpoint",
    "title": "Chat Completions Endpoint",
    "description": "The endpoint to use for chat completions. See the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/chat/create) for more information.",
    "controllerType": "input",
    "controllerProps": {
      "placeholder": "https://api.openai.com/v1/chat/completions",
      "value": "https://api.openai.com/v1/chat/completions"
    },
    "extensionName": "@janhq/inference-openai-extension"
  }
]

Model Configuration

Each model requires a model.json file to manage chat completion parameters. Example:

{
  "sources": [
    {
      "url": "https://openai.com"
    }
  ],
  "id": "o1-mini",
  "object": "model",
  "name": "OpenAI o1-mini",
  "version": "1.0",
  "description": "OpenAI o1-mini is a lightweight reasoning model",
  "format": "api",
  "settings": {},
  "parameters": {
    "max_tokens": 4096,
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "stop": [],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "author": "OpenAI",
    "tags": ["General"]
  },
  "engine": "openai"
}

File System Structure

Configuration files should be stored alongside remote engines in the cortex directory:

cortexcpp/
      engines/
            llama-cpp/
            remote/
                   libengine.so
                   anthropic/
                   openai/
                         settings.json
                         o1-mini/
                              model.json
                         o1-preview/
                              model.json

Cortex-cpp Integration

Engine Management API

/engines/list: List available engines
/engines/install: Install new engines
/engines/uninstall: Remove engines
/engines/get: Get engine details

Model Management API

/v1/models: List all available models (both local and remote)
- Includes preset OpenAI models, and we can also allow user to add nightly models
- Allows user-added models with OpenAI remote engine
/engine/models: List all models associated with a specific engine

Chat Completion

Endpoint: /v1/chat/completion
Handles both local and remote model requests
Routes requests to appropriate engine based on model type and engine type

Implementation Notes

All remote model management is handled by their respective engines, not cortex.cpp
Current logic for local models in cortex.cpp remains unchanged

dan-homebrew · 2024-11-01T06:06:07Z

@nguyenhoangthuan99 As you work on Engines, I'd like you to have a perspective on how I see Engines as a larger abstraction long-term.

We have a very clear Engines abstraction, each of which have a separate repo
Engines may have multiple implementations (e.g. C++ binary, and Python in the future)
In the future, we may have a Python-based Extension framework that allows Python-based Engines to run on C++

Right now, we are focused on Cortex C++, and the output should be a C++ binary that is dynamically linked (i.e. how we do llama.cpp right now).

Due to the tediousness of C++, I recommend we just focus on 3 key engines for our users (i.e. OpenAI, Anthropic), and then provide a generic OpenAI Engine that takes in a API URL and model name. Imho, most API providers should already have adopted the OpenAI standard.

dan-homebrew · 2024-11-01T06:09:47Z

One thing I'd like to clarify though - from my POV, we should have different engines for each:

anthropic-engine
openai-engine
llamacpp-engine
tensorrtllm-engine

The Engine abstraction will need to be of type: local or type: remote.

My naive idea is that from Jan's perspective, Remote APIs are implemented in the following manner:

Call to /engine/models: retrieves Model Names and underlying model_id
Sets up Thread between Assistant, while passing selected Model and Engine in
- This may be different from how OpenAI implemented Assistants (i.e. fixed model)
Call to /threads/message transform message payload into API's format, calls Remote API's /chat/completions
Response from /threads/message transforms response back to ours, handles storage etc

@louis-jan @nguyenhoangthuan99 @namchuai @vansangpfiev - would love your feedback.

Remote Engines Specification

Remote Engine Architecture

Remote APIs are implemented as Engine abstractions, requiring a separate repository for remote API integration

Each remote engine manages specific remote model types (OpenAI, Anthropic, etc.)

nguyenhoangthuan99 · 2024-11-01T08:59:51Z

The Updated implementation of remote engine will be simplified like this:

Implement dylib for openai engine and Anthropic engine, and a generic engine for other openai compatible provider so cortex.cpp can load like the llamacpp engine
GetModels will call remote API and return info (model ID, name,...)
HandleChatCompletion (will transform request) and forward to remote provider.
Cortex.cpp will manage all api settings and model settings for remote provider

Update cortex.cpp:
The cortex.cpp needs updates to handle both remote and local providers:

API Key Management:

Will utilize the same approach as model_hub API (reference: planning: Migrate Hub to Cortex Model Hub API #3910)

Engine Management:

Remote and local engines will be treated uniformly
Support for all endpoint operations for remote engine:
- install
- uninstall
- get
- list

Model Management:

Remote and local models will be handled consistently using model.yml files.
Remote models are treated as extensions and will not be stored in the database, remote engine will manage its own models.
Support update params for each model through /v1/models/update/ api with remote engine.
When call /v1/models/, also return remote models and create model.yml if not exists.

Model storage structure:

Each remote provider's models will be stored under their respective engine folder
This ensures automatic clean up of model data when an engine is uninstalled

imtuyethan · 2024-11-04T10:52:46Z

This epic should solve:

louis-jan · 2024-11-05T03:53:34Z

@nguyenhoangthuan99

GetModels will call remote API and return info (model ID, name,...)

Remote and local models will be handled consistently using model.yml files

When call /v1/models/list, also return remote models and create model.yml if not exists.

Would this cause significant latency for /models? It would result in a poor user experience for clients. Also it's /v1/models no /list path component.

Remote models are treated as extensions and will not be stored in the database, remote engine will manage its own models

This would result in duplicate implementations between extensions. The current code-sharing mechanism between engine implementations is quite bad. My naive thought that you mean to scan thru the folder, but that introduce a bad performance as we tried to introduce the db file to optimize that. Otherwise, open interpreter would result in hundreds of model entries that cause a noticeable problem.

HandleChatCompletion (will transform request) and forward to remote provider

I think there should be a transformer for parameters to map to Jan UI and consistently persist model.yml.

Each remote provider's models will be stored under their respective engine folder

There was an interesting case where applications like Jan wanted to prepackage engines, making those engines read-only folders. Moving them to the data folder is costly and provides poor UX because there is a compressed mechanism in the app bundle. Decompressing and copying them over can take more than 5 minutes on some computers we have experienced so far.

POST /v1/auth/token
{
"provider": "huggingface",
"token": "your_token_here"
}

This builds a bad engine isolation where each extension can access others or Application have to map once again. Many parameters can be configured at the engine level, such as API key, URL, and settings for remote engines. For the local llama.cpp engine, options include caching, flash attention, and more. Would it be better to create a generic engine configurable endpoint for scalability?

nguyenhoangthuan99 · 2024-11-05T07:50:18Z

Updated cortex.cpp implementation base on Louis recommendation:

we will implement separate endpoint for get model corresponding to engines /v1/engine/models and support filter for model name with remote engine
Remote models will be saved in DB with different table RemoteModels and have follow fields: model, engine, path_to_model_yml -> we need to provide API endpoint for adding remote models, only added models are saved in DB
remote engine setting (api key, url, ...) will be saved under models/remote/ data folder, e.g. models/remote/openai.json, models/remote/anthropic.json, models/remote/openai-compatible.json... -> we need to provide API for engine setting /v1/engine/setting.
model.yml also contains mapping parameters fields for params transformer

dan-homebrew · 2024-11-05T11:50:44Z

Updated cortex.cpp implementation base on Louis recommendation:

we will implement separate endpoint for get model corresponding to engines /v1/engine/models and support filter for model name with remote engine

Remote models will be saved in DB with different table RemoteModels and have follow fields: model, engine, path_to_model_yml -> we need to provide API endpoint for adding remote models, only added models are saved in DB

remote engine setting (api key, url, ...) will be saved under models/remote/ data folder, e.g. models/remote/openai.json, models/remote/anthropic.json, models/remote/openai-compatible.json... -> we need to provide API for engine setting /v1/engine/setting.

model.yml also contains mapping parameters fields for params transformer

@nguyenhoangthuan99 @louis-jan I am not sure about this implementation and would like us to brainstorm/think through more:

Overall

I would like to explore @louis-jan idea of more code-sharing between engine implementations
One path I would like to explore is building a generic "Remote OpenAI-compatible Engine", and then letting users create instances of it.
- URL
- API Key
- Transform Params
We can probably incorporate elements of @louis-jan's proposal last week into the Engines abstraction

Models

We should have a clear Models abstraction, which can be either local or remote.

cortex.db Models Table

I don't think a separate RemoteModels table in cortex.db makes sense; we should use the existing models table
We should add a remote column which if true is a remote model
engine should be a 1:1 to the remote engine
Calling /models should return both local and remote, with field to indicate whether it's remote or local

Note: This will require us to implement a DB migrator as part of updater, which is an important App Shell primitive, as cortex.db does not have a remote column

getModels

One big question on my mind, is whether Models table should contain all remote models. What if OpenRouter returns all 700 models? What if Claude returns every claude-sonnet- version? This would clog up Models table and make it impossible to use.

We should only be showing the "main" models (e.g. 3-4 main ones)
We should still make it possible for the user to "define" a model they want to use, e.g. if they want to use claude-sonnet-10082024)

model.yaml has the params transformer

I really like this idea, and think this is a very elegant way to deal with this
We can host these in the Cortex Huggingface Org, as model.yaml for Remote models

Engines

I think remote engine settings should be stored in the /engines folder, not /models
e.g. /engines/anthropic, /engines/openai, /engines/<name>
Models belong to Engines, and there should be a SQLite relation between them

API Key and URL

I think Engines can have a generic settings.json in their /engines/<name> folder
Local and Remote Engines can share this abstraction

Generic OpenAI API-compatible Engine?

This is riffing off @louis-jan idea last week, of just having a Transform

We should allow users to create a new generic OpenAI-compatible API
- Creates /engines/<name> folder
- Creates engine entry in cortex.db's Engines table
- Creates /engines/<name>/settings
  - Takes in URL, API Key
  - Takes in Transform params (this can be overridden if model has model.yaml)

This would allow us to provision generic OpenAI-equivalent API Engines.

dan-homebrew · 2024-11-06T05:35:04Z

Concept:

cortex.db Engine
We "seed" Engines table with Remote Engines
Some of these Remote Engines have Models (e.g. OpenAI has o1 model)
- o1 model has a model.yaml that overrides the Engines' TransformParams
Note: Engine Metadata should be versioned

0xSage mentioned this issue Oct 14, 2024

planning: Jan and Cortex's Extension Framework #3773

Open

dan-homebrew changed the title ~~architecture: Local Provider Extension~~ architecture: Provider Abstraction Oct 14, 2024

dan-homebrew assigned louis-jan Oct 14, 2024

0xSage added category: local engines category: providers Local & remote inference providers labels Oct 14, 2024

0xSage pinned this issue Oct 14, 2024

louis-jan mentioned this issue Oct 17, 2024

feat: Jan Integrates Cortex.cpp as Provider #3821

Merged

14 tasks

dan-homebrew changed the title ~~architecture: Provider Abstraction~~ discussion: Provider Abstraction Oct 17, 2024

0xSage added the type: epic A major feature or initiative label Oct 17, 2024

0xSage mentioned this issue Oct 17, 2024

epic: Provider Refactor + Extensions #3824

Closed

14 tasks

0xSage removed the category: local providers label Oct 17, 2024

0xSage changed the title ~~discussion: Provider Abstraction~~ planning: Provider Abstraction Oct 17, 2024

0xSage added type: planning Discussions, specs and decisions stage and removed type: epic A major feature or initiative labels Oct 17, 2024

imtuyethan mentioned this issue Oct 18, 2024

chore: Structure Icebox in Github Projects #3840

Open

dan-homebrew mentioned this issue Oct 22, 2024

idea: Request for more cloud providers (Google AI Studio, Together API, AI Horde) #3859

Closed

dan-homebrew changed the title ~~planning: Provider Abstraction~~ planning: Remote API Extensions for Jan & Cortex Oct 29, 2024

dan-homebrew assigned nguyenhoangthuan99 Oct 30, 2024

louis-jan mentioned this issue Nov 4, 2024

bug: Unsupported Parameter Errors for model o1-mini & o1-preview #3771

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

planning: Remote API Extensions for Jan & Cortex #3786

planning: Remote API Extensions for Jan & Cortex #3786

dan-homebrew commented Oct 13, 2024 •

edited

Loading

dan-homebrew commented Oct 14, 2024 •

edited

Loading

louis-jan commented Oct 14, 2024 •

edited

Loading

dan-homebrew commented Oct 15, 2024

louis-jan commented Oct 31, 2024

dan-homebrew commented Oct 31, 2024 •

edited

Loading

nguyenhoangthuan99 commented Nov 1, 2024 •

edited

Loading

dan-homebrew commented Nov 1, 2024 •

edited

Loading

dan-homebrew commented Nov 1, 2024 •

edited

Loading

Remote Engines Specification

Remote Engine Architecture

nguyenhoangthuan99 commented Nov 1, 2024 •

edited

Loading

imtuyethan commented Nov 4, 2024 •

edited

Loading

louis-jan commented Nov 5, 2024 •

edited by dan-homebrew

Loading

nguyenhoangthuan99 commented Nov 5, 2024 •

edited

Loading

dan-homebrew commented Nov 5, 2024

dan-homebrew commented Nov 6, 2024 •

edited

Loading

planning: Remote API Extensions for Jan & Cortex #3786

planning: Remote API Extensions for Jan & Cortex #3786

Comments

dan-homebrew commented Oct 13, 2024 • edited Loading

Goal

Tasklist

Remote APIs to Support

dan-homebrew commented Oct 14, 2024 • edited Loading

Scope

Related

louis-jan commented Oct 14, 2024 • edited Loading

Jan Providers

Local Provider

Remove Provider

Provider Interface and abstraction

Each Provider Extension should be a separate repo?

dan-homebrew commented Oct 15, 2024

louis-jan commented Oct 31, 2024

Alternative Path

Shifting to cortex.cpp - No more Jan provider extension

Thoughts?

Ideas

Results

dan-homebrew commented Oct 31, 2024 • edited Loading

Transforming /chat/completion request into Remote API's format

Getting Remote API's model list

nguyenhoangthuan99 commented Nov 1, 2024 • edited Loading

Remote Engines Specification

Remote Engine Architecture

Configuration Structure

Engine Settings

Model Configuration

File System Structure

Cortex-cpp Integration

Engine Management API

Model Management API

Chat Completion

Implementation Notes

dan-homebrew commented Nov 1, 2024 • edited Loading

dan-homebrew commented Nov 1, 2024 • edited Loading

Remote Engines Specification

Remote Engine Architecture

nguyenhoangthuan99 commented Nov 1, 2024 • edited Loading

API Key Management:

Engine Management:

Model Management:

imtuyethan commented Nov 4, 2024 • edited Loading

louis-jan commented Nov 5, 2024 • edited by dan-homebrew Loading

nguyenhoangthuan99 commented Nov 5, 2024 • edited Loading

dan-homebrew commented Nov 5, 2024

Overall

Models

Engines

Generic OpenAI API-compatible Engine?

dan-homebrew commented Nov 6, 2024 • edited Loading

dan-homebrew commented Oct 13, 2024 •

edited

Loading

dan-homebrew commented Oct 14, 2024 •

edited

Loading

louis-jan commented Oct 14, 2024 •

edited

Loading

dan-homebrew commented Oct 31, 2024 •

edited

Loading

Transforming `/chat/completion` request into Remote API's format

nguyenhoangthuan99 commented Nov 1, 2024 •

edited

Loading

dan-homebrew commented Nov 1, 2024 •

edited

Loading

dan-homebrew commented Nov 1, 2024 •

edited

Loading

nguyenhoangthuan99 commented Nov 1, 2024 •

edited

Loading

imtuyethan commented Nov 4, 2024 •

edited

Loading

louis-jan commented Nov 5, 2024 •

edited by dan-homebrew

Loading

nguyenhoangthuan99 commented Nov 5, 2024 •

edited

Loading

dan-homebrew commented Nov 6, 2024 •

edited

Loading