planning: Remote API Extensions for Jan & Cortex #3786
Goal: Clear Eng Spec for Providers Scope
Related
Jan Providers

Local Provider

Currently, the local extension still has to manage processes itself, which involves using third-party frameworks such as Node.js (child_process) to build those functions. If we build Jan on mobile, we would have to cover this in extensions as well. It would be better to move these parts into the Core module, so the frontend only needs to use its API. A Local Provider will need to execute a command to run its program; therefore, the command and arguments will be defined by the provider, while the rest is delegated to the super class.

Lifecycle:
Example:

```typescript
class CortexProvider extends LocalProvider {
  async onLoad() {
    // `run` is implemented in the core module;
    // the spawned process is then maintained by the watchdog.
    this.run("cortex", ["start", "--port", "39291"], { cwd: "./", env: {} })
  }

  async loadModel() {
    // Can be an HTTP request, socket, or gRPC call.
    this.post("/v1/model/start", { model: "llama3.2" })
  }
}
```
Remote Provider
Draw.io: https://drive.google.com/file/d/1pl9WjCzKl519keva85aHqUhx2u0onVf4/view?usp=sharing
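As a rough illustration of how a remote provider could reuse the same lifecycle as the LocalProvider example above, here is a hedged sketch; `RemoteProvider`, `registerModels`, and `post` are assumed core-module helpers, not an agreed-upon API:

```typescript
// Illustrative sketch only: a remote provider talks to a hosted API
// instead of spawning a local process.
class OpenAIProvider extends RemoteProvider {
  async onLoad() {
    // Register the models this provider serves; the core module keeps them
    // in its in-memory model store so other extensions can read them.
    this.registerModels([{ id: "gpt-4o", engine: "openai" }])
  }

  async chat(request: object) {
    // Forward the OpenAI-compatible payload to the remote endpoint
    // configured in the extension settings.
    return this.post("/v1/chat/completions", request)
  }
}
```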
Provider Interface and abstraction
Registered models will be stored in an in-memory store, accessible from other extensions. The core module also exposes extensive APIs. The UI of the model should be aligned with the model object, minimize decorations (e.g. a model icon), and avoid introducing various types of model DTOs. Should each Provider Extension be a separate repo? Extension installation is a straightforward process that requires minimal effort.
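To make the Provider abstraction and the in-memory model store described above concrete, here is a minimal illustrative sketch (the interface and class names are assumptions, not a finalized design):

```typescript
// Illustrative only: a minimal Provider interface plus the in-memory model
// store owned by the core module and shared across extensions.
interface Model {
  id: string
  engine: string
  name?: string
}

interface Provider {
  onLoad(): Promise<void>
  onUnload(): Promise<void>
  listModels(): Promise<Model[]>
}

class InMemoryModelStore {
  private models = new Map<string, Model>()

  register(models: Model[]): void {
    for (const m of models) this.models.set(m.id, m)
  }

  all(): Model[] {
    return [...this.models.values()]
  }
}
```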
---
@louis-jan We can start working on this refactor, and make adjustments on the edges. Thank you for the clear spec!

---
Alternative Path: Shifting to cortex.cpp - no more Jan provider extension
Thoughts?
Ideas
See the diagram for a visualization of the ideas (the red lines indicate optional APIs, which are not required; we don't want to introduce a complicated API set).

An example of a provider payload transformation template (see the stand-in sketch below):

Results
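The original transformation template was shared as an attachment and is not reproduced in this thread; purely as a hedged stand-in for the template mentioned above, a provider payload transformation might pair a Jinja-style request template with a response template. The field names below are assumptions, not the actual proposal:

```json
{
  "engine": "anthropic",
  "transform_request": "{\"model\": \"{{ model }}\", \"max_tokens\": {{ max_tokens }}, \"messages\": {{ messages | tojson }}}",
  "transform_response": "{\"choices\": [{\"message\": {\"role\": \"assistant\", \"content\": \"{{ content[0].text }}\"}}]}"
}
```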
---
@louis-jan @nguyenhoangthuan99 A random thought: is the correct long-term decision to build Remote APIs on the correct abstractions, i.e. Engines and Models?

After reading the Jinja proposal above, I am worried that it is a hack that will simply introduce more complexity in the long term. I am more in favor of aligning the Engines abstraction to match @louis-jan's earlier Provider abstraction.

I am not as deep in this as you guys, but wanted to brainstorm a few ideas out loud. From my naive perspective, we can have Engines that represent Remote APIs: Transforming
---
Remote Engines Specification

Remote Engine Architecture
Configuration Structure

Engine Settings

Each remote engine maintains its own settings:

```json
[
{
"key": "openai-api-key",
"title": "API Key",
"description": "The OpenAI API uses API keys for authentication. Visit your [API Keys](https://platform.openai.com/account/api-keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
},
"extensionName": "@janhq/inference-openai-extension"
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
"description": "The endpoint to use for chat completions. See the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/chat/create) for more information.",
"controllerType": "input",
"controllerProps": {
"placeholder": "https://api.openai.com/v1/chat/completions",
"value": "https://api.openai.com/v1/chat/completions"
},
"extensionName": "@janhq/inference-openai-extension"
}
]
```

Model Configuration

Each model requires a configuration:

```json
{
"sources": [
{
"url": "https://openai.com"
}
],
"id": "o1-mini",
"object": "model",
"name": "OpenAI o1-mini",
"version": "1.0",
"description": "OpenAI o1-mini is a lightweight reasoning model",
"format": "api",
"settings": {},
"parameters": {
"max_tokens": 4096,
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"stop": [],
"frequency_penalty": 0,
"presence_penalty": 0
},
"metadata": {
"author": "OpenAI",
"tags": ["General"]
},
"engine": "openai"
}
```

File System Structure

Configuration files should be stored alongside remote engines in the cortex directory.
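The concrete layout isn't reproduced in this thread; as an assumption only, it could look roughly like this, with one settings file per remote engine (the `settings.json` name is illustrative):

```
<cortex data directory>/
  engines/
    openai/
      settings.json        # API key, chat completions endpoint, ...
    anthropic/
      settings.json
```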
Cortex-cpp Integration

Engine Management API
Model Management API
Chat Completion
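Assuming cortex.cpp keeps an OpenAI-compatible /v1/chat/completions route and decides routing from the model's "engine" field, a chat completion against a remote model would look no different from a local one. A hedged sketch, reusing the port from the provider example earlier in this thread:

```typescript
// Hedged sketch: the route and port come from examples in this thread;
// the remote engine is selected via the model's "engine" field.
const response = await fetch("http://127.0.0.1:39291/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "o1-mini", // registered with "engine": "openai" above
    messages: [{ role: "user", content: "Hello" }],
    stream: false,
  }),
})
const completion = await response.json()
console.log(completion.choices?.[0]?.message?.content)
```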
Implementation Notes
---
@nguyenhoangthuan99 As you work on Engines, I'd like to give you a perspective on how I see Engines as a larger abstraction long-term.
Right now, we are focused on Cortex C++, and the output should be a C++ binary that is dynamically linked (i.e. how we do llama.cpp right now). Due to the tediousness of C++, I recommend we just focus on 3 key engines for our users (e.g. OpenAI, Anthropic), and then provide a generic OpenAI Engine that takes in an API URL and model name. Imho, most API providers should already have adopted the OpenAI standard.
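For that generic OpenAI-compatible engine, the per-provider inputs would essentially be the base URL, API key, and model name. A hedged sketch of its settings, reusing the engine-settings format shown earlier in this thread (the keys here are illustrative, not a finalized schema):

```json
[
  {
    "key": "api-base-url",
    "title": "API Base URL",
    "description": "Base URL of any OpenAI-compatible API.",
    "controllerType": "input",
    "controllerProps": { "placeholder": "https://.../v1", "value": "" }
  },
  {
    "key": "api-key",
    "title": "API Key",
    "description": "API key for the OpenAI-compatible provider.",
    "controllerType": "input",
    "controllerProps": { "placeholder": "Insert API Key", "value": "", "type": "password" }
  },
  {
    "key": "model-name",
    "title": "Model Name",
    "description": "Model identifier to send in chat completion requests.",
    "controllerType": "input",
    "controllerProps": { "placeholder": "provider/model-name", "value": "" }
  }
]
```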
One thing I'd like to clarify though - from my POV, we should have different engines for each:
The Engine abstraction will need to be of

My naive idea is that, from Jan's perspective, Remote APIs are implemented in the following manner:
@louis-jan @nguyenhoangthuan99 @namchuai @vansangpfiev - would love your feedback.
---
The updated implementation of the remote engine will be simplified like this:
Update cortex.cpp:

API Key Management:
Engine Management:
Model Management:
Model storage structure:
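The storage layout itself isn't spelled out in this comment; one possible shape, stated purely as an assumption, keeps a model.yml per remote model under its engine folder:

```
<cortex data directory>/
  models/
    openai/
      o1-mini/
        model.yml
    anthropic/
      claude-3-5-sonnet/
        model.yml
```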
---
Would this cause significant latency for /models? It would result in a poor user experience for clients. Also it's
This would result in duplicate implementations between extensions. The current code-sharing mechanism between engine implementations is quite bad. My naive thought is that you mean to scan through the folder, but that introduces bad performance, which is why we introduced the db file to optimize it. Otherwise,
I think there should be a transformer for parameters to map to the Jan UI and consistently persist model.yml.
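A small hedged sketch of such a parameter transformer (the function and field names are made up for illustration), normalizing provider-specific parameter names into the fields Jan's UI renders and persists in model.yml:

```typescript
// Illustrative only: map provider-specific parameter names onto the
// parameter shape Jan's UI already understands.
type JanParameters = {
  max_tokens?: number
  temperature?: number
  top_p?: number
  stream?: boolean
}

function toJanParameters(raw: Record<string, unknown>): JanParameters {
  return {
    // Some providers name this "max_output_tokens" or "max_completion_tokens".
    max_tokens: (raw["max_tokens"] ?? raw["max_output_tokens"] ?? raw["max_completion_tokens"]) as
      | number
      | undefined,
    temperature: raw["temperature"] as number | undefined,
    top_p: raw["top_p"] as number | undefined,
    stream: raw["stream"] as boolean | undefined,
  }
}
```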
There was an interesting case where applications like Jan wanted to prepackage engines, making those engines
This results in poor engine isolation, where each extension can access others, or the application has to do the mapping once again. Many parameters can be configured at the engine level, such as the API key, URL, and settings for remote engines. For the local llama.cpp engine, options include caching, flash attention, and more. Would it be better to create a generic engine-level configuration endpoint for scalability?
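If we did go with such a generic engine-level configuration endpoint, it might look like the following; the routes and payload below are a proposal sketch, not an existing cortex.cpp API:

```typescript
// Hypothetical routes, for discussion only:
//   GET   /v1/engines/{engine}/settings   -> read current engine settings
//   PATCH /v1/engines/{engine}/settings   -> partially update them
await fetch("http://127.0.0.1:39291/v1/engines/openai/settings", {
  method: "PATCH",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    "api-key": "sk-...",
    "chat-completions-endpoint": "https://api.openai.com/v1/chat/completions",
  }),
})
```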
Updated
---
@nguyenhoangthuan99 @louis-jan I am not sure about this implementation and would like us to brainstorm/think through more:

Overall
Models

We should have a clear Models abstraction, which can be either local or remote.
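A rough type-level sketch of that abstraction (names are illustrative, not a proposed schema):

```typescript
// Illustrative only: a model is either local (weights on disk, run by a
// local engine) or remote (served by a remote API engine).
type LocalModel = {
  kind: "local"
  id: string
  engine: string        // e.g. "llama.cpp"
  path: string          // location of the model files on disk
}

type RemoteModel = {
  kind: "remote"
  id: string             // e.g. "o1-mini"
  engine: string         // e.g. "openai", "anthropic"
}

type Model = LocalModel | RemoteModel
```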
Note: This will require us to implement a DB migrator as part of the updater, which is an important App Shell primitive, as
One big question on my mind is whether the Models table should contain all remote models. What if OpenRouter returns all 700 models? What if Claude returns every claude-sonnet- version? This would clog up the Models table and make it impossible to use.
Engines
API Key and URL
Generic OpenAI API-compatible Engine?
This would allow us to provision generic OpenAI-equivalent API Engines.

---
Goal
- /chat/completion requests to Remote APIs
- /chat/completion → /chat/completions, then routes conditionally

Tasklist
Remote APIs to Support
Popular
Deprioritized