Dynamic Creation of chat completion endpoints during runtime #6811
-
Hi! I currently use this code to initialize my semantic kernel:
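Roughly like this (simplified; the endpoint URL and DI wiring shown here are placeholders for what I actually use):

var kernelBuilder = Kernel.CreateBuilder();
#pragma warning disable SKEXP0010
// Ollama's OpenAI-compatible endpoint, with the model hardcoded for now.
kernelBuilder.Services.AddOpenAIChatCompletion("phi3", new Uri("http://localhost:11434"));
#pragma warning restore SKEXP0010
var kernel = kernelBuilder.Build();
services.AddSingleton(kernel); // injected elsewhere via DI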
Which is great, and it works to inject the kernel. As you can see, I connect to Ollama, and currently I have the phi3 model hardcoded during the creation of the kernel. Now I want to be able to switch models dynamically at runtime. A user of my app has the possibility to pull Ollama models, and the newly pulled model should be used when calling the kernel like this:
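Along these lines (the prompt and argument names are placeholders):

// Sketch of the call site.
var result = await kernel.InvokePromptAsync(
    "Summarize the following text: {{$input}}",
    new KernelArguments { ["input"] = userText });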
I would like to be able to add and remove chat completion services so that the kernel configuration stays in sync with my running Ollama server. I thought maybe I could do something like kernel.AddChatCompletion and kernel.RemoveChatCompletion at runtime. But now I think there is a gap in my understanding of Semantic Kernel, and I am a little bit lost. Maybe someone can get me back on track with an idea of how to implement this feature. Thanks!
-
Hi @Maniga, in order to choose a model at runtime, you need to pre-register the possible models during kernel registration and then choose the model before invoking the kernel. The example below shows how to achieve this behavior. If you don't know which models may potentially be used, at runtime you can check whether the kernel already has a service with the registered model and, if not, add it and use it. Let me know if this example helps you achieve your scenario. Thanks a lot!
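A minimal sketch of that pattern, assuming Ollama's OpenAI-compatible endpoint; the model names and service ids here are illustrative:

var builder = Kernel.CreateBuilder();
#pragma warning disable SKEXP0010
// Pre-register every model that might be requested, each under its own service id.
builder.Services.AddOpenAIChatCompletion("phi3", new Uri("http://localhost:11434"), serviceId: "phi3");
builder.Services.AddOpenAIChatCompletion("llama3", new Uri("http://localhost:11434"), serviceId: "llama3");
#pragma warning restore SKEXP0010
var kernel = builder.Build();

// Choose the service per call via execution settings.
var settings = new PromptExecutionSettings { ServiceId = "llama3" };
var result = await kernel.InvokePromptAsync("Hello!", new KernelArguments(settings));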
-
Hi @dmytrostruk! Thanks for your answer. I saw those samples before and tried to make sense of them for my scenario. But my problem is, I tried to create a new instance of an OpenAIChatCompletionService with another model name, using code like this:
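(Reconstructed for this writeup; the model name and endpoint are placeholders.)

#pragma warning disable SKEXP0010
// Standalone chat completion service pointed at the Ollama endpoint.
var newService = new OpenAIChatCompletionService(
    modelId: "llama3",
    endpoint: new Uri("http://localhost:11434"),
    apiKey: null);
#pragma warning restore SKEXP0010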
But I don't see how to add this new chat completion service to an existing kernel. There is no Add method on the kernel's services collection.
-
Here is my solution, thanks to @dmytrostruk, in case someone needs to implement a similar feature: I created a SemanticKernelManager service which is responsible for re-initializing the Semantic Kernel every time a user pulls or removes a model in Ollama. I use clean architecture, so a domain event is triggered and the handler for the domain event calls the initialization function. The manager service also holds the kernel instance, so I can use it in other places throughout the application. With this solution I don't have to initialize the kernel before each call.

using Microsoft.Extensions.Configuration;
using Microsoft.SemanticKernel;

public class SemanticKernelManager : ISemanticKernelManager
{
    private readonly IConfiguration _configuration;
    private readonly IOllamaService _ollamaService;

    public SemanticKernelManager(IOllamaService ollamaService, IConfiguration configuration)
    {
        _ollamaService = ollamaService;
        _configuration = configuration;
        Initialize();
    }

    public Kernel Kernel { get; private set; } = null!;

    // Called by the domain event handler whenever a model is pulled or removed in Ollama.
    public void OllamaModelChanged()
    {
        Initialize();
    }

    private void Initialize()
    {
        // Fall back to the default local Ollama endpoint if none is configured.
        var ollamaEndpoint = _configuration.GetValue<string>("services:ollama:ollama:0")
                             ?? "http://localhost:11434";
        var endpoint = new Uri(ollamaEndpoint);

        // Blocks synchronously on the async call; Initialize() runs from the
        // constructor and from a synchronous event handler.
        var modelsResult = _ollamaService.GetLoadedModels().Result;
        var models = modelsResult.Models.Select(x => x.Name).ToList();

        var kernelBuilder = Kernel.CreateBuilder();
#pragma warning disable SKEXP0010
        // Register one chat completion service per model currently available in Ollama.
        foreach (var model in models)
        {
            kernelBuilder.Services.AddOpenAIChatCompletion(model, endpoint);
        }
#pragma warning restore SKEXP0010

        var promptsFolderPath = Path.Combine(
            AppContext.BaseDirectory, "AiService", "Prompts", "ChatCompletionPlugin");
        kernelBuilder.Plugins.AddFromPromptDirectory(promptsFolderPath);

        Kernel = kernelBuilder.Build();
    }
}
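For reference, I wire it up roughly like this (simplified sketch; the exact registration depends on your setup):

// Composition root: one manager instance for the whole app.
services.AddSingleton<ISemanticKernelManager, SemanticKernelManager>();

// Consumer: always fetch the current kernel from the manager, since
// Initialize() swaps the instance whenever Ollama models change.
var kernel = _semanticKernelManager.Kernel;
var result = await kernel.InvokePromptAsync("Hello!");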
@Maniga Got it, thanks for providing more details. I think service registration is available only during kernel construction (e.g. when using KernelBuilder or new Kernel(services)). In your case, before making a request, you can create a new kernel instance, import the necessary plugins and services (e.g. an OpenAIChatCompletionService with the modelName provided by the user) and execute the request. Let me know if that works for your scenario.
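A minimal sketch of that per-request construction, where modelName and userPrompt come from the user and the endpoint is a placeholder:

#pragma warning disable SKEXP0010
// Build a fresh kernel for this request with the user-selected model.
var builder = Kernel.CreateBuilder();
builder.Services.AddOpenAIChatCompletion(modelName, new Uri("http://localhost:11434"));
#pragma warning restore SKEXP0010
var kernel = builder.Build();
var result = await kernel.InvokePromptAsync(userPrompt);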