OpenVINO™ Model Server 2024.5

atobiszei released this 20 Nov 14:08
· 3 commits to releases/2024/5 since this release
3c284cf

The 2024.5 release comes with support for embedding and rerank endpoints, as well as an experimental Windows version.

Changes and improvements

  • The OpenAI API text embedding endpoint has been added, enabling OVMS to be used as a building block for AI applications like RAG.
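
A minimal client-side sketch of using this endpoint for a RAG-style workflow. The model name, port, and response handling are assumptions; the request body follows the OpenAI embeddings API shape that the endpoint implements.

```python
import json
import math

def build_embeddings_request(model, texts):
    """Build the JSON body for POST /v3/embeddings (OpenAI API shape)."""
    return json.dumps({"model": model, "input": texts})

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, the usual
    relevance measure when retrieving documents for RAG."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical model name; substitute whatever embedding model is deployed
# in your OVMS model repository.
body = build_embeddings_request("Alibaba-NLP/gte-large-en-v1.5",
                                ["What is OVMS?", "OpenVINO Model Server overview"])
```

The returned `data[i].embedding` vectors can then be compared with `cosine_similarity` to rank candidate passages against a query.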

  • The rerank endpoint has been added based on Cohere API, enabling easy similarity detection between a query and a set of documents. It is one of the building blocks for AI applications like RAG and makes integration with frameworks such as langchain easy.
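
A sketch of building a rerank request and ordering documents by the scores it returns. The field names follow the Cohere rerank API shape referenced above; the model name and the sample response payload are illustrative assumptions.

```python
import json

def build_rerank_request(model, query, documents, top_n=3):
    """Build the JSON body for POST /v3/rerank (Cohere API shape)."""
    return json.dumps({"model": model, "query": query,
                       "documents": documents, "top_n": top_n})

def top_documents(documents, response):
    """Order the original documents by the relevance_score returned
    for each document index."""
    ranked = sorted(response["results"],
                    key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked]

docs = ["OVMS serves models.", "Bananas are yellow.", "RAG needs a reranker."]
# Hypothetical response illustrating the Cohere-style result shape.
sample = {"results": [{"index": 0, "relevance_score": 0.91},
                      {"index": 1, "relevance_score": 0.02},
                      {"index": 2, "relevance_score": 0.77}]}
```

In a RAG pipeline, the reranker is typically applied after a first-pass embedding retrieval to reorder the shortlist before it is passed to the LLM.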

  • The echo sampling parameter, together with logprobs, is now supported in the completions endpoint.
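
One common use of `echo` combined with `logprobs` is scoring how likely a prompt is under the model. A sketch under assumptions: the model name is hypothetical, and `max_tokens=0` (return only the echoed prompt) follows the OpenAI completions convention, which may or may not apply to a given deployment.

```python
import json
import math

def build_completions_request(model, prompt):
    # echo=True returns the prompt tokens in the output; logprobs=1 attaches
    # a log-probability to each returned token, so the prompt itself can be scored.
    return json.dumps({"model": model, "prompt": prompt,
                       "echo": True, "logprobs": 1, "max_tokens": 0})

def perplexity(token_logprobs):
    """Perplexity from per-token logprobs; the first token of an echoed
    prompt has no logprob (None) and is skipped."""
    lps = [lp for lp in token_logprobs if lp is not None]
    return math.exp(-sum(lps) / len(lps))
```

Lower perplexity over the echoed prompt tokens indicates text the model finds more probable.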

  • Performance increase on both CPU and GPU for LLM text generation.

  • LLM dynamic_split_fuse for the GPU target device boosts throughput in high-concurrency scenarios.

  • The procedure for LLM service deployment and model repository preparation has been simplified.

  • Improvements in LLM test coverage and stability.

  • Instructions for building an experimental Windows binary package – a native model server for Windows OS – are available. This version has a set of limitations and limited test coverage. It is intended for testing; the production-ready release is expected with 2025.0. All feedback is welcome.

  • The OpenVINO Model Server C-API now supports asynchronous inference, improves performance by allowing outputs to be preset, and enables the use of OpenCL and VA surfaces on both inputs and outputs for the GPU target device.

  • The KServe REST API model metadata endpoint can now provide additional model_info references.
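
A sketch of reading those extra references on the client side. The `GET /v2/models/{name}` URL shape is standard KServe; the exact keys under which OVMS nests the `model_info` section (here `rt_info.model_info`) are an assumption, so check your actual metadata response.

```python
def metadata_url(base, model_name):
    """URL for the KServe REST model metadata endpoint."""
    return f"{base}/v2/models/{model_name}"

def extract_model_info(metadata):
    """Pull optional model_info references out of a metadata response,
    returning an empty dict when the server provides none."""
    return metadata.get("rt_info", {}).get("model_info", {})

# Hypothetical metadata payload illustrating the assumed nesting.
sample = {"name": "resnet",
          "rt_info": {"model_info": {"resize_type": "standard"}}}
```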

  • Included support for NPU and iGPU on MTL (Meteor Lake) and LNL (Lunar Lake) platforms.

  • Security and stability improvements

Breaking changes

No breaking changes.

Bug fixes

  • Fix support for URL-encoded model names in the KServe REST API
  • OpenAI text generation endpoints now accept requests with both the v3 and v3/v1 path prefixes
  • Fix metrics reporting in the video stream benchmark client
  • Fix sporadic INVALID_ARGUMENT errors on the completions endpoint
  • Fix incorrect LLM finish reason reported as length when stop was expected

Discontinuation plans

In future releases, support for the following build options will no longer be maintained:

  • Ubuntu 20 as the base image
  • OpenVINO NVIDIA plugin

You can use the OpenVINO Model Server public Docker images based on Ubuntu 22.04 via the following commands:

  • docker pull openvino/model_server:2024.5 - CPU device support
  • docker pull openvino/model_server:2024.5-gpu - GPU, NPU and CPU device support

or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.