
DJLServing v0.29.0 Release

@siddvenk released this 16 Aug 15:57

Key Features

Details regarding the latest LMI container image_uris can be found here

DJL Serving Changes (applicable to all containers)

  • Allows configuring health checks to fail based on various types of error rates
  • When not streaming responses, all invocation errors will respond with the appropriate 4xx or 5xx HTTP response code
    • Previously, for some inference backends (vllm, lmi-dist, tensorrt-llm) the behavior was to return 2xx HTTP responses when errors occurred during inference
  • HTTP response codes are now configurable if you require a specific 4xx or 5xx status to be returned in certain situations
  • Introduced the @input_formatter and @output_formatter annotations so you can bring your own script for pre- and post-processing (see the sketch after this list)
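As an illustration of the new annotations, here is a minimal sketch of a bring-your-own output formatter script. The import path and the attributes read off the request output object are assumptions about the djl_python package layout, not verified API; consult the LMI documentation for the exact schema. The same pattern applies to @input_formatter for custom request parsing.

```python
# model.py -- a minimal sketch of a custom output formatter.
# NOTE: the import path and the fields accessed on request_output are
# assumptions about the djl_python package, not confirmed API.
import json

from djl_python.output_formatter import output_formatter  # assumed module path


@output_formatter
def custom_output_formatter(request_output) -> str:
    """Emit one JSON line per generated token instead of the default schema."""
    # Pick the highest-scoring sequence for this request (assumed attributes).
    best_sequence = request_output.sequences[request_output.best_sequence_index]
    next_token, _is_first, is_last = best_sequence.get_next_token()
    result = {"token_id": next_token.id, "token_text": next_token.text}
    if is_last:
        result["finish_reason"] = best_sequence.finish_reason
    return json.dumps(result) + "\n"
```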

LMI Container (vllm, lmi-dist)

  • vLLM updated to version 0.5.3.post1
  • Added multimodal support for vision language models using the OpenAI Chat Completions schema (see the request sketch after this list)
    • More details available here
  • Supports Llama 3.1 models
  • Supports beam search, best_of, and n with non-streaming output
  • Supports chunked prefill in both vllm and lmi-dist
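To make the chat-completions-style multimodal request concrete, here is a hedged Python sketch. The endpoint URL and image URL are placeholders for a locally running DJL Serving instance; the message payload follows the OpenAI Chat Completions schema this release adopts, and the commented-out n parameter points at the new non-streaming n/best_of support.

```python
# A minimal sketch of a multimodal chat request against a locally running
# DJL Serving endpoint. The URL and image below are placeholders; on
# SageMaker the equivalent request goes to the /invocations endpoint.
import requests

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
    "max_tokens": 128,
    # "n": 2,  # n / best_of are now supported for non-streaming output
}

response = requests.post(
    "http://localhost:8080/invocations",  # placeholder endpoint
    json=payload,
    headers={"Content-Type": "application/json"},
)
print(response.status_code, response.json())
```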

TensorRT-LLM Container

  • TensorRT-LLM updated to version 0.11.0
  • [Breaking change] Flan-T5 is now served with the C++ Triton backend; Flan-T5 support has been removed from the TRTLLM Python backend

Transformers NeuronX Container

  • Upgraded to Transformers NeuronX 2.19.1

Text Embedding (using the LMI container)

  • Various performance improvements

Enhancements

Documentation

CI + Testing

Bug Fixes

New Contributors

Full Changelog: v0.28.0...v0.29.0