Releases: distantmagic/paddler

v0.7.0

04 Sep 01:29

Requires llama.cpp release b3606 or later.

Breaking Changes

  • Adapted to breaking changes in the llama.cpp /health endpoint: ggml-org/llama.cpp#9056

    Starting with this version, Paddler monitors llama.cpp instances through the /slots endpoint instead of using the /health endpoint to read slot statuses.
    Paddler's own /health endpoint remains unchanged.
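
As a rough illustration of slot-based monitoring, the sketch below counts idle and busy slots from a /slots response body. It assumes the body is a JSON array with one object per slot and that each object carries an integer `state` field where 0 means idle. The slot schema differs between llama.cpp releases, so the `state` field, the sample payload, and the `count_slots` helper are all assumptions made for illustration, not Paddler's actual implementation.

```python
import json


def count_slots(slots_body: str) -> tuple[int, int]:
    """Return (idle, processing) counts from a /slots response body.

    Assumption: the body is a JSON array of slot objects, each with an
    integer `state` field where 0 means idle. Verify this against the
    llama.cpp release you run, since the slot schema has changed over time.
    """
    slots = json.loads(slots_body)
    idle = sum(1 for slot in slots if slot.get("state") == 0)
    return idle, len(slots) - idle


# Hypothetical response body, for illustration only.
sample = '[{"id": 0, "state": 0}, {"id": 1, "state": 1}]'
print(count_slots(sample))
```

A monitor built this way would poll each instance periodically and report the two counts to the balancer.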

v0.6.0

13 Aug 09:54

Latest supported llama.cpp release: b3604

Features

Fixes

  • Agent host formatting in dashboard


v0.6.0-rc1

12 Aug 12:18
Pre-release

Features

  • Assign names to Paddler agents (#15)

v0.5.0

17 Jul 21:31

Fixes

  • Management server crashed in some scenarios due to concurrency issues

v0.4.0

16 Jul 12:33

Thank you, @ScottMcNaught, for the help with debugging the issues! :)

Fixes

  • OpenAI compatible endpoint is now properly balanced (/v1/chat/completions)
  • Balancer's reverse proxy panicked in some scenarios when the underlying llama.cpp instance was abruptly closed during the generation of completion tokens
  • Added mutex in the targets collection for better internal slots data integrity
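
To show why a mutex helps here, the following is a minimal sketch of a lock-guarded targets collection. The class name `TargetCollection`, its methods, and the "most idle slots wins" selection rule are hypothetical and are not taken from Paddler's code. The point is only that the read, the check, and the decrement of a slot count happen under one lock, so two concurrent requests cannot reserve the same slot.

```python
import threading
from typing import Optional


class TargetCollection:
    """Hypothetical registry of llama.cpp targets and their idle slot counts."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle_slots: dict[str, int] = {}

    def update(self, target: str, idle_slots: int) -> None:
        """Record the latest idle slot count reported for a target."""
        with self._lock:
            self._idle_slots[target] = idle_slots

    def take_slot(self) -> Optional[str]:
        """Reserve one idle slot on the target with the most idle slots.

        Returns the target name, or None when no target has an idle slot.
        """
        with self._lock:
            if not self._idle_slots:
                return None
            target = max(self._idle_slots, key=self._idle_slots.get)
            if self._idle_slots[target] <= 0:
                return None
            self._idle_slots[target] -= 1
            return target
```

Without the lock, two threads could both see one idle slot and both decrement it, leaving the collection with a negative count.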

v0.3.0

27 Jun 21:08

Features

  • Requests can queue when all llama.cpp instances are busy
  • AWS Metadata support for agent local IP address
  • StatsD metrics support
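
The queueing behavior can be pictured with a small sketch. It assumes a balancer that tracks idle slots per instance and parks requests in first-in, first-out order when none are free. The `QueueingBalancer` class, its method names, and the example address are hypothetical, and the real implementation may differ.

```python
from collections import deque


class QueueingBalancer:
    """Hypothetical balancer that parks requests until a slot is free."""

    def __init__(self, idle_slots: dict[str, int]) -> None:
        self.idle_slots = dict(idle_slots)  # instance address -> idle slots
        self.pending: deque[str] = deque()  # waiting request ids, oldest first

    def submit(self, request_id: str):
        """Return the chosen instance, or None if the request was queued."""
        free = {name: n for name, n in self.idle_slots.items() if n > 0}
        if not free:
            self.pending.append(request_id)
            return None
        instance = max(free, key=free.get)
        self.idle_slots[instance] -= 1
        return instance

    def release(self, instance: str):
        """Free a slot on `instance` after a request finishes.

        If a request is waiting, the slot goes straight to the oldest one and
        the (request_id, instance) pair is returned. Otherwise the slot is
        marked idle again and None is returned.
        """
        if self.pending:
            return self.pending.popleft(), instance
        self.idle_slots[instance] += 1
        return None
```

For example, with a single instance that has one slot, the first `submit` returns that instance, a second `submit` returns None because the request is queued, and the next `release` hands the freed slot to the queued request.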


v0.1.0

01 Jun 23:20

Aggregated Health Status Responses

Paddler aggregates the health statuses of all underlying llama.cpp instances. Its /health endpoint reports the aggregated result, which makes Paddler a drop-in replacement for the llama.cpp server itself, in the sense that you can send requests to Paddler instead of llama.cpp and things will work the same way.
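
A minimal sketch of such aggregation follows. It assumes each instance reports a JSON object with `status`, `slots_idle`, and `slots_processing` fields. That report shape, the `aggregate_health` helper, and the rule that the aggregate is "ok" when at least one instance is healthy are assumptions for illustration rather than a description of Paddler's exact behavior.

```python
def aggregate_health(statuses: list[dict]) -> dict:
    """Combine per-instance health reports into one summary.

    Assumption: each report looks like
    {"status": "ok", "slots_idle": 1, "slots_processing": 0}.
    Instances whose status is not "ok" are left out of the slot totals.
    """
    healthy = [s for s in statuses if s.get("status") == "ok"]
    return {
        "status": "ok" if healthy else "error",
        "slots_idle": sum(s.get("slots_idle", 0) for s in healthy),
        "slots_processing": sum(s.get("slots_processing", 0) for s in healthy),
    }


# Hypothetical reports from three instances, one of them unhealthy.
reports = [
    {"status": "ok", "slots_idle": 2, "slots_processing": 1},
    {"status": "ok", "slots_idle": 0, "slots_processing": 4},
    {"status": "error"},
]
print(aggregate_health(reports))
```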
