Releases: mostlygeek/llama-swap
v96
v95
v94
v93
This release includes an important fix for a rarely occurring bug 🐛. A race condition could cause some requests to fail rather than wait for the upstream server to start.
Changelog
v92
v91
This release adds the /unload endpoint, which allows manually unloading all currently loaded models.
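As a rough illustration of calling the new endpoint from a script, here is a minimal Python sketch. It assumes llama-swap is reachable at a base URL such as `http://localhost:8080` (a hypothetical address) and that a plain GET request to `/unload` triggers the unload. The helper names `unload_url` and `unload_all_models` are made up for this example and are not part of llama-swap.

```python
import urllib.request


def unload_url(base_url: str) -> str:
    """Build the /unload URL for a given llama-swap base URL."""
    return base_url.rstrip("/") + "/unload"


def unload_all_models(base_url: str, timeout: float = 10.0,
                      opener=urllib.request.urlopen) -> int:
    """Request that all loaded models be unloaded; return the HTTP status code.

    Assumption: the endpoint accepts a plain GET. Adjust if your version differs.
    `opener` is injectable so the function can be exercised without a live server.
    """
    with opener(unload_url(base_url), timeout=timeout) as response:
        return response.status


# Example usage (hypothetical address; point it at your own llama-swap instance):
# status = unload_all_models("http://localhost:8080")
```

A non-2xx response raises `urllib.error.HTTPError`, so a caller that wants to tolerate failures should wrap the call in a `try`/`except`.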
Changelog
- 082d5d0 Add /unload endpoint (#58) to unload all currently running models
- 5333893 increase health check to a minimum of 5 seconds
- af65334 Update README.md w/ starhistory graph
- 1e25b44 add workflow_dispatch to release action
- 0815bb4 Add windows to goreleaser #54
- 7187cfe add Windows build support to Makefile (#54)
- 24089d2 remove "no musa container" note from README
- ebabe55 Delete untagged packages after build and push (#55)
- 41a3382 deletion of untagged containers happen after build-and-push
- 7e3353e add action step to remove untagged containers
- 4ed58fb update container build action
- f5a2be6 revert package src until new ggml-org has them
- f5e6ec3 fix package src in containerfile
- 3f462da switch package source from ggerganov to ggml-org
- 48bd766 Update README.md
- 8d319da improve README organization (i think...)
- be7c502 improve docs
- 92336f0 more container build fixes
- ed2a50d fix bug in build-container.sh
- 0acfdb9 update workflow to build cpu and disable musa
- 96a8ea0 add cpu docker container build
- f20f2c9 add docs and container build improvements #43
- 7a97c38 enable parallel container built #46
- 4885132 more permissions futzing
- 8b46a0b grant package:write to container workflow #46
- 1b6736e rename workflow for containers
- ddc1ce0 fix container file name #46
- 11d024b just build cuda while debugging
- 43e23c1 add check for GITHUB_TOKEN #46
- f9c8e76 add execute bit on build-container.sh
- d7e1bb9 add GITHUB_TOKEN to container build env
- ab93460 first container code (#52)
v90
Note
The only functional change in this release is that the health check minimum has been increased from 1 second to 5 seconds. Everything else is documentation, GitHub Actions, and Docker additions that make llama-swap more convenient to deploy.
Changelog
- cebf9c4 increase health check to a minimum of 5 seconds
- 1e25b44 add workflow_dispatch to release action
- 0815bb4 Add windows to goreleaser #54
- 7187cfe add Windows build support to Makefile (#54)
- 24089d2 remove "no musa container" note from README
- ebabe55 Delete untagged packages after build and push (#55)
- 41a3382 deletion of untagged containers happen after build-and-push
- 7e3353e add action step to remove untagged containers
- 4ed58fb update container build action
- f5a2be6 revert package src until new ggml-org has them
- f5e6ec3 fix package src in containerfile
- 3f462da switch package source from ggerganov to ggml-org
- 48bd766 Update README.md
- 8d319da improve README organization (i think...)
- be7c502 improve docs
- 92336f0 more container build fixes
- ed2a50d fix bug in build-container.sh
- 0acfdb9 update workflow to build cpu and disable musa
- 96a8ea0 add cpu docker container build
- f20f2c9 add docs and container build improvements #43
- 7a97c38 enable parallel container built #46
- 4885132 more permissions futzing
- 8b46a0b grant package:write to container workflow #46
- 1b6736e rename workflow for containers
- ddc1ce0 fix container file name #46
- 11d024b just build cuda while debugging
- 43e23c1 add check for GITHUB_TOKEN #46
- f9c8e76 add execute bit on build-container.sh
- d7e1bb9 add GITHUB_TOKEN to container build env
- ab93460 first container code (#52)