How to use GPUs from an HPC System for LibreChat #4250
dirkpetersen started this conversation in General
Replies: 0 comments
In our LibreChat Enterprise Pilot Project we use AWS Bedrock and are happy with it. However, we wanted to benchmark it against several locally hosted Llama 3.1 models. The big 405B model needs about six A100 GPUs with 80 GB each, which we only have in our HPC cluster, and that cluster only supports batch jobs.

We came up with a way to serve llama-cpp-python with Traefik as an HA load balancer, and packaged it all up in an easy-to-use process: https://github.com/dirkpetersen/forever-slurm (happy to accept PRs for improvements). There is also a background story.

The 405B model is significantly slower than on Bedrock, but the 70B model running on a single A40 GPU offers the same performance as Bedrock. To our great surprise, this setup is actually very stable.
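For readers curious what "serving an LLM from batch jobs" looks like, here is a minimal sketch of a Slurm job script that starts an OpenAI-compatible llama-cpp-python server on a GPU node. This is not the actual forever-slurm implementation; the partition name, model path, and port are hypothetical placeholders you would adapt to your cluster.

```shell
#!/bin/bash
#SBATCH --job-name=llama-70b          # hypothetical job name
#SBATCH --partition=gpu               # assumed GPU partition on your cluster
#SBATCH --gres=gpu:a40:1              # a single A40 was enough for the 70B model
#SBATCH --time=24:00:00               # batch jobs are time-limited; resubmit to keep serving

# Start an OpenAI-compatible server from llama-cpp-python on this compute node.
# Model path and port are placeholders; --n_gpu_layers -1 offloads all layers to the GPU.
python3 -m llama_cpp.server \
  --model /models/llama-3.1-70b.Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --n_gpu_layers -1
```

The HA part comes from running several such jobs at once and pointing Traefik (on a node reachable from LibreChat) at the resulting node:port backends with health checks, so requests are routed only to jobs that are currently alive; resubmitting jobs before their time limit expires keeps the service up continuously.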