
Commit

Update Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md

Co-authored-by: Neelay Shah <[email protected]>
whoisj and nnshah1 authored Jun 12, 2024
1 parent 70d533a commit ae4a292
Showing 1 changed file with 1 addition and 1 deletion.
@@ -16,7 +16,7 @@

# Autoscaling and Load Balancing Generative AI w/ Triton Server and TensorRT-LLM

-Setting up autoscaling and load balancing using Triton Inference Server, TensorRT-LLM or vLLM, and Kubernetes is not difficult,
+Setting up autoscaling and load balancing for large language models served by Triton Inference Server is not difficult,
but it does require preparation.

This guide aims to help you automate the acquisition of models from Hugging Face, minimize time spent optimizing models for
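The README being edited covers autoscaling Triton Server on Kubernetes. As a hedged illustration of the kind of setup it describes (the deployment and metric names below are hypothetical, not taken from this commit), a minimal HorizontalPodAutoscaler for a Triton deployment might look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-trtllm-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-trtllm            # hypothetical Triton Server deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          # assumes a custom per-pod metric exposed via a Prometheus adapter
          name: triton_queue_duration
        target:
          type: AverageValue
          averageValue: "50m"
```

Scaling on a queue-time metric rather than raw CPU is one common choice for GPU inference workloads, since CPU utilization is a poor proxy for model-serving load.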
