
Commit

Update Deployment/Kubernetes/TensorRT-LLM_Autoscaling_and_Load_Balancing/README.md

Co-authored-by: Neelay Shah <[email protected]>
whoisj and nnshah1 authored Jun 12, 2024
1 parent 70d533a commit ae4a292
Showing 1 changed file with 1 addition and 1 deletion.
@@ -16,7 +16,7 @@

# Autoscaling and Load Balancing Generative AI w/ Triton Server and TensorRT-LLM

-Setting up autoscaling and load balancing using Triton Inference Server, TensorRT-LLM or vLLM, and Kubernetes is not difficult,
+Setting up autoscaling and load balancing for large language models served by Triton Inference Server is not difficult,
but it does require preparation.

This guide aims to help you automate the acquisition of models from Hugging Face, minimize time spent optimizing models for
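The README being edited covers autoscaling Triton Server on Kubernetes. As a hedged illustration of the kind of setup it describes (the deployment and metric names below are hypothetical, not taken from this commit), a minimal HorizontalPodAutoscaler for a Triton deployment might look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-trtllm-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-trtllm            # hypothetical Triton Server deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          # assumes a custom per-pod metric exposed via a Prometheus adapter
          name: triton_queue_duration
        target:
          type: AverageValue
          averageValue: "50m"
```

Scaling on a queue-time metric rather than raw CPU is one common choice for GPU inference workloads, since CPU utilization is a poor proxy for model-serving load.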
