Add TRT-LLM Gen. AI Autoscaling & Load Balancing Guide #95
This change includes a number of improvements suggested by @nealvaidya.
Some sentence-level suggestions - this could use some additional eyes to catch any other grammar / syntax errors - but overall it looks great! We can continue refining in future iterations.
@harryskim - would be good to get your quick review.
This change includes a number of improvements suggested by @nnshah1. Co-authored-by: Neelay Shah <[email protected]>
Approving the applied changes requested by Neelay.
This change adds a guide for deploying autoscaling & load balancing of TensorRT-LLM Gen. AI models.
Includes: