Add deployment options in the '/doc' #98
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#inference-examples)
- [Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run#inference-examples) (preview)
### From the Hub Model Page
Suggested change: `### From the Hub Model Page` → `### From the Hub`
For inference, we have a general-purpose PyTorch inference DLC, for serving models trained with any of those frameworks mentioned before on both CPU and GPU. There is also the Text Generation Inference (TGI) DLC for high-performance text generation of LLMs on both GPU and TPU. Finally, there is a Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on both CPU and GPU.
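As a rough sketch of what serving with the TGI DLC looks like, the container can be run with Docker on any GPU machine. The image URI, tag, and environment variables below are illustrative assumptions, not guaranteed current values; check the official DLC listing for the exact image to use:

```shell
# Illustrative image URI/tag -- look up the current one in the official
# Hugging Face DLC listing for Google Cloud before using it.
IMAGE="us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310"

# Serve an LLM on GPU with the TGI DLC (requires Docker with the NVIDIA runtime).
docker run --gpus all -ti -p 8080:8080 \
  -e MODEL_ID=google/gemma-2-9b-it \
  -e NUM_SHARD=1 \
  -e HF_TOKEN=$HF_TOKEN \
  "$IMAGE"
```

Once the container is up, it exposes the usual TGI HTTP API on port 8080.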
### From Vertex AI Model Garden |
I would move this section up, between DLCs and Hub
Love it! For the quick screencast videos, I recommend the screen.studio app (presenter webcam overlay is optional, can be toggled off)
Thank you @pagezyhf! 🤗
@@ -1,5 +1,11 @@
# Introduction

[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE).
Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workload in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.
Here we should mention TGI and TEI too right? We can phrase it as the following (but with better wording)
"DLCs are Docker images pre-installed with deep learning solutions such as TGI and TEI for inference; or frameworks as Transformers for both training and inference."
We don't use "🤗 Transformers" emojis anymore.
FYI those are not frameworks we have libraries (transformers) and solutions (TGI)
- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/docs) (GKE): GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure.
- [Cloud Run](https://cloud.google.com/run/docs) (in preview): Cloud Run is a serverless managed compute platform that enables you to run containers that are invocable via requests or events.

We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmaticaly train and deploy models on these Google Cloud services.
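As one hypothetical example of such a programmatic deployment, the TGI DLC could be deployed to Cloud Run with a single `gcloud` command. The service name, region, model, and image tag below are placeholders, and GPU support on Cloud Run is still in preview, so treat this as a sketch rather than a tested recipe:

```shell
# All values below are illustrative; replace them with your own project,
# region, model, and the current DLC image tag.
gcloud run deploy tgi-gemma \
  --image=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 \
  --set-env-vars=MODEL_ID=google/gemma-2-2b-it \
  --region=us-central1 \
  --port=8080 \
  --cpu=8 --memory=32Gi \
  --gpu=1 --gpu-type=nvidia-l4 \
  --no-allow-unauthenticated
```

The linked examples directory covers the equivalent flows for Vertex AI and GKE in more detail.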
Not only notebooks, and fixed a typo in programmatically
Suggested change:

We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmaticaly train and deploy models on these Google Cloud services.

We are curating a list of [examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.
Don't mix times, "curated"
---

Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs).
why remove?
I would keep the direct link; we can replace it with one on our side if we have one. I would not use corporate "blurb", let's keep it direct and simple.
Hugging Face Deep Learning Containers for Google Cloud are optimized Docker containers for training and deploying generative AI models, including deep learning libraries like Transformers, Datasets, Tokenizers, or Diffusers, and purpose-built versions of Hugging Face Text Generation Inference (TGI) and Text Embeddings Inference (TEI).
DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.
#### On Hugging Face Inference Endpoints

If you want to deploy a model from the Hub but you don't have a Google Cloud environment, you can use Hugging Face [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated) on Google Cloud. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):

1. On the model page, open the “Deploy” menu, and select “Inference Endpoints (dedicated)”. This will bring you to the Inference Endpoints deployment page.
2. Select Google Cloud Platform, scroll down and click on "Create Endpoint".
For training, our DLCs are available for PyTorch via 🤗 Transformers. They include support for training on both GPUs and TPUs with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers.

Alternatively, you can follow this short video.

<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/inference-endpoints.mp4" controls autoplay muted loop />
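Once the endpoint reports "Running", it can be queried over plain HTTP. A minimal sketch with `curl`, where the endpoint URL is a placeholder you would copy from the Inference Endpoints UI and the token is your own Hugging Face access token:

```shell
# Placeholders: copy the real endpoint URL from the Inference Endpoints UI
# and use your own Hugging Face access token.
ENDPOINT_URL="https://<endpoint-id>.us-east4.gcp.endpoints.huggingface.cloud"
HF_TOKEN="hf_..."

# Query the deployed model through the standard TGI /generate route.
curl "${ENDPOINT_URL}/generate" \
  -X POST \
  -H "Authorization: Bearer ${HF_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is a Deep Learning Container?", "parameters": {"max_new_tokens": 128}}'
```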
Not sure if we should add an Inference Endpoints section here. We should rather have that in the Inference Endpoints doc. We don't use any of the containers or solutions in IE.
Hello,

This is a suggestion to make the doc our single source of truth for all things GCP:

- index
- Getting Started section to the Deep Learning Containers section: features & benefits
- index

@jeffboudier @alvarobartt @philschmid let me know what you think?
Link to simplify review: https://moon-ci-docs.huggingface.co/docs/google-cloud/pr_98/en/index