Add deployment options in the '/doc' #98

Status: Open · wants to merge 11 commits into main
Conversation


@pagezyhf (Collaborator) commented Sep 24, 2024

Hello,

This is a suggestion to make the doc our single source of truth for all things GCP:

  • Added details on every way to deploy & train HF models on GCP in index
    • with HF DLC
    • from the Hub 'deploy on GCP'
    • from the Hub 'inference endpoints'
    • from the Model Garden
  • Added a short tutorial video for each way to deploy models on GCP. I'm not happy with it yet and would like to improve the demo video (better animation zooming in on clicks, actually deploying an endpoint), so consider it a placeholder.
  • Moved some info about the HF DLCs that was in the Getting Started section to the Deep Learning Containers section:
    • features & benefits
    • some content from the index

@jeffboudier @alvarobartt @philschmid, let me know what you think!
Link to simplify review: https://moon-ci-docs.huggingface.co/docs/google-cloud/pr_98/en/index

@HuggingFaceDocBuilderDev (Collaborator) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jeffboudier jeffboudier self-requested a review September 24, 2024 23:01
- [GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke#inference-examples)
- [Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run#inference-examples) (preview)
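
For context on the Cloud Run path linked above, a deployment of the TGI DLC there might look roughly like the sketch below. The service name, image URI, model, and resource flags are illustrative assumptions, not commands from the linked examples; check those examples for the exact, up-to-date values.

```shell
# Hypothetical sketch: deploy a TGI DLC to Cloud Run with a GPU (preview feature).
# The image URI is a placeholder; the real DLC URIs are listed in the
# Google-Cloud-Containers repository linked above.
gcloud beta run deploy tgi-service \
  --image="us-docker.pkg.dev/YOUR_PROJECT/YOUR_REPO/huggingface-tgi:latest" \
  --region=us-central1 \
  --port=8080 \
  --set-env-vars=MODEL_ID=google/gemma-2-9b-it \
  --cpu=8 --memory=32Gi \
  --gpu=1 --gpu-type=nvidia-l4 \
  --no-cpu-throttling
```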

### From the Hub Model Page

Suggested change:
- ### From the Hub Model Page
+ ### From the Hub


For inference, we have a general-purpose PyTorch inference DLC for serving models trained with any of the frameworks mentioned above, on both CPU and GPU. There is also the Text Generation Inference (TGI) DLC for high-performance text generation with LLMs on both GPU and TPU. Finally, there is a Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on both CPU and GPU.
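
To make the TGI DLC concrete, running such a container locally might look like the following sketch. The image tag is a placeholder assumption (the published DLC URIs live in the containers repository), and the MODEL_ID and port values are illustrative.

```shell
# Hypothetical sketch: serve an LLM with a TGI container on one local GPU.
# "huggingface-tgi:latest" is a placeholder tag, not a published image URI.
docker run --gpus all -p 8080:8080 \
  -e MODEL_ID=google/gemma-2-9b-it \
  -e NUM_SHARD=1 \
  huggingface-tgi:latest
```

Once the server is up, it exposes an HTTP generation API on the mapped port.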
### From Vertex AI Model Garden

I would move this section up, between DLCs and Hub

@jeffboudier left a comment:

Love it! For the quick screencast videos, I recommend the screen.studio app (presenter webcam overlay is optional, can be toggled off)

@alvarobartt left a comment:

Thank you @pagezyhf! 🤗

@@ -1,5 +1,11 @@
# Introduction

[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE).
Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workload in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.
@alvarobartt commented Sep 25, 2024:

Here we should mention TGI and TEI too right? We can phrase it as the following (but with better wording)

"DLCs are Docker images pre-installed with deep learning solutions such as TGI and TEI for inference; or frameworks as Transformers for both training and inference."


We don't use "🤗 Transformers" emojis anymore.


FYI, those are not frameworks; we have libraries (Transformers) and solutions (TGI).

- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/docs) (GKE): GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure.
- [Cloud Run](https://cloud.google.com/run/docs) (in preview): Cloud Run is a serverless managed compute platform that enables you to run containers that are invocable via requests or events.

We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmaticaly train and deploy models on these Google Cloud services.
Not only notebooks, and fixed a typo in programmatically

Suggested change:
- We are curating a list of [notebook examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmaticaly train and deploy models on these Google Cloud services.
+ We are curating a list of [examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) on how to programmatically train and deploy models on these Google Cloud services.

Don't mix tenses: "curated"

Comment on lines -33 to -35
---

Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs).
why remove?



I would keep the direct link; we can replace it with one on our side if we have one. I would not use corporate blurb, let's keep it direct and simple.

Hugging Face Deep Learning Containers for Google Cloud are optimized Docker containers for training and deploying generative AI models, including deep learning libraries like Transformers, Datasets, Tokenizers, and Diffusers, and purpose-built versions of Hugging Face Text Generation Inference (TGI) and Text Embeddings Inference (TEI).
DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.


Comment on lines +31 to +38
#### On Hugging Face Inference Endpoints

If you want to deploy a model from the Hub but you don't have a Google Cloud environment, you can use Hugging Face [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated) on Google Cloud. Below, you will find step-by-step instructions on how to deploy [Gemma 2 9B](https://huggingface.co/google/gemma-2-9b-it):
1. On the model page, open the "Deploy" menu and select "Inference Endpoints (dedicated)". This will bring you to the Inference Endpoints deployment page.
2. Select Google Cloud Platform, scroll down, and click on "Create Endpoint".

For training, our DLCs are available for PyTorch via 🤗 Transformers. They include support for training on both GPUs and TPUs with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers.
Alternatively, you can follow this short video.
<video src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/google-cloud/inference-endpoints.mp4" controls autoplay muted loop />
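
Once an endpoint like the one created in the steps above is running, querying it is a single HTTPS request. A sketch, with a placeholder endpoint URL and an illustrative TGI-style payload:

```shell
# Hypothetical sketch: query a deployed Inference Endpoint.
# Replace the URL with your endpoint's URL; HF_TOKEN must be a token
# that has access to the endpoint.
curl "https://YOUR-ENDPOINT.endpoints.huggingface.cloud" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What are Deep Learning Containers?", "parameters": {"max_new_tokens": 64}}'
```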
Not sure if we should add an Inference Endpoints section here; we should rather have that in the Inference Endpoints docs. We don't use any of the containers or solutions in IE.

Labels: none yet
Projects: none yet
5 participants