Commit bd407b2

pagezyhf, julien-c, and jeffboudier authored

Deprecate IaaS and TaaS Nvidia experience (#2803)

* add note and alternative

* Update train-dgx-cloud.md

Co-authored-by: Julien Chaumond <[email protected]>

* Update train-dgx-cloud.md

Co-authored-by: Jeff Boudier <[email protected]>

* Update inference-dgx-cloud.md

Co-authored-by: Jeff Boudier <[email protected]>

---------

Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Jeff Boudier <[email protected]>
1 parent dff757a commit bd407b2

File tree

2 files changed: +11 −7 lines changed

inference-dgx-cloud.md

+9-7
@@ -8,6 +8,8 @@ authors:
# Serverless Inference with Hugging Face and NVIDIA NIM

+> **Update:** This service is deprecated and no longer available as of April 10th, 2025. For an alternative, you should consider [Inference Providers](https://huggingface.co/docs/inference-providers/en/index)
+
Today, we are thrilled to announce the launch of **Hugging Face** **NVIDIA NIM API (serverless)**, a new service on the Hugging Face Hub, available to Enterprise Hub organizations. This new service makes it easy to use open models with the [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud) accelerated compute platform for inference serving. We built this solution so that Enterprise Hub users can easily access the latest NVIDIA AI technology in a serverless way to run inference on popular Generative AI models including Llama and Mistral, using standardized APIs and a few lines of code within the [Hugging Face Hub](https://huggingface.co/models).


@@ -25,7 +27,7 @@ NVIDIA NIM API (serverless) complements [Train on DGX Cloud](https://huggingface
## How it works

-Running serverless inference with Hugging Face models has never been easier. Heres a step-by-step guide to get you started:
+Running serverless inference with Hugging Face models has never been easier. Here's a step-by-step guide to get you started:

_Note: You need access to an Organization with a [Hugging Face Enterprise Hub](https://huggingface.co/enterprise) subscription to run Inference._

@@ -36,15 +38,15 @@ Before you begin, ensure you meet the following requirements:
### Create a Fine-Grained Token

-Fine-grained tokens allow users to create tokens with specific permissions for precise access control to resources and namespaces. First, go to [Hugging Face Access Tokens](https://huggingface.co/settings/tokens) and click on Create new Token and select fine-grained.
+Fine-grained tokens allow users to create tokens with specific permissions for precise access control to resources and namespaces. First, go to [Hugging Face Access Tokens](https://huggingface.co/settings/tokens) and click on "Create new Token" and select "fine-grained".

<div align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/inference-dgx-cloud/fine-grained-token-1.png" alt="Create Token">
</div>

-Enter a Token name and select your Enterprise organization in org permissions as scope and then click Create token. You dont need to select any additional scopes.
+Enter a "Token name" and select your Enterprise organization in "org permissions" as scope and then click "Create token". You don't need to select any additional scopes.

<div align="center">
@@ -57,9 +59,9 @@ Now, make sure to save this token value to authenticate your requests later.
### **Find your NIM**

-You can find NVIDIA NIM API (serverless) on the model page of supported Generative AI models. You can find all supported models in this [NVIDIA NIM Collection](https://huggingface.co/collections/nvidia/nim-66a3c6fcdcb5bbc6e975b508), and in the Pricing section.
+You can find "NVIDIA NIM API (serverless)" on the model page of supported Generative AI models. You can find all supported models in this [NVIDIA NIM Collection](https://huggingface.co/collections/nvidia/nim-66a3c6fcdcb5bbc6e975b508), and in the Pricing section.

-We will use `meta-llama/Meta-Llama-3-8B-Instruct`. Go to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model card, open the Deploy menu, and select NVIDIA NIM API (serverless) - this will open an interface with pre-generated code snippets for Python, JavaScript, or curl.
+We will use `meta-llama/Meta-Llama-3-8B-Instruct`. Go to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model card, open the "Deploy" menu, and select "NVIDIA NIM API (serverless)" - this will open an interface with pre-generated code snippets for Python, JavaScript, or curl.


@@ -70,7 +72,7 @@ We will use the `meta-llama/Meta-Llama-3-8B-Instruct`. Go the [meta-llama/Meta-L
### **Send your requests**

-NVIDIA NIM API (serverless) is standardized on the OpenAI API. This allows you to use the `openai` sdk for inference. Replace the `YOUR_FINE_GRAINED_TOKEN_HERE` with your fine-grained token and you are ready to run inference.
+NVIDIA NIM API (serverless) is standardized on the OpenAI API. This allows you to use the `openai` SDK for inference. Replace `YOUR_FINE_GRAINED_TOKEN_HERE` with your fine-grained token and you are ready to run inference.

```python
from openai import OpenAI
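# The diff view truncates this snippet here. The lines below sketch the rest
# of the call; the base_url and model id are assumptions taken from the model
# card's pre-generated snippet, and since the service is now deprecated the
# actual request is left commented out as illustration only.
BASE_URL = "https://huggingface.co/api/integrations/dgx/v1"
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

# client = OpenAI(base_url=BASE_URL, api_key="YOUR_FINE_GRAINED_TOKEN_HERE")
# completion = client.chat.completions.create(
#     model=MODEL_ID,
#     messages=[{"role": "user", "content": "Count to 5 in French."}],
#     max_tokens=128,
# )
# print(completion.choices[0].message.content)
```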
@@ -159,7 +161,7 @@ The total cost for a request will depend on the model size, the number of GPUs r
</table>

-Usage fees accrue to your Enterprise Hub Organizations current monthly billing cycle. You can check your current and past usage at any time within the billing settings of your Enterprise Hub Organization.
+Usage fees accrue to your Enterprise Hub Organizations' current monthly billing cycle. You can check your current and past usage at any time within the billing settings of your Enterprise Hub Organization.

**Supported Models**

train-dgx-cloud.md

+2
@@ -11,6 +11,8 @@ authors:
# Easily Train Models with H100 GPUs on NVIDIA DGX Cloud

+> **Update:** This service is deprecated and no longer available as of April 10th, 2025.
+
Today, we are thrilled to announce the launch of **Train on DGX Cloud**, a new service on the Hugging Face Hub, available to Enterprise Hub organizations. Train on DGX Cloud makes it easy to use open models with the accelerated compute infrastructure of NVIDIA DGX Cloud. Together, we built Train on DGX Cloud so that Enterprise Hub users can easily access the latest NVIDIA H100 Tensor Core GPUs, to fine-tune popular Generative AI models like Llama, Mistral, and Stable Diffusion, in just a few clicks within the [Hugging Face Hub](https://huggingface.co/models).

<div align="center">

0 commit comments