# Serverless Inference with Hugging Face and NVIDIA NIM
> **Update:** This service is deprecated and no longer available as of April 10th, 2025. For an alternative, consider [Inference Providers](https://huggingface.co/docs/inference-providers/en/index).
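
For readers migrating, here is a minimal sketch of the Inference Providers alternative mentioned above. It assumes a recent `huggingface_hub` release with provider routing and a model currently served by at least one provider; the model ID and token placeholder are illustrative.

```python
from huggingface_hub import InferenceClient

# provider="auto" lets the client route the request to an available
# inference provider; authentication uses a Hugging Face token.
client = InferenceClient(provider="auto", api_key="YOUR_HF_TOKEN_HERE")

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```
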
Today, we are thrilled to announce the launch of **Hugging Face NVIDIA NIM API (serverless)**, a new service on the Hugging Face Hub, available to Enterprise Hub organizations. This new service makes it easy to use open models with the [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud) accelerated compute platform for inference serving. We built this solution so that Enterprise Hub users can easily access the latest NVIDIA AI technology in a serverless way to run inference on popular Generative AI models, including Llama and Mistral, using standardized APIs and a few lines of code within the [Hugging Face Hub](https://huggingface.co/models).

## How it works
Running serverless inference with Hugging Face models has never been easier. Here's a step-by-step guide to get you started:

_Note: You need access to an organization with a [Hugging Face Enterprise Hub](https://huggingface.co/enterprise) subscription to run inference._

### Create a Fine-Grained Token
Fine-grained tokens allow users to create tokens with specific permissions for precise access control to resources and namespaces. First, go to [Hugging Face Access Tokens](https://huggingface.co/settings/tokens), click "Create new Token", and select "fine-grained".
Enter a "Token name" and select your Enterprise organization in "org permissions" as scope and then click "Create token". You don't need to select any additional scopes.
Now, make sure to save this token value to authenticate your requests later.
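
Before moving on, it can help to verify that the token authenticates correctly. A minimal sketch, assuming you export the token as an environment variable (the `HF_TOKEN` name is just a convention here):

```python
import os

from huggingface_hub import whoami

# Read the fine-grained token from the environment rather than
# hard-coding it in source files.
token = os.environ["HF_TOKEN"]

# whoami() raises an error for an invalid token; on success it returns
# details about the user and organizations the token can access.
print(whoami(token=token)["name"])
```
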
### Find your NIM
You can find "NVIDIA NIM API (serverless)" on the model page of supported Generative AI models. You can find all supported models in this [NVIDIA NIM Collection](https://huggingface.co/collections/nvidia/nim-66a3c6fcdcb5bbc6e975b508), and in the Pricing section.
We will use `meta-llama/Meta-Llama-3-8B-Instruct`. Go to the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model card, open the "Deploy" menu, and select "NVIDIA NIM API (serverless)". This will open an interface with pre-generated code snippets for Python, JavaScript, or cURL.

### Send your requests
NVIDIA NIM API (serverless) is standardized on the OpenAI API. This allows you to use the `openai` SDK for inference. Replace `YOUR_FINE_GRAINED_TOKEN_HERE` with your fine-grained token and you are ready to run inference.

```python
from openai import OpenAI

# The DGX Cloud integration endpoint the service exposed on the Hub.
client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="YOUR_FINE_GRAINED_TOKEN_HERE",
)

# Stream a chat completion from Llama 3 8B Instruct.
chat_completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Count to 500"}],
    stream=True,
    max_tokens=1024,
)

# Print tokens as they arrive; the final chunk's delta may be empty.
for message in chat_completion:
    print(message.choices[0].delta.content or "", end="")
```
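
Because the endpoint follows the OpenAI chat-completions schema, a non-streaming call is the same request with `stream=False`; a short sketch reusing the `client` from above:

```python
# Non-streaming variant: the full reply arrives in a single response.
chat_completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=False,
    max_tokens=256,
)
print(chat_completion.choices[0].message.content)
```
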
Usage fees accrue to your Enterprise Hub Organization's current monthly billing cycle. You can check your current and past usage at any time within the billing settings of your Enterprise Hub Organization.

# Easily Train Models with H100 GPUs on NVIDIA DGX Cloud
> **Update:** This service is deprecated and no longer available as of April 10th, 2025.

Today, we are thrilled to announce the launch of **Train on DGX Cloud**, a new service on the Hugging Face Hub, available to Enterprise Hub organizations. Train on DGX Cloud makes it easy to use open models with the accelerated compute infrastructure of NVIDIA DGX Cloud. Together, we built Train on DGX Cloud so that Enterprise Hub users can easily access the latest NVIDIA H100 Tensor Core GPUs to fine-tune popular Generative AI models like Llama, Mistral, and Stable Diffusion in just a few clicks within the [Hugging Face Hub](https://huggingface.co/models).