
Tutorial / Example for Single Node FP8 Inference? #216

Open
noamgat opened this issue Jun 23, 2024 · 0 comments
noamgat commented Jun 23, 2024

Hello,

The web page for Nemotron-4-340B states:
https://research.nvidia.com/publication/2024-06_nemotron-4-340b

These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision.

However, I wasn't able to deploy it this way. Is there a guide for weight conversion and a sample configuration, for example to serve the reward model on a single 8x80GB node in FP8 precision?
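For context, a minimal sketch of the kind of single-node configuration being asked about. This is a hypothetical, untested fragment: the argument names follow vLLM's engine arguments, and the checkpoint ID is an assumption, not something confirmed by this issue or by NVIDIA's documentation.

```python
# Hypothetical single-node FP8 serving configuration, sketched in the style of
# vLLM engine arguments. Untested; the model ID below is an assumption.
engine_args = {
    "model": "nvidia/Nemotron-4-340B-Reward",  # hypothetical checkpoint ID
    "quantization": "fp8",        # FP8 weights/activations (H100-class GPUs)
    "tensor_parallel_size": 8,    # shard across the 8 GPUs of one DGX H100
    "dtype": "bfloat16",          # compute dtype for non-quantized ops
}
print(engine_args)
```

Whether vLLM (or TensorRT-LLM, or NeMo's export path) actually supports this model in FP8 is exactly the open question of this issue; the fragment only illustrates the target shape of a deployment: FP8 quantization plus 8-way tensor parallelism on a single node.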
