
Tutorial / Example for Single Node FP8 Inference? #216

Open
noamgat opened this issue Jun 23, 2024 · 0 comments
noamgat commented Jun 23, 2024

Hello,

The web page for Nemotron-4-340B states:
https://research.nvidia.com/publication/2024-06_nemotron-4-340b

These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision.

However, I wasn't able to deploy it this way. Is there a guide for weight conversion and a sample configuration, for example to serve the reward model on a single 8x80GB node in FP8 precision?
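For context, a minimal sketch of the kind of single-node configuration being asked about. This is a hypothetical, untested fragment: the argument names follow vLLM's engine arguments, and the checkpoint ID is an assumption, not something confirmed by this issue or by NVIDIA's documentation.

```python
# Hypothetical single-node FP8 serving configuration, sketched in the style of
# vLLM engine arguments. Untested; the model ID below is an assumption.
engine_args = {
    "model": "nvidia/Nemotron-4-340B-Reward",  # hypothetical checkpoint ID
    "quantization": "fp8",        # FP8 weights/activations (H100-class GPUs)
    "tensor_parallel_size": 8,    # shard across the 8 GPUs of one DGX H100
    "dtype": "bfloat16",          # compute dtype for non-quantized ops
}
print(engine_args)
```

Whether vLLM (or TensorRT-LLM, or NeMo's export path) actually supports this model in FP8 is exactly the open question of this issue; the fragment only illustrates the target shape of a deployment: FP8 quantization plus 8-way tensor parallelism on a single node.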
