[EPIC] Resource requirements and cost of foundation models #36

Open · 1 of 5 tasks
Shreyanand opened this issue Apr 13, 2023 · 3 comments

Shreyanand (Member) commented Apr 13, 2023

The large size of foundation models raises several resource and cost questions around deploying them in production. This EPIC will focus on creating experiments and presenting results for some of the following questions:

  • What is the relationship between a model's parameter count and its memory consumption? Create a "rosetta stone" document of the GPU memory required by models of different parameter sizes. Create a notebook that captures the footprint of the GPU memory used (see the estimation sketch after this list).
  • How are the models loaded into GPU RAM? Is it directly from S3, or do we also require significant host RAM? If so, capture the RAM requirements in a notebook. What about CPU? Update the cost document with RAM and CPU information. Are there ways to optimize this?
  • What happens when we load the models in a lower-precision format like INT8? How are accuracy, CPU usage, and memory performance affected? Explain theoretically and show results in a notebook (see the quantized-loading sketch after this list). Touch upon the challenges of using frameworks like bitsandbytes in production.
  • Is distributed training and inference across many cheap instances more efficient per dollar than a single instance with a large GPU? If we have just one GPU with 16 GB of memory, how much can be done with it in the LLM space? Design experiments and share results in a notebook.
  • What are the options for running these models on CPU only? Are there ways to optimize beyond the roughly 1 GB of GPU memory per 1B parameters that INT8 precision gives?
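
As a starting point for the parameter-count question, here is a minimal sketch of the estimate-vs-measurement comparison. It assumes PyTorch, transformers, and a CUDA machine; the model name is an illustrative placeholder, and the bytes-per-parameter figures are the usual rule of thumb (fp32 = 4, fp16/bf16 = 2, int8 = 1), not measured values:

```python
import torch
from transformers import AutoModelForCausalLM

MODEL_NAME = "bigscience/bloom-560m"  # placeholder; swap in the model under test


def estimate_gib(n_params: int, bytes_per_param: float) -> float:
    """Weights-only estimate: fp32 = 4 bytes/param, fp16/bf16 = 2, int8 = 1."""
    return n_params * bytes_per_param / 1024**3


torch.cuda.reset_peak_memory_stats()
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16
).to("cuda")

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters:               {n_params / 1e9:.2f}B")
print(f"estimated fp16 footprint: {estimate_gib(n_params, 2):.2f} GiB")
print(f"measured peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```

The measured peak will typically come out above the weights-only estimate (CUDA context, buffers, and any activations during inference), and that gap is exactly what the "rosetta stone" document should capture per parameter size.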
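
For the lower-precision question, transformers integrates bitsandbytes, so a quantized load can be sketched roughly as below. Same placeholder model as above; `load_in_8bit=True` was the integration flag at the time of this issue and requires `bitsandbytes` and `accelerate` to be installed, so treat the exact knobs as version-dependent:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-560m"  # placeholder, as above

torch.cuda.reset_peak_memory_stats()

# load_in_8bit routes linear layers through bitsandbytes' LLM.int8() kernels;
# device_map="auto" lets accelerate place the quantized weights on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

inputs = tokenizer("Foundation models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Compare against the fp16 footprint above: INT8 weights land near 1 GiB per 1B params.
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```

Running the same accuracy and latency measurements against both the fp16 and INT8 loads would give the side-by-side numbers the notebook is asking for.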
Shreyanand added the enhancement (New feature or request) label on Apr 13, 2023
codificat (Member) commented:

About what happens when lowering precision, here's an interesting blog post: LLM.int8() and Emergent Features

suppathak (Collaborator) commented Apr 14, 2023

Shreyanand (Member, Author) commented:

@suppathak These experiments (1, 2) are directly related to the one-GPU task that you're doing. You should adapt them for our context.
