Commit
support 12GB cards
ToTheBeginning committed Sep 19, 2024
1 parent 476c6a0 commit 0eae18d
Showing 3 changed files with 14 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -21,6 +21,7 @@ We will actively update and maintain this repository in the near future, so plea
 - [x] Local gradio demo is ready now
 - [x] Online HuggingFace demo is ready now [![flux](https://img.shields.io/badge/🤗-PuLID_FLUX_demo-orange)](https://huggingface.co/spaces/yanze/PuLID-FLUX)
 - [x] We have optimized the code to support consumer-grade GPUs, and now **PuLID-FLUX can run on a 16GB graphics card**. Check the details [here](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md#local-gradio-demo)
+- [x] Support for 12GB graphics cards


 The results below are generated with PuLID-FLUX.
5 changes: 4 additions & 1 deletion docs/pulid_for_flux.md
@@ -25,7 +25,10 @@ Run `python app_flux.py --offload --fp8 --onnx_provider cpu`, the peak memory is
 
 For users with 24GB of graphics memory, you can run `python app_flux.py --offload --fp8`; the peak memory is under 17GB.
 
-However, there is a difference in image quality between fp8 and bf16, with some degradation in the former.
+For users with 12GB of graphics memory, you can run `python app_flux.py --aggressive_offload --fp8 --onnx_provider cpu`; the peak memory is about 11GB.
+However, aggressive offload (which works like sequential offload) is very slow, because weights must be transferred between CPU and GPU at every timestep.
+
+Please note that there is a difference in image quality between fp8 and bf16, with some degradation in the former.
 Specifically, the details of the face may be slightly worse, but the layout is similar. If you want the best results
 from PuLID-FLUX, or if you have the resources, please use bf16 rather than fp8.
 We have included a comparison in the table below.
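
The docs above quote a peak-memory figure for each flag combination. As a quick way to check those numbers on your own hardware, here is a minimal sketch that records peak GPU allocation around an inference call. It assumes a CUDA build of PyTorch; `generate` is a hypothetical stand-in for your actual inference function, not part of the repository.

```python
import torch

def report_peak_gpu_memory(fn, *args, **kwargs):
    """Run fn and print the peak GPU memory PyTorch allocated during the call.

    Note: this tracks PyTorch's own allocations, not total GPU usage,
    so it will read slightly below what nvidia-smi shows.
    """
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"peak GPU memory: {peak_gb:.2f} GB")
    return result

# Hypothetical usage:
# image = report_peak_gpu_memory(generate, prompt="portrait photo of a woman")
```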
10 changes: 9 additions & 1 deletion flux/model.py
@@ -127,8 +127,16 @@ def forward(

         img = torch.cat((txt, img), 1)
         if aggressive_offload:
-            self.single_blocks = self.single_blocks.to(DEVICE)
+            # put the first half of the single blocks on the GPU
+            for i in range(len(self.single_blocks) // 2):
+                self.single_blocks[i] = self.single_blocks[i].to(DEVICE)
         for i, block in enumerate(self.single_blocks):
+            if aggressive_offload and i == len(self.single_blocks) // 2:
+                # move the first half of the single blocks to the CPU and the second half to the GPU
+                for j in range(len(self.single_blocks) // 2):
+                    self.single_blocks[j].cpu()
+                for j in range(len(self.single_blocks) // 2, len(self.single_blocks)):
+                    self.single_blocks[j] = self.single_blocks[j].to(DEVICE)
             x = block(img, vec=vec, pe=pe)
         real_img, txt = x[:, txt.shape[1]:, ...], x[:, :txt.shape[1], ...]
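
The diff above shows only the changed region of `forward`. To make the pattern easier to see in isolation, below is a self-contained sketch of the same half-and-half offload idea, using dummy `nn.Linear` blocks in place of the FLUX single blocks: the first half of the blocks starts on the GPU, and at the midpoint of the loop the halves are swapped so each block is resident exactly when it runs. `DEVICE`, the block count, and the feature size are placeholder assumptions, not values from the repository.

```python
import torch
from torch import nn

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy stand-ins for the FLUX single blocks.
single_blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(8)])

def forward_with_aggressive_offload(x):
    half = len(single_blocks) // 2
    # Put the first half of the blocks on the GPU up front.
    for i in range(half):
        single_blocks[i].to(DEVICE)
    for i, block in enumerate(single_blocks):
        if i == half:
            # Midpoint: send the first half back to the CPU, bring in the second half.
            for j in range(half):
                single_blocks[j].cpu()
            for j in range(half, len(single_blocks)):
                single_blocks[j].to(DEVICE)
        x = block(x)
    return x

x = torch.randn(2, 64, device=DEVICE)
out = forward_with_aggressive_offload(x)
print(out.shape)  # torch.Size([2, 64])
```

Only half of the blocks occupy GPU memory at any time, at the cost of a bulk CPU-to-GPU transfer at every timestep during sampling, which is the slowdown the docs warn about.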

