
Quantized Models Chunking into unequal sizes #2320

Open
nighting0le01 opened this issue Aug 22, 2024 · 4 comments
Labels
bug Unexpected behaviour that should be corrected (type)

Comments

@nighting0le01

nighting0le01 commented Aug 22, 2024

🐞 Describing the bug

With reference to issue apple/ml-stable-diffusion#353, I used the bisect_model() function to split a quantized model into 2 chunks. I tried with coremltools 7.1 and 7.0, following this file: https://github.com/apple/ml-stable-diffusion/blob/cf16df8207dfcba685a9391bad04f7402ea87b73/python_coreml_stable_diffusion/chunk_mlprogram.py#L123, but faced the same issue.

prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(prog)
main_block = prog.functions["main"]

print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")  # 152.67 MB
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")  # 0.42 MB
print(f"index = {op_idx}/{len(main_block.operations)}")  # 587/2720

prog_chunk1 = _make_first_chunk_prog(prog, op_idx)
prog_chunk2 = _make_second_chunk_prog(_load_prog_from_mlmodel(model), op_idx)
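For intuition, here is a minimal sketch of bisecting an op list by cumulative weight size (hypothetical; the names and logic are illustrative, not the actual `_get_op_idx_split_location` implementation). If per-op const sizes are mis-measured for a quantized/palettized model (e.g. some counted at their dense dtype while others at their packed storage size), the halfway point lands at the wrong op, producing lopsided chunks like the 152.67 MB / 0.42 MB above:

```python
# Hypothetical sketch: pick the split index where the running total of
# per-op weight sizes first reaches half of the overall weight size.
# `op_weight_sizes` stands in for the per-const-op sizes accumulated
# by the real split-location routine.

def split_by_weight_size(op_weight_sizes):
    total = sum(op_weight_sizes)
    running = 0
    for idx, size in enumerate(op_weight_sizes):
        running += size
        if running >= total / 2:
            return idx, running, total
    return len(op_weight_sizes) - 1, running, total

# Evenly distributed weights -> split lands near the middle:
print(split_by_weight_size([10, 10, 10, 10]))       # (1, 20, 40)

# If early ops' sizes dominate the accounting (e.g. over-counted as
# dense while later ops are counted as packed 4-bit), the split index
# is early by op count yet the first chunk holds nearly all the weight:
print(split_by_weight_size([100, 100, 1, 1, 1, 1]))  # (1, 200, 204)
```

The second call mirrors the symptom reported here: the op index looks reasonable, but almost the entire weight budget ends up in the first chunk.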

System environment (please complete the following information):

  • coremltools version: 8.0b2

cc: @aseemw

@nighting0le01 nighting0le01 added the bug Unexpected behaviour that should be corrected (type) label Aug 22, 2024
@jakesabathia2
Collaborator

jakesabathia2 commented Aug 22, 2024

@nighting0le01 would you mind providing a standalone script for us to reproduce?

@nighting0le01
Author

nighting0le01 commented Aug 26, 2024

Hi @jakesabathia2! Here is the code to reproduce, with coremltools version 7.1. I know that with 8.0b2 the chunking has moved into coremltools, but I think it has the same issue when chunking a quantized or palettized model.

The model is a simple MobileNetV2 that can be downloaded from the coremltools tutorial: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit

import coremltools as ct
from python_coreml_stable_diffusion.chunk_mlprogram import (
    _load_prog_from_mlmodel,
    _get_op_idx_split_location,
    _make_second_chunk_prog,
    _make_first_chunk_prog,
)
# link to get model:https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model = ct.models.MLModel('MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage')
# Load the MIL Program from MLModel
prog = _load_prog_from_mlmodel(model)

# Compute the incision point by bisecting the program based on weights size
op_idx, first_chunk_weights_size, total_weights_size = _get_op_idx_split_location(
    prog)
main_block = prog.functions["main"]
incision_op = main_block.operations[op_idx]

print(f"op_idx = {op_idx}")
print(f"First  chunk size = {first_chunk_weights_size:.2f} MB")
print(f"Second chunk size = {total_weights_size - first_chunk_weights_size:.2f} MB")
Output:

INFO:python_coreml_stable_diffusion.chunk_mlprogram:Loading MLModel object into a MIL Program object (including the weights)..
INFO:python_coreml_stable_diffusion.chunk_mlprogram:Program loaded in 0.1 seconds
op_idx = 187
First  chunk size = 1.68 MB
Second chunk size = 0.15 MB

@nighting0le01
Author

nighting0le01 commented Aug 26, 2024

Hi @jakesabathia2, below is the same repro with the 8.0b2 version of coremltools.
cc @aseemw: apple/ml-stable-diffusion#353

import coremltools as ct

# link to get model: https://apple.github.io/coremltools/docs-guides/source/opt-palettization-perf.html#:~:text=0.47-,MobileNetv2%2D1.0,-4%20bit
model_path = './MobileNetV2Alpha1ScalarPalettization4Bit.mlpackage'
output_dir = "./output/"

# Split the model into two chunks by bisecting the program based on weights size
ct.models.utils.bisect_model(
    model_path,
    output_dir,
    merge_chunks_to_pipeline=False,
)
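To confirm the imbalance independently of the library's own size reporting, the on-disk size of each produced chunk can be measured. This is a sketch; the exact .mlpackage names that bisect_model writes into output_dir are an assumption to adapt:

```python
import os

def dir_size_mb(path):
    """Total size of all files under `path` (an .mlpackage is a directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for f in files:
            total += os.path.getsize(os.path.join(root, f))
    return total / (1024 * 1024)

output_dir = "./output/"  # same directory passed to bisect_model
if os.path.isdir(output_dir):
    for entry in sorted(os.listdir(output_dir)):
        if entry.endswith(".mlpackage"):
            full = os.path.join(output_dir, entry)
            print(f"{entry}: {dir_size_mb(full):.2f} MB")
```

If the split were balanced, the two chunks should come out roughly equal in size on disk; here the second chunk is a tiny fraction of the first.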

@nighting0le01
Author

nighting0le01 commented Aug 28, 2024

@jakesabathia2 @DawerG @aseemw @atiorh @TobyRoseman any help is appreciated, thank you 🙏
