
Replace LayerCompressor with HooksMixin #1038

Open

kylesayrs wants to merge 10 commits into main from kylesayrs/remove-layer-compressor

Conversation

kylesayrs (Collaborator) commented Jan 6, 2025

Purpose

  • Remove LayerCompressor to decouple modifiers from data pipelines
  • Reduce abstractions
  • Support VLMs with SparseGPT and Wanda

Prerequisites

Changes

Interface / Features

  • SparseGPT and Wanda now both support VLM architectures
  • Added sequential_targets to match GPTQ, with targets kept as an alias (see the usage sketch after this list)
  • Support Hessian offloading for SparseGPT
  • Added a custom _LinAlgError for SparseGPT
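
A minimal sketch of the updated interface, not taken from this PR: how sequential_targets might be passed to SparseGPTModifier now that it mirrors GPTQModifier. The import paths, oneshot arguments, and the decoder-layer, model, and dataset names are assumptions based on existing llm-compressor examples.

```python
# Hedged sketch: sequential_targets on SparseGPTModifier, mirroring GPTQModifier.
# Import paths, argument names, and model/dataset choices are assumptions based
# on existing llm-compressor examples, not code from this PR.
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    sequential_targets=["LlamaDecoderLayer"],  # `targets` remains usable as an alias
)

oneshot(
    model="meta-llama/Llama-3.2-1B-Instruct",  # hypothetical model choice
    dataset="ultrachat_200k",                  # hypothetical calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```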

Implementations

  • Changed the implementation style of SparseGPTModifier and WandaPruningModifier to match GPTQModifier
  • Removed LayerCompressor, ModuleCompressionWrapper, SparseGptWrapper, and WandaWrapper (see the hook-based sketch after this list)
  • Logic shared between SparseGPT and Wanda now lives in SparsityModifierMixin
  • Removed the lines blocking allow_tf32
    • Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
    • This change was only present for Wanda; since no other modifier disables TF32, I see no reason for it to stay
  • Updated the SparseGPT tests to reflect the new implementation
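
For context, the wrapper-free pattern this refactor moves toward looks roughly like the following. This is a minimal plain-PyTorch sketch, assuming the per-module statistic of interest is the input Gram matrix X^T X; the actual HooksMixin method names in llm-compressor may differ.

```python
# Illustrative sketch only (plain PyTorch, not the llm-compressor API): instead of
# wrapping each Linear in a ModuleCompressionWrapper, the modifier registers forward
# hooks that accumulate calibration statistics, then removes them when finished.
import torch

def make_hessian_hook(state: dict):
    def hook(module: torch.nn.Linear, args, _output):
        inp = args[0].reshape(-1, args[0].shape[-1]).to(torch.float32)
        prev = state.setdefault(module, torch.zeros(inp.shape[-1], inp.shape[-1]))
        state[module] = prev + inp.t() @ inp  # accumulate X^T X per module
    return hook

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 16))
state, handles = {}, []
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        handles.append(module.register_forward_hook(make_hessian_hook(state)))

model(torch.randn(4, 16))   # calibration forward pass
for handle in handles:      # analogous to HooksMixin.remove_hooks()
    handle.remove()
```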

Tests

  • Updated obcq tests to reflect new implementations
  • Removed test_sgpt_defaults.py, since it did not cover anything specific to this modifier

Testing

  • grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/ to confirm no references remain
  • Modified test_invalid_layerwise_recipes_raise_exceptions and test_successful_layerwise_recipe pass
  • llama3_8b_2of4.py passes and was evaluated with both SparseGPT and Wanda

Potential Follow-ups

  • Add module targets and ignore options to SparseGPT and Wanda

Regression Testing

The Hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of one calibration sample. The final evaluations differ slightly, which is likely due to numerical imprecision (dividing by a Python int vs. a torch.int) and to the different pipelines (different subgraph partitions lead to different imprecision from CPU offloading and potentially different module arguments).

Evaluation

Models were compressed using examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

sparsegpt

Main

hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|

Branch

hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|

To test Wanda, the SparseGPTModifier was replaced with the WandaPruningModifier (a sketch of the swap is shown below).
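
For reference, a hedged sketch of that swap; the import path and constructor arguments are assumptions based on llm-compressor's module layout rather than code from this PR.

```python
# Hypothetical recipe swap: use WandaPruningModifier in place of SparseGPTModifier.
from llmcompressor.modifiers.pruning import WandaPruningModifier

recipe = WandaPruningModifier(sparsity=0.5, mask_structure="2:4")
```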

wanda

Main

hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|

Branch

hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|

kylesayrs (Collaborator, Author) commented Jan 29, 2025

Something super weird is happening with that failing test

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_correct_compressor_inferred[W8A16-int-False-pack-quantized-dense] - AssertionError: assert 'marlin-24' == 'pack-quantized'

I've observed two things:

  1. When running all the tests in the file, test_correct_compressor_inferred consistently fails with these parameters, but only on this branch, even though this branch seemingly does not touch any of the code related to this test
  2. When selecting only this test with -k test_correct_compressor_inferred, it passes 90% of the time, but 10% of the time it fails with an assertion involving Sparse24BitMaskConfig(registry_requires_subclass=False, format='sparse-24-bitmask', targets=['...

In the meantime, this PR is reviewable

kylesayrs (Collaborator, Author) commented:

Looks like adding this change fixed the test:

```python
# is_24 and _make_24_sparse are defined in the test module
weights = torch.rand(10, 4)
if is_24:
    weights = _make_24_sparse(weights)
else:
    weights[0, :] = torch.ones(4)  # guarantee the tensor is not 2:4 sparse
```

The most likely explanation is that this test is flaky, and this PR just happened to be unlucky.
