
Replace LayerCompressor with HooksMixin #1038

Open

kylesayrs wants to merge 10 commits into main from kylesayrs/remove-layer-compressor

Conversation

kylesayrs (Collaborator) commented Jan 6, 2025

Purpose

  • Remove LayerCompressor to decouple modifiers from data pipelines
  • Reduce abstractions
  • Support VLMs with SparseGPT and Wanda

Prerequisites

Changes

Interface / Features

  • SparseGPT and Wanda now both support VLM architectures
  • Added sequential_targets to match GPTQ, with targets kept as an alias (see the usage sketch after this list)
  • Support Hessian offloading for SparseGPT
  • Added a custom _LinAlgError for SparseGPT
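
A minimal sketch of the updated interface, not taken from this PR: how sequential_targets might be passed to SparseGPTModifier now that it mirrors GPTQModifier. The import paths, oneshot arguments, and the decoder-layer, model, and dataset names are assumptions based on existing llm-compressor examples.

```python
# Hedged sketch: sequential_targets on SparseGPTModifier, mirroring GPTQModifier.
# Import paths, argument names, and model/dataset choices are assumptions based
# on existing llm-compressor examples, not code from this PR.
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    sequential_targets=["LlamaDecoderLayer"],  # `targets` remains usable as an alias
)

oneshot(
    model="meta-llama/Llama-3.2-1B-Instruct",  # hypothetical model choice
    dataset="ultrachat_200k",                  # hypothetical calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```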

Implementations

  • Changed the implementation style of SparseGPTModifier and WandaPruningModifier to match GPTQModifier
  • Removed LayerCompressor, ModuleCompressionWrapper, SparseGptWrapper, and WandaWrapper (see the hook-based sketch after this list)
  • Logic shared between SparseGPT and Wanda now lives in SparsityModifierMixin
  • Removed the lines blocking allow_tf32
    • Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
    • This change was only present for Wanda; since no other modifier disables TF32, I see no reason for it to stay
  • Updated the SparseGPT tests to reflect the new implementation
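
For context, the wrapper-free pattern this refactor moves toward looks roughly like the following. This is a minimal plain-PyTorch sketch, assuming the per-module statistic of interest is the input Gram matrix X^T X; the actual HooksMixin method names in llm-compressor may differ.

```python
# Illustrative sketch only (plain PyTorch, not the llm-compressor API): instead of
# wrapping each Linear in a ModuleCompressionWrapper, the modifier registers forward
# hooks that accumulate calibration statistics, then removes them when finished.
import torch

def make_hessian_hook(state: dict):
    def hook(module: torch.nn.Linear, args, _output):
        inp = args[0].reshape(-1, args[0].shape[-1]).to(torch.float32)
        prev = state.setdefault(module, torch.zeros(inp.shape[-1], inp.shape[-1]))
        state[module] = prev + inp.t() @ inp  # accumulate X^T X per module
    return hook

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 16))
state, handles = {}, []
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        handles.append(module.register_forward_hook(make_hessian_hook(state)))

model(torch.randn(4, 16))   # calibration forward pass
for handle in handles:      # analogous to HooksMixin.remove_hooks()
    handle.remove()
```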

Tests

  • Updated obcq tests to reflect new implementations
  • Removed test_sgpt_defaults.py, since it did not cover anything specific to this modifier

Testing

  • grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/ to confirm no references remain
  • Modified test_invalid_layerwise_recipes_raise_exceptions and test_successful_layerwise_recipe pass
  • llama3_8b_2of4.py passes and was evaluated with both SparseGPT and Wanda

Potential Follow-ups

  • Add module targets and ignore options to SparseGPT and Wanda

Regression Testing

The Hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of one calibration sample. The final evaluations differ slightly, which is likely due to numerical imprecision (dividing by a Python int vs. a torch.int) and to the different pipelines (different subgraph partitions lead to different imprecision from CPU offloading and potentially different module arguments).

Evaluation

Models were compressed using examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

sparsegpt

Main

hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|

Branch

hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|

To test Wanda, the SparseGPTModifier was replaced with the WandaPruningModifier (a sketch of the swap is shown below).
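
For reference, a hedged sketch of that swap; the import path and constructor arguments are assumptions based on llm-compressor's module layout rather than code from this PR.

```python
# Hypothetical recipe swap: use WandaPruningModifier in place of SparseGPTModifier.
from llmcompressor.modifiers.pruning import WandaPruningModifier

recipe = WandaPruningModifier(sparsity=0.5, mask_structure="2:4")
```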

wanda

Main

hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|

Branch

hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|

kylesayrs (Collaborator, Author) commented Jan 29, 2025

Something super weird is happening with that failing test

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py::test_correct_compressor_inferred[W8A16-int-False-pack-quantized-dense] - AssertionError: assert 'marlin-24' == 'pack-quantized'

I've observed two things:

  1. When running all the tests in the file, test_correct_compressor_inferred consistently fails with these parameters, but only on this branch, even though this branch seemingly does not touch any of the code related to this test
  2. When selecting only this test with -k test_correct_compressor_inferred, it passes 90% of the time, but 10% of the time it fails with an assertion involving Sparse24BitMaskConfig(registry_requires_subclass=False, format='sparse-24-bitmask', targets=['...

In the meantime, this PR is reviewable

kylesayrs (Collaborator, Author) commented:

Looks like adding this change fixed the test:

```python
# is_24 and _make_24_sparse are defined in the test module
weights = torch.rand(10, 4)
if is_24:
    weights = _make_24_sparse(weights)
else:
    weights[0, :] = torch.ones(4)  # guarantee the tensor is not 2:4 sparse
```

The most likely explanation is that this test is flaky, and this PR just happened to be unlucky.
