
Slow Performance with "Exhaustive Search" Permutation Strategy for Channel Pruning in CNN #1826

Ulorewien opened this issue Aug 14, 2024 · 0 comments
Ulorewien commented Aug 14, 2024

Describe the Bug

I am using NVIDIA's Apex library to prune a CNN on the CIFAR-100 dataset. The model (printed below) has six convolutional layers, with 128, 256, and 512 output channels available for pruning. According to the library's documentation, when the number of channels is at most 2048, the "exhaustive search" permutation strategy is used to maximize the accuracy of the structured sparse network.

However, when I ran the pruning process on a P100 GPU, it took over 8 hours and still wasn't complete. This seems unusually slow, especially considering that the total number of channels in my model is significantly below the 2048 threshold. Given the lengthy execution time, I suspect there might be an issue with the "exhaustive search" strategy or that the channel threshold of 2048 might be too high for practical use.

Suggestion

I recommend revisiting the threshold for using the "exhaustive search" strategy. A lower threshold, possibly around 16 or 32 channels, might be more appropriate and could prevent such long execution times. Alternatively, optimizing the "exhaustive search" strategy for models with a channel count close to 2048 could also improve performance.
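For intuition about why a high threshold could be a problem: if an "exhaustive" strategy were to enumerate raw channel orderings, the search space grows factorially with the channel count. This is only a back-of-the-envelope sketch, not a claim about Apex's actual implementation (which presumably constrains the search), but it shows why 16 or 32 channels is already near the limit of brute force:

```python
import math

# Size of the raw permutation search space for n channels: n!
# Even 32 channels yields an astronomically large number of orderings.
for n in (8, 16, 32, 128):
    print(f"{n:>4} channels -> {math.factorial(n):.3e} candidate orderings")
```

Even with aggressive pruning of the search tree, anything that scales with a meaningful fraction of this space will not finish at 128+ channels, let alone near 2048.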

Minimal Steps/Code to Reproduce the Bug

  1. Use the CNN model defined below with the given hyperparameters.

Model:
ConvNet(
  (layers): Sequential(
    (0): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
    (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): MaxPool2d(kernel_size=(3, 3), stride=2, padding=1, dilation=1, ceil_mode=False)
    (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU()
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
    (11): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU()
    (13): MaxPool2d(kernel_size=(3, 3), stride=2, padding=0, dilation=1, ceil_mode=False)
    (14): Dropout(p=0.25, inplace=False)
    (15): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (16): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (17): ReLU()
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1))
    (19): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (20): ReLU()
    (21): MaxPool2d(kernel_size=(3, 3), stride=1, padding=0, dilation=1, ceil_mode=False)
    (22): Dropout(p=0.25, inplace=False)
    (23): Flatten(start_dim=1, end_dim=-1)
    (24): Linear(in_features=2048, out_features=1024, bias=True)
    (25): ReLU()
    (26): Dropout(p=0.5, inplace=False)
    (27): Linear(in_features=1024, out_features=100, bias=True)
  )
)
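For convenience, the printed architecture above corresponds to a module along the following lines (reconstructed from the printout; layer order and shapes match it exactly):

```python
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    """CIFAR-100 ConvNet reconstructed from the printed architecture above."""
    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 128, 3), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(3, stride=2, padding=1),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.Conv2d(256, 256, 3), nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Dropout(0.25),
            nn.Conv2d(256, 512, 3, padding=1), nn.BatchNorm2d(512), nn.ReLU(),
            nn.Conv2d(512, 512, 3), nn.BatchNorm2d(512), nn.ReLU(),
            nn.MaxPool2d(3, stride=1),
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.layers(x)

# Sanity check: a CIFAR-sized batch maps to 100 class logits.
model = ConvNet()
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 100])
```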

Hyperparameters:
n_epochs = 30
batch_size = 64
learning_rate = 1e-3

  2. Use the CrossEntropyLoss loss function and the AdamW optimizer.
  3. Apply pruning to the model with NVIDIA's Apex library on the CIFAR-100 dataset.
  4. Run the pruning process on a P100 GPU.
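The steps above can be sketched as follows. This is a minimal outline: the model and batch are stand-ins so the snippet is self-contained (in the real run they are the ConvNet above and CIFAR-100 DataLoader batches with batch_size=64), and the Apex call shown is `ASP.prune_trained_model` from the library's automatic sparsity workflow, which may differ from the exact entry point in my run:

```python
import torch
import torch.nn as nn

# Stand-in model and data; replace with the ConvNet above and CIFAR-100 loaders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

n_epochs = 1  # 30 in the actual run
for epoch in range(n_epochs):
    images = torch.randn(64, 3, 32, 32)          # stand-in CIFAR-100 batch
    labels = torch.randint(0, 100, (64,))        # stand-in labels
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

# Pruning step (requires NVIDIA Apex; guarded so the sketch runs without it).
try:
    from apex.contrib.sparsity import ASP
    ASP.prune_trained_model(model, optimizer)    # triggers the permutation search
except ImportError:
    pass

print(f"final loss: {loss.item():.3f}")
```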

Expected Behavior

The pruning process should complete in a reasonable amount of time, especially with the "exhaustive search" strategy, given the total number of channels is significantly below the 2048 threshold.

Actual Behavior

The pruning process takes an excessive amount of time (over 8 hours) and does not complete.

Environment

Python: 3.10.12
PyTorch: 2.3.1+cu121
GPU: NVIDIA P100
Dataset: CIFAR-100

Thank you for your attention to this issue. I look forward to any insights or suggestions you might have.

Ulorewien added the bug label on Aug 14, 2024