
The amount of parameters #14

Open · shamangary opened this issue Apr 26, 2017 · 6 comments

Comments

@shamangary commented Apr 26, 2017

I used the following settings, as suggested in this GitHub repo:
L=40, k=12, no bottleneck
However, the parameter count is not 1M, it's 0.6M.
The same problem happens when I turn the bottleneck on: I get a different parameter count than the reported one.
Please tell me what I am missing. Thank you.

Calling the model:

dn_opt = {}
dn_opt.depth = 40
dn_opt.dataset = 'cifar10'
model = paths.dofile('densenet.lua')(dn_opt)
model:cuda()
print(model:getParameters():size())

In densenet.lua

    local growthRate = 12

    --dropout rate, set it to 0 to disable dropout, non-zero number to enable dropout and set drop rate
    local dropRate = 0

    --#channels before entering the first denseblock
    local nChannels = 2 * growthRate

    --compression rate at transition layers
    local reduction = 0.5

    --whether to use bottleneck structures
    local bottleneck = false

Output of the parameter size

599050
[torch.LongStorage of size 1]
@liuzhuang13 (Owner)

Hi! "BC" stands for bottleneck(B) and compression(C). This is explained at the "compression" paragraph at section 3 of the paper. To use a original DenseNet, you need to also set the variable "reduction" to 1 in the code.

@shamangary (Author)

Thank you very much. It matches now.

@shamangary (Author) commented Apr 26, 2017

On the other hand, while the parameter count of DenseNet is indeed small, GPU memory is still consumed by the complex structure rather than by the parameters.

With an 8GB GPU, I was able to train an 11M-parameter WRN. However, I cannot run the 0.8M-parameter DenseNet-BC (L=100, k=12) because of an out-of-memory problem. This might be because a lot of feature maps are stored during training, as the rough sketch below suggests.
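As a rough back-of-envelope sketch (my own estimate, not something from the repo), counting only the concatenated inputs that a naive implementation keeps for backprop already gives millions of values per image for DenseNet-BC (L=100, k=12) on 32x32 CIFAR input:

--rough estimate only: activation values stored per image, counting just the
--concatenated input of every layer in the three dense blocks
--assumes 16 layers per block, initial channels = 2*k, and 0.5 compression
local k = 12
local layersPerBlock = 16
local sizes = {32, 16, 8}                   --spatial size in each dense block
local nChannels = 2 * k
local total = 0
for b = 1, 3 do
  for i = 1, layersPerBlock do
    total = total + nChannels * sizes[b] * sizes[b]
    nChannels = nChannels + k               --each layer adds k feature maps
  end
  nChannels = math.floor(nChannels * 0.5)   --compression at the transition layer
end
print(total)                                --about 3e6 values per image here

Multiplied by a batch of 64, and ignoring the extra copies made by BatchNorm, ReLU, and the bottleneck layers, this is already several hundred MB in float precision, so it is the activations rather than the parameters that run out of memory.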

@liuzhuang13 (Owner)

Thanks for pointing this out. I've just found other people discussing this, and wrote a comment on Reddit here: https://www.reddit.com/r/MachineLearning/comments/67fds7/d_how_does_densenet_compare_to_resnet_and/?utm_content=title&utm_medium=hot&utm_source=reddit&utm_name=MachineLearning

My suggestion is to try a shallow and wide DenseNet, by setting the depth smaller and the growthRate larger.
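For example (illustrative values only, not a tested configuration), one could lower the depth passed in your calling snippet and raise the growth rate inside densenet.lua:

--illustrative only: a shallower, wider DenseNet to reduce stored feature maps
dn_opt = {}
dn_opt.depth = 40              --fewer layers per block -> fewer concatenated maps
dn_opt.dataset = 'cifar10'
model = paths.dofile('densenet.lua')(dn_opt)

--and inside densenet.lua, raise the growth rate to keep model capacity, e.g.
--local growthRate = 24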

@Tongcheng

Hello @shamangary, regarding the memory cost of feature maps: we currently have a Caffe implementation that tries to address the memory-hungry problem (listed under the much more space-efficient Caffe implementation). DenseNet-BC (L=100, k=12) should take no more than 2.5 GB when running with testing on, and about 1.7 GB when running without test mode (Caffe seems to allocate separate space for testing). Hope that helps!

@shamangary (Author)

OK, thanks! Though I wish the Torch implementation could also have this property. (QAQ)
