Commit 5ca2bd9

Adds code
1 parent 37f7fce commit 5ca2bd9

6 files changed: +1041 −2 lines changed

README.md (+72 −2)
# Deep Generative Models for Distribution-Preserving Lossy Compression

<p align='center'>
<img src='figs/visuals.jpeg' width='440'/>
</p>

### [[Paper]](https://arxiv.org/abs/1805.11057) [[Citation]](#citation)
PyTorch implementation of **Deep Generative Models for Distribution-Preserving Lossy Compression** (NIPS 2018), a framework that unifies generative models and lossy compression. The resulting models behave like generative models at zero bitrate, almost perfectly reconstruct the training data at sufficiently high bitrates, and smoothly interpolate between generation and reconstruction at intermediate bitrates (cf. the figure above; the numbers indicate the rate in bits per pixel).
## Prerequisites

- Python 3 (tested with Python 3.6.4)
- PyTorch (version 0.4.1)
- [tensorboardX](https://github.com/lanpa/tensorboardX)
## Training

The training procedure consists of two steps:

1. Learn a generative model of the data.
2. Learn a rate-constrained encoder and a stochastic mapping into the latent space of the fixed generative model by minimizing distortion.

The `train.py` script performs both of these steps.
To learn the generative model, we consider [Wasserstein GAN with gradient penalty (WGAN-GP)](https://arxiv.org/abs/1704.00028), [Wasserstein Autoencoder (WAE)](https://arxiv.org/abs/1711.01558), and a combination of the two termed Wasserstein++. The following examples show how to train these models as in the experiments in the paper, using the [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) data set (see `train.py` for a description of the flags).
WGAN-GP:

    python train.py --dataset celeba --dataroot /path/to/traindata/ --testroot /path/to/testdata/ --cuda --nz 128 \
        --sigmasqz 1.0 --lr_eg 0.0001 --lr_di 0.0001 --beta1 0.5 --beta2 0.9 --niter 165 --check_every 100 \
        --workers 6 --outf /path/to/results/ --batchSize 64 --test_every 100 --addsamples 10000 --manualSeed 321 \
        --wganloss
35+
36+
WAE:
37+
38+
python train.py --dataset celeba --dataroot /path/to/traindata/ --testroot /path/to/testdata/ --cuda --nz 128 \
39+
--sigmasqz 1.0 --lr_eg 0.001 --niter 55 --decay_steps 30 50 --decay_gamma 0.4 --check_every 100 \
40+
--workers 8 --recloss --mmd --bnz --outf /path/to/results/ --lbd 100 --batchSize 256 --detenc --useenc \
41+
--test_every 20 --addsamples 10000 --manualSeed 321
42+
43+
Wasserstein++:
44+
45+
python train.py --dataset celeba --dataroot /path/to/traindata/ --testroot /path/to/testdata/ --cuda --nz 128 \
46+
--sigmasqz 1.0 --lr_eg 0.0003 --niter 165 --decay_steps 100 140 --decay_gamma 0.4 --check_every 100 \
47+
--workers 6 --recloss --mmd --bnz --outf /path/to/results/ --lbd 100 --batchSize 256 --detenc --useenc \
48+
--test_every 20 --addsamples 10000 --manualSeed 321 --wganloss --useencdist --lbd_di 0.000025 --intencprior
To learn the rate-constrained encoder and the stochastic mapping, run the following (parameters again for the experiment on the CelebA data set):

    python train.py --dataset celeba --dataroot /path/to/traindata/ --testroot /path/to/testdata/ --cuda --nz 128 \
        --sigmasqz 1.0 --lr_eg 0.001 --niter 55 --decay_steps 30 50 --decay_gamma 0.4 --check_every 100 \
        --workers 6 --recloss --mmd --bnz --batchSize 256 --useenc --comp --freezedec --test_every 100 \
        --addsamples 10000 --manualSeed 321 --outf /path/to/results/ --netG /path/to/trained/generator \
        --nresenc 2 --lbd 300 --ncenc 8
Here, `--ncenc` determines the number of channels at the encoder output (and hence the bitrate) and `--lbd` determines the regularization strength of the MMD penalty on the latent space (which has to be adapted as a function of the bitrate).
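
As a back-of-the-envelope rate estimate (a sketch, not part of the repo): with the 16x downsampling in `models.py`, the quantized representation has `ncenc` channels of spatial size (H/16)x(W/16), and fixed-rate coding spends log2(#levels) bits per symbol. The binary levels below are the defaults from `scalar_quantizer.py`; `train.py` may configure a different number of levels.

```python
import math

def bits_per_pixel(ncenc, num_levels, img_h=64, img_w=64):
    # Encoder output: ncenc channels at (img_h//16) x (img_w//16) (see models.py),
    # each symbol coded with log2(num_levels) bits (fixed-rate coding assumed).
    num_symbols = ncenc * (img_h // 16) * (img_w // 16)
    return num_symbols * math.log2(num_levels) / (img_h * img_w)

print(bits_per_pixel(ncenc=8, num_levels=2))  # 0.03125 bpp for the command above
```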
In the paper we also consider the [LSUN bedrooms](https://github.com/fyu/lsun) data set. We provide the flag `--lsun_custom_split`, which splits off 10k samples from the LSUN training set (the LSUN testing set is too small to compute the FID score used to assess sample quality). Otherwise, training on the LSUN data set is as outlined above (with different parameters).
62+
63+
64+
## Citation
65+
66+
If you use this code for your research, please cite this paper:
67+
68+
@inproceedings{tschannen2018deep,
69+
Author = {Tschannen, Michael and Agustsson, Eirikur and Lucic, Mario},
70+
Title = {Deep Generative Models for Distribution-Preserving Lossy Compression},
71+
Booktitle = {Advances in Neural Information Processing Systems (NIPS)},
72+
Year = {2018}}

figs/visuals.jpeg (480 KB)
models.py (+210)
```python
import math

import torch
from torch import nn
from torch.autograd import Variable

from resblock import BasicBlock
from scalar_quantizer import quantize


# Encoder and stochastic function (B in the paper)
class _netE(nn.Module):
    def __init__(self, nc, nz, ngf, kernel=2, padding=1, img_width=64, img_height=64,
                 quant_levels=None, do_comp=False, ncenc=8, nresenc=0, detenc=False,
                 noisedelta=0.5, bnz=False, ngpu=1):
        super(_netE, self).__init__()
        self.ngpu = ngpu
        self.detenc = detenc or not do_comp
        self.noisedelta = noisedelta
        self.nfmodelz = math.ceil(nz / ((img_height//16) * (img_width//16))) + ncenc
        self.ncenc = ncenc

        model_down_list = [
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ngf, kernel, 2, padding, bias=False),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.Conv2d(ngf, ngf * 2, kernel, 2, padding, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.Conv2d(ngf * 2, ngf * 4, kernel, 2, padding, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.Conv2d(ngf * 4, ngf * 8, kernel, 2, padding, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True)
        ]
        # state size. (ngf*8) x 4 x 4

        # quantize if in compression mode
        if do_comp:
            model_down_list += [
                nn.Conv2d(ngf * 8, ncenc, 3, 1, 1, bias=True),
                quantize(quant_levels)
            ]

        self.model_down = nn.Sequential(*model_down_list)

        # stochastic function mapping the compressed representation to the
        # latent space of the generator (B in the paper)
        if do_comp:
            model_z_list = [
                nn.ConvTranspose2d(ncenc, ngf * 8, 3, 1, 1, bias=True) if detenc
                else nn.ConvTranspose2d(self.nfmodelz, ngf * 8, 3, 1, 1, bias=True)
            ]
        else:
            model_z_list = []

        if nresenc > 0:
            model_z_list += [BasicBlock(ngf * 8, ngf * 8) for _ in range(nresenc)]

        model_z_list += [nn.Conv2d(ngf * 8, nz, (img_height//16, img_width//16), 1, 0, bias=False)]

        # batchnorm to facilitate prior matching
        if bnz:
            model_z_list += [nn.BatchNorm2d(nz)]

        self.model_z = nn.Sequential(*model_z_list)

    def forward(self, input):
        use_cuda = isinstance(input.data, torch.cuda.FloatTensor)
        if use_cuda and self.ngpu > 1:
            out_down = nn.parallel.data_parallel(self.model_down, input, range(self.ngpu))
        else:
            out_down = self.model_down(input)

        if not self.detenc:
            # feed noise of appropriate dimension when using the stochastic function
            out_down_pad_size = list(out_down.size())
            out_down_pad_size[1] = self.nfmodelz - self.ncenc
            out_down_pad = torch.zeros(out_down_pad_size)
            out_down_pad.uniform_(-self.noisedelta, self.noisedelta)
            if use_cuda:
                out_down_pad = out_down_pad.cuda()
            out_down = torch.cat([out_down, Variable(out_down_pad)], 1)

        if use_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.model_z, out_down, range(self.ngpu))
        else:
            output = self.model_z(out_down)

        return output


# Standard DCGAN-type generator/decoder
class _netG(nn.Module):
    def __init__(self, nc, nz, ngf, kernel=2, padding=1, output_padding=0,
                 img_width=64, img_height=64, nresdec=0, ngpu=1):
        super(_netG, self).__init__()
        self.ngpu = ngpu

        # input is z, going into a convolution
        main_list = [nn.ConvTranspose2d(nz, ngf * 8, (img_height//16, img_width//16), 1, 0, bias=False),
                     nn.BatchNorm2d(ngf * 8),
                     nn.ReLU(True)]

        if nresdec > 0:
            main_list += [BasicBlock(ngf * 8, ngf * 8) for _ in range(nresdec)]

        main_list += [
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, kernel, 2, padding, output_padding, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, kernel, 2, padding, output_padding, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, kernel, 2, padding, output_padding, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, ngf, kernel, 2, padding, output_padding, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.Conv2d(ngf, nc, 3, 1, 1, bias=True),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        ]

        self.main = nn.Sequential(*main_list)

    def forward(self, input):
        if isinstance(input.data, torch.cuda.FloatTensor) and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)

        return output


# MLP discriminator in z-space
class _netDz(nn.Module):
    def __init__(self, nz, ndf=512, ndl=5, ngpu=0, avbtrick=False, sigmasq=1):
        super(_netDz, self).__init__()
        self.ngpu = ngpu
        self.avbtrick = avbtrick
        self.sigmasqz = sigmasq
        self.nz = nz

        # (ndl-2) hidden layers of width ndf between the input and output layers
        layers = [[nn.Linear(ndf, ndf), nn.ReLU(True)] for _ in range(ndl - 2)]

        layers = [nn.Linear(nz, ndf), nn.ReLU(True)] \
            + sum(layers, []) \
            + [nn.Linear(ndf, 1)]

        self.main = nn.Sequential(*layers)

    def forward(self, input):
        if isinstance(input.data, torch.cuda.FloatTensor) and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)

        # Nowozin trick from the WAE paper, only valid for a Gaussian prior
        if self.avbtrick:
            output = output - torch.norm(input, p=2, dim=1, keepdim=True)**2 / 2 / self.sigmasqz \
                - 0.5 * math.log(2 * math.pi) \
                - 0.5 * self.nz * math.log(self.sigmasqz)

        return output.view(-1, 1).squeeze(1)


# DCGAN-style discriminator in image space
class _netDim(nn.Module):
    def __init__(self, nc=3, ndf=64, kernel=2, padding=1, img_width=64, img_height=64, ngpu=1):
        super(_netDim, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 3, 1, 1, bias=True),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf, kernel, 2, padding, bias=False),
            nn.LayerNorm([ndf, img_height//2, img_width//2]),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, kernel, 2, padding, bias=False),
            nn.LayerNorm([ndf * 2, img_height//4, img_width//4]),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, kernel, 2, padding, bias=False),
            nn.LayerNorm([ndf * 4, img_height//8, img_width//8]),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, kernel, 2, padding, bias=False),
            nn.LayerNorm([ndf * 8, img_height//16, img_width//16]),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, (img_height//16, img_width//16), 1, 0, bias=False),
        )

    def forward(self, input):
        if isinstance(input.data, torch.cuda.FloatTensor) and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)

        return output.view(-1, 1)
```
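
A minimal smoke test wiring the encoder and generator together (a hypothetical sketch, not part of the repo; `kernel=4, padding=1` match the shape comments in the code rather than the defaults, and the binary quantization levels are the defaults from `scalar_quantizer.py`):

```python
import torch
from models import _netE, _netG

# Hypothetical configuration: 64x64 RGB images, 128-d latent space, 8-channel code.
netE = _netE(nc=3, nz=128, ngf=64, kernel=4, padding=1, do_comp=True,
             ncenc=8, nresenc=2, quant_levels=[-1.0, 1.0], detenc=False)
netG = _netG(nc=3, nz=128, ngf=64, kernel=4, padding=1)

x = torch.randn(4, 3, 64, 64)    # batch of 64x64 RGB images in [-1, 1]
z = netE(x)                      # quantized code mapped into the latent space
x_hat = netG(z)                  # reconstruction/generation (Tanh output)
print(z.shape, x_hat.shape)      # (4, 128, 1, 1), (4, 3, 64, 64)
```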

resblock.py (+38)
```python
import torch.nn as nn


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
```
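
As instantiated in `models.py`, the block always keeps the width constant with the identity shortcut; a quick shape check (hypothetical values):

```python
import torch
from resblock import BasicBlock

block = BasicBlock(512, 512)   # identity shortcut, stride 1, as used in models.py
h = torch.randn(2, 512, 4, 4)
print(block(h).shape)          # torch.Size([2, 512, 4, 4])
```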

scalar_quantizer.py (+31)
```python
import torch
import torch.nn as nn
from torch.autograd import Variable


class quantize(nn.Module):
    def __init__(self, levels=[-1.0, 1.0], sigma=1.0):
        super(quantize, self).__init__()
        self.levels = levels
        self.sigma = sigma

    def forward(self, input):
        levels = input.data.new(self.levels)
        xsize = list(input.size())

        # Compute the differentiable soft-quantized version
        input = input.view(*(xsize + [1]))
        level_var = Variable(levels, requires_grad=False)
        dist = torch.pow(input - level_var, 2)
        output = torch.sum(level_var * nn.functional.softmax(-self.sigma * dist, dim=-1), dim=-1)

        # Compute the hard quantization (invisible to autograd)
        _, symbols = torch.min(dist.data, dim=-1, keepdim=True)
        for _ in range(len(xsize)):
            levels.unsqueeze_(0)
        levels = levels.expand(*(xsize + [len(self.levels)]))

        quant = levels.gather(-1, symbols.long()).squeeze_(dim=-1)

        # Replace the activations in the soft variable with the hard-quantized version
        output.data = quant

        return output
```
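
A quick sanity check of the soft-to-hard behavior (a sketch with arbitrary values): the forward pass returns the hard nearest-level assignment, while gradients flow through the softmax-weighted soft assignment.

```python
import torch
from scalar_quantizer import quantize

q = quantize(levels=[-1.0, 0.0, 1.0], sigma=1.0)
x = torch.tensor([[-0.9, -0.2, 0.3, 1.4]], requires_grad=True)

y = q(x)
print(y.data)        # hard nearest-level values: [-1., 0., 0., 1.]

y.sum().backward()   # gradients come from the differentiable soft version
print(x.grad)
```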
