This is the code repository of Group 9 for the HKUST course 6000M, Recent Advances in Deep Learning. The code can also be found in the GitHub code repository.
We separate the project into three main modules: DCGAN, StyleGAN3, and Improved Diffusion.
cd ./dcGAN
conda create -n dc python=3.8.8
conda activate dc
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install matplotlib
# convert all images to 3-channel RGB
cd ..
cd ./dcGAN
wget --load-cookies /tmp/cookies.txt " \
confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate \
'' -O- | \
sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1JTC291uskdM0XgL6rpCXGEdq0_bgGDk9" -O \
&& rm -rf /tmp/cookies.txt
wget --load-cookies /tmp/cookies.txt " \
confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate \
'' -O- | \
sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1GJQ_88k7YlfXR7UWtjcAx6BlnNi1Y0m2" -O \
&& rm -rf /tmp/cookies.txt
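The two download commands above use a sed expression to scrape Google Drive's `confirm` token from its download-warning page. The Python sketch below shows what that pattern matches. It uses only the standard library. `extract_confirm_token` is a hypothetical helper, not part of this repository. Google Drive's page format may also have changed since these commands were written.

```python
import re
from typing import Optional

# Same pattern as the sed expression: "confirm=" followed by one or more
# characters from [0-9A-Za-z_].
CONFIRM_RE = re.compile(r"confirm=([0-9A-Za-z_]+)")


def extract_confirm_token(html: str) -> Optional[str]:
    """Return the confirm token from a Drive warning page, or None.

    sed's greedy '.*confirm=' selects the last match on a line, so this sketch
    takes the last match per line and returns the one from the first matching line.
    """
    for line in html.splitlines():
        matches = CONFIRM_RE.findall(line)
        if matches:
            return matches[-1]
    return None


if __name__ == "__main__":
    page = '<a href="/uc?export=download&amp;confirm=t4Xk_9&amp;id=FILE_ID">'
    print(extract_confirm_token(page))  # prints t4Xk_9
```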
# install 7za
1. Download the source archive with wget
2. Uncompress the downloaded archive
3. Run make
4. Edit the "DECT_DIR" line in
5. ./
6. Alternatively, install it with sudo apt-get if you are in the sudoers list
# uncompress
7za x [filename] -r -o./
Install GCC 7.5.0 (required by StyleGAN3)
tar -xvf gcc-7.5.0.tar.gz && cd gcc-7.5.0
mkdir objdir && cd objdir
../configure --disable-checking --enable-languages=c,c++ --disable-multilib --prefix=/home/zpeng/install/gcc-7.5.0 --enable-threads=posix
make -j64 && make install  # -j64 runs 64 parallel compile jobs
export PATH=/home/zpeng/install/gcc-7.5.0/bin:$PATH
export LD_LIBRARY_PATH=/home/zpeng/install/gcc-7.5.0/lib64:/home/zpeng/install/gcc-7.5.0/lib/:$LD_LIBRARY_PATH
source ~/.bashrc
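After changing PATH, check that the newly built compiler is the one actually picked up. The sketch below does this check. It assumes `gcc --version` prints the version as the last dotted number on its first line, as in the common `gcc (GCC) 7.5.0` form. Both helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(version_output: str) -> Optional[Tuple[int, int, int]]:
    """Parse the first line of 'gcc --version' output into (major, minor, patch)."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*$", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def gcc_on_path_version() -> Optional[Tuple[int, int, int]]:
    """Run whichever gcc is first on PATH and return its parsed version."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    result = subprocess.run([gcc, "--version"], capture_output=True, text=True, check=False)
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    # Expect (7, 5, 0) once the exports above are active.
    print(shutil.which("gcc"), gcc_on_path_version())
```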
cd ./styleGAN3
conda env create -f environment.yml
conda activate stylegan3
cd ..
cd ./styleGAN3
# 128x128 resolution.
python --source=../../data/images --dest=../../data/
python --outdir=./runs --cfg=stylegan3-t --data=../../data/ \
--gpus=8 --batch=32 --gamma=8.2 --mirror=1 [--resume=*/*.pkl]
# Generate an image using pre-trained model
python --outdir=out --trunc=1 --seeds=6000 --network=*/*.pkl
# Render a 4x2 grid of interpolations for seeds 0 through 31.
python --output=lerp.mp4 --trunc=1 --seeds=0-31 --grid=4x2 --network=*/*.pkl
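The `--seeds` and `--grid` arguments above use an inclusive `a-b` range syntax. The sketch below shows one way to expand such a spec. It assumes each cell of the video grid interpolates through its own equal share of the seeds; under that assumption, 32 seeds on a 4x2 grid give 4 keyframes per cell. The helpers are hypothetical, and the actual script's parsing may differ.

```python
import re
from typing import List, Sequence


def parse_seeds(spec: str) -> List[int]:
    """Expand a seed spec such as '0-31' or '1,3,10-12' into a list of ints."""
    seeds: List[int] = []
    for part in spec.split(","):
        match = re.fullmatch(r"(\d+)-(\d+)", part.strip())
        if match:
            lo, hi = int(match.group(1)), int(match.group(2))
            seeds.extend(range(lo, hi + 1))  # inclusive upper bound
        else:
            seeds.append(int(part))
    return seeds


def keyframes_per_cell(seeds: Sequence[int], grid: str) -> int:
    """Number of seeds each cell of a 'WxH' grid cycles through (assumed equal split)."""
    w, h = (int(v) for v in grid.lower().split("x"))
    cells = w * h
    if len(seeds) % cells != 0:
        raise ValueError(f"{len(seeds)} seeds do not divide evenly over a {w}x{h} grid")
    return len(seeds) // cells


if __name__ == "__main__":
    seeds = parse_seeds("0-31")
    print(len(seeds), keyframes_per_cell(seeds, "4x2"))  # prints 32 4
```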
# Calculate dataset mean and std, needed in subsequent steps.
python stats --source=../../data/
# Calculate average spectrum for the training data.
python calc --source=../../data/ --dest=tmp/training-data.npz --mean=134.437 --std=75.2062
# Calculate average spectrum for a pre-trained generator.
python calc --source=*/*.pkl --dest=tmp/stylegan3-t.npz --mean=134.437 --std=75.2062 --num=70000
# Display results.
python heatmap tmp/training-data.npz
python heatmap tmp/stylegan3-t.npz
python slices tmp/training-data.npz tmp/stylegan3-t.npz
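The statistics and spectrum steps can be pictured with a small sketch using only the standard library. It computes one mean and standard deviation over all pixel values. It then normalizes each image with them and averages the squared magnitude of each image's 2-D DFT. This is a simplified illustration on single-channel toy images, and it assumes a population standard deviation. The helper names are hypothetical, and the real script's handling of channels, windowing and padding is not reproduced.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]  # one channel, H rows x W columns


def dataset_mean_std(images: Sequence[Image]) -> Tuple[float, float]:
    """Mean and population std over every pixel value of every image."""
    values = [v for img in images for row in img for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def power_spectrum(img: Image) -> Image:
    """|DFT|^2 of a 2-D array via a naive O(H^2 W^2) DFT (only for tiny inputs)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            out[u][v] = abs(acc) ** 2
    return out


def average_spectrum(images: Sequence[Image], mean: float, std: float) -> Image:
    """Normalize each image with the dataset mean/std, then average the power spectra."""
    if not images:
        raise ValueError("need at least one image")
    total = None
    for img in images:
        norm = [[(p - mean) / std for p in row] for row in img]
        spec = power_spectrum(norm)
        total = spec if total is None else [
            [a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, spec)
        ]
    return [[v / len(images) for v in row] for row in total]


if __name__ == "__main__":
    images = [[[0, 255], [255, 0]], [[255, 255], [0, 0]]]
    mean, std = dataset_mean_std(images)
    print(mean, std)  # prints 127.5 127.5
    spec = average_spectrum(images, mean, std)
    print([[round(v, 6) for v in row] for row in spec])  # prints [[0.0, 0.0], [8.0, 8.0]]
```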
The input dataset format is guessed from the --source argument:
--source *_lmdb/ Load LSUN dataset
--source cifar-10-python.tar.gz Load CIFAR-10 dataset
--source train-images-idx3-ubyte.gz Load MNIST dataset
--source path/ Recursively load all images from path/
--source Recursively load all images from
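The suffix-based guessing described above can be sketched as follows. This is a hypothetical helper that covers only the rules listed here. The case on the truncated last line is not reproduced, and any other path falls through to "unknown".

```python
import os


def guess_source_format(source: str) -> str:
    """Map a --source value to a dataset kind using the rules listed above."""
    name = os.path.basename(os.path.normpath(source))
    # Check the LSUN rule first: an *_lmdb/ directory is also a directory.
    if os.path.isdir(source) and name.endswith("_lmdb"):
        return "lsun"
    if name == "cifar-10-python.tar.gz":
        return "cifar10"
    if name == "train-images-idx3-ubyte.gz":
        return "mnist"
    if os.path.isdir(source):
        return "image_folder"  # recursively load all images
    return "unknown"


if __name__ == "__main__":
    print(guess_source_format("downloads/cifar-10-python.tar.gz"))  # prints cifar10
```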
Specifying the output format and path:
--dest /path/to/dir Save output files under /path/to/dir
--dest /path/to/ Save output files into /path/to/
The output dataset format can be either an image folder or an uncompressed zip archive. Zip archives make it easier to move datasets around file servers and clusters, and may offer better training performance on network file systems.
Images within the dataset archive are stored as uncompressed PNGs, which can be decoded efficiently in the training loop.
Class labels are stored in a file called 'dataset.json' that is stored at the dataset root folder. This file has the following structure:
{
    "labels": [
        [<image path relative to the dataset root>, <integer class label>],
        ... repeated for every image in the dataset
    ]
}
If the 'dataset.json' file cannot be found, the dataset is interpreted as not containing class labels.
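The sketch below makes the archive layout concrete using only the standard library. It stores already-encoded PNG bytes without compression (`zipfile.ZIP_STORED`) and writes labels in the `dataset.json` structure shown above. A matching reader returns `None` when `dataset.json` is missing. The function names and the `{name: label}` dictionaries are illustrative assumptions, not the tool's own API.

```python
import json
import zipfile
from typing import Dict, Optional


def write_dataset_zip(path: str, images: Dict[str, bytes],
                      labels: Optional[Dict[str, int]] = None) -> None:
    """Store PNG-encoded images uncompressed plus an optional dataset.json."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, png_bytes in images.items():
            zf.writestr(name, png_bytes)  # ZIP_STORED: no deflate on top of PNG
        if labels is not None:
            meta = {"labels": [[name, int(label)] for name, label in labels.items()]}
            zf.writestr("dataset.json", json.dumps(meta))


def read_labels(path: str) -> Optional[Dict[str, int]]:
    """Return {image_name: label}, or None if the dataset has no class labels."""
    with zipfile.ZipFile(path) as zf:
        if "dataset.json" not in zf.namelist():
            return None
        meta = json.loads(zf.read("dataset.json"))
    labels = meta.get("labels")
    if labels is None:
        return None
    return {name: label for name, label in labels}
```

For example, `write_dataset_zip("toy.zip", {"img0.png": png_bytes}, {"img0.png": 3})` followed by `read_labels("toy.zip")` gives back `{"img0.png": 3}`.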
Image scale/crop and resolution requirements:
Output images must be square-shaped, and they must all have the same power-of-two dimensions.
To scale input images of arbitrary size to a specific width and height, use the --resolution option. The output resolution is either the original input resolution (if --resolution is not specified) or the one given with --resolution.
Use the --transform=center-crop or --transform=center-crop-wide options to apply a center crop transform on the input image. These options should be used with the --resolution option. For example:
python --source LSUN/raw/cat_lmdb --dest /tmp/lsun_cat \
--transform=center-crop-wide --resolution=512x384
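Conceptually, the center-crop options pick a centered box with the target aspect ratio and then resize it. The hypothetical helper below illustrates only that geometry. It does not reproduce the tool's resampling or any padding the tool may apply.

```python
from typing import Tuple


def center_crop_box(width: int, height: int,
                    target_w: int, target_h: int) -> Tuple[int, int, int, int]:
    """Largest centered (left, top, right, bottom) box with aspect target_w:target_h."""
    # Compare width/height with target_w/target_h via integer cross-multiplication.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is taller than (or matches) the target: keep full width, trim top/bottom.
        crop_w = width
        crop_h = width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    print(center_crop_box(640, 480, 1, 1))  # square crop: prints (80, 0, 560, 480)
```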
--source PATH Directory or archive name for input dataset [required]
--dest PATH Output directory or archive name for output dataset [required]
--max-images INTEGER Output only up to `max-images` images
--transform [center-crop|center-crop-wide] Input crop/resize mode
--resolution WxH Output resolution (e.g., '512x512')
--help Show this message and exit.
# Train StyleGAN3-T for AFHQv2 using 8 GPUs.
python --outdir=~/training-runs --cfg=stylegan3-t --data=~/datasets/ \
--gpus=8 --batch=32 --gamma=8.2 --mirror=1
# Fine-tune StyleGAN3-R for MetFaces-U using 8 GPUs, starting from the pre-trained FFHQ-U pickle.
python --outdir=~/training-runs --cfg=stylegan3-r --data=~/datasets/ \
--gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5 \
--resume=*/*.pkl
# Train StyleGAN2 for FFHQ at 1024x1024 resolution using 8 GPUs.
python --outdir=~/training-runs --cfg=stylegan2 --data=~/datasets/ \
--gpus=8 --batch=32 --gamma=10 --mirror=1 --aug=noaug
--outdir DIR Where to save the results [required]
--cfg [stylegan3-t|stylegan3-r|stylegan2]
Base configuration [required]
--data [ZIP|DIR] Training data [required]
--gpus INT Number of GPUs to use [required]
--batch INT Total batch size [required]
--gamma FLOAT R1 regularization weight [required]
--cond BOOL Train conditional model [default: False]
--mirror BOOL Enable dataset x-flips [default: False]
--aug [noaug|ada|fixed] Augmentation mode [default: ada]
--resume [PATH|URL] Resume from given network pickle
--freezed INT Freeze first layers of D [default: 0]
--p FLOAT Probability for --aug=fixed [default: 0.2]
--target FLOAT Target value for --aug=ada [default: 0.6]
--batch-gpu INT Limit batch size per GPU
--cbase INT Capacity multiplier [default: 32768]
--cmax INT Max. feature maps [default: 512]
--glr FLOAT G learning rate [default: varies]
--dlr FLOAT D learning rate [default: 0.002]
--map-depth INT Mapping network depth [default: varies]
--mbstd-group INT Minibatch std group size [default: 4]
--desc STR String to include in result dir name
--metrics [NAME|A,B,C|none] Quality metrics [default: fid50k_full]
--kimg KIMG Total training duration [default: 25000]
--tick KIMG How often to print progress [default: 4]
--snap TICKS How often to save snapshots [default: 50]
--seed INT Random seed [default: 0]
--fp32 BOOL Disable mixed-precision [default: False]
--nobench BOOL Disable cuDNN benchmarking [default: False]
--workers INT DataLoader worker processes [default: 3]
-n, --dry-run Print training options and exit
--help Show this message and exit.
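Several of these options interact arithmetically, and the sketch below works through that arithmetic. It assumes the per-GPU batch defaults to `--batch` divided by `--gpus`, and that a smaller `--batch-gpu` is made up for by gradient accumulation. It also assumes that one tick covers `--tick` kimg (thousands of images) and that a snapshot is saved every `--snap` ticks. The helper names are hypothetical.

```python
import math
from typing import Tuple


def batch_plan(batch: int, gpus: int, batch_gpu: int = 0) -> Tuple[int, int]:
    """Return (per-GPU batch, gradient-accumulation rounds) for --batch/--gpus/--batch-gpu."""
    if batch % gpus != 0:
        raise ValueError("--batch must be a multiple of --gpus")
    per_gpu = batch_gpu or batch // gpus  # 0 means "not set"
    if batch % (per_gpu * gpus) != 0:
        raise ValueError("--batch must be a multiple of --gpus times --batch-gpu")
    return per_gpu, batch // (per_gpu * gpus)


def snapshot_interval_kimg(tick_kimg: float = 4, snap_ticks: int = 50) -> float:
    """kimg between snapshots: --snap ticks of --tick kimg each."""
    return tick_kimg * snap_ticks


def training_steps(kimg: float, batch: int) -> int:
    """Optimizer steps needed to show kimg * 1000 images at the given total batch."""
    return math.ceil(kimg * 1000 / batch)


if __name__ == "__main__":
    print(batch_plan(32, 8))              # prints (4, 1)
    print(batch_plan(32, 8, batch_gpu=2))  # prints (2, 2)
    print(snapshot_interval_kimg())        # prints 200
    print(training_steps(5000, 32))        # prints 156250
```

Under these assumptions, the fine-tuning example's `--snap=5` saves a snapshot every 20 kimg, and `--kimg=5000` at `--batch=32` is about 156,250 optimizer steps.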
Setting up environment
cd ./improved-diffusion
pip install -e .
pip install mpi4py
MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 4000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --microbatch 16 --class_cond False"
python scripts/ --data_dir path/to/images $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
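For reference, `--noise_schedule linear` with `--diffusion_steps 4000` can be sketched as a linear beta schedule. The standard 1000-step endpoints (1e-4 and 0.02) are rescaled by 1000 / steps, which is how improved-diffusion extends the linear schedule to other step counts. This is a standard-library illustration, not the repository's code.

```python
from typing import List


def linear_betas(num_steps: int) -> List[float]:
    """Linear beta schedule with 1000-step endpoints rescaled by 1000 / num_steps."""
    scale = 1000 / num_steps
    beta_start, beta_end = scale * 0.0001, scale * 0.02
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alphas_cumprod(betas: List[float]) -> List[float]:
    """Cumulative product of (1 - beta_t): the share of signal variance left at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


if __name__ == "__main__":
    betas = linear_betas(4000)
    abar = alphas_cumprod(betas)
    print(len(betas), betas[0], betas[-1], abar[-1])
```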
Training produces three checkpoint files (the model, its EMA copy, and the optimizer state); keep all three if you may want to resume training later.
The EMA model is recommended for sampling because it usually gives better results. The pretrained model for this project can be found in
MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --microbatch 16 --class_cond False"
python scripts/ --model_path path/to/checkpoint/ \
--batch_size 4 --num_samples 40 --timestep_respacing 250 $MODEL_FLAGS $DIFFUSION_FLAGS
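`--timestep_respacing 250` samples with only 250 of the 1000 training timesteps. The sketch below implements evenly spaced respacing with the standard library. It assumes comma-separated section counts, as in improved-diffusion's respacing strings; `ddim`-style strings are not handled. The function is an illustration, not the repository's implementation.

```python
from typing import List, Set


def space_timesteps(num_timesteps: int, section_counts: str) -> Set[int]:
    """Choose which original timesteps to keep.

    '250' keeps 250 roughly evenly spaced steps; '10,15,20' splits the steps
    into three equal sections and keeps 10, 15 and 20 steps from them.
    """
    counts: List[int] = [int(x) for x in section_counts.split(",")]
    size_per, extra = divmod(num_timesteps, len(counts))
    start, kept = 0, []
    for i, count in enumerate(counts):
        size = size_per + (1 if i < extra else 0)
        if size < count:
            raise ValueError(f"cannot take {count} steps from a section of {size}")
        stride = 1 if count <= 1 else (size - 1) / (count - 1)
        cur = 0.0
        for _ in range(count):
            kept.append(start + round(cur))  # fractional stride, rounded to an index
            cur += stride
        start += size
    return set(kept)


if __name__ == "__main__":
    kept = space_timesteps(1000, "250")
    print(len(kept), min(kept), max(kept))  # prints 250 0 999
```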
After sampling, a file named samples_<num_samples>x64x64x3.npz is created (samples_40x64x64x3.npz for --num_samples 40). Its arr_0 array holds the sampled images.
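The sample file is a NumPy `.npz` archive, and the usual way to read it is `numpy.load(path)["arr_0"]`. The standard-library sketch below shows what the archive contains. An `.npz` is a zip of `.npy` files, and `arr_0.npy` begins with a short header describing dtype and shape. The sketch assumes a C-ordered uint8 array of shape (N, H, W, C), and its helper names are hypothetical.

```python
import ast
import struct
import zipfile
from typing import Tuple


def read_npz_uint8(path: str, key: str = "arr_0") -> Tuple[Tuple[int, ...], bytes]:
    """Return (shape, raw bytes) of a C-ordered uint8 array stored in an .npz file."""
    with zipfile.ZipFile(path) as zf:
        data = zf.read(key + ".npy")
    if data[:6] != b"\x93NUMPY":
        raise ValueError("not an .npy payload")
    if data[6] == 1:  # format 1.x: 2-byte little-endian header length
        (header_len,) = struct.unpack("<H", data[8:10])
        offset = 10
    else:  # formats 2.x/3.x: 4-byte header length
        (header_len,) = struct.unpack("<I", data[8:12])
        offset = 12
    header = ast.literal_eval(data[offset:offset + header_len].decode("latin1"))
    if header["descr"] not in ("|u1", "u1") or header["fortran_order"]:
        raise ValueError(f"expected C-ordered uint8, got {header}")
    return tuple(header["shape"]), data[offset + header_len:]


def pixel(shape: Tuple[int, ...], raw: bytes, n: int, y: int, x: int, c: int) -> int:
    """Value of channel c at (y, x) in image n of an (N, H, W, C) array."""
    _, h, w, ch = shape
    return raw[((n * h + y) * w + x) * ch + c]
```

For the sampling command above, `read_npz_uint8("samples_40x64x64x3.npz")` would be expected to report the shape `(40, 64, 64, 3)`.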
More detailed information can be found in ./improved-diffusion/