- pros
- Minimalist project structure: only a few files are included
- New features will be added continuously; stay tuned
- Follows industry trends and reproduces popular papers
May 21st, 2024
- Release SDXL finetune scripts & sample data
- download datasets from Baidu
- You can find the running commands in the 'command' file
- Some key points for finetuning your model
- you need to prepare very high-quality data that matches your project needs
- 2k-4k images are enough
- train your model for many epochs; I got the best result at epoch 100
- high resolution is very important; I use 960*1280 here
- results from the finetuned model are far better than those from the base model
- result sample [960*1280]
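To make the resolution tip concrete: one common way to bring arbitrary source images to a fixed training resolution such as 960*1280 is resize-then-center-crop. A minimal sketch of the box arithmetic; the helper name and this exact preprocessing are assumptions, not taken from the released scripts:

```python
def resize_then_center_crop(src_w, src_h, target_w=960, target_h=1280):
    """Compute the resize dims and crop box that map a source image onto the
    target training resolution without distortion (hypothetical helper)."""
    # Scale so the resized image fully covers the target, then crop the middle.
    scale = max(target_w / src_w, target_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)
```

For a 1000*1000 source this resizes to 1280*1280 and crops 160 px off each side; a source already at 3:4 aspect ratio is only resized, never cropped.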
April 22nd, 2024
- Release animation step1 train scripts
- download datasets from Baidu
- place the downloaded files in your directory
- set your directory in the config file
- run the following command
- I only release partial data, so you need to prepare your data the same way I do
```shell
python -m accelerate.commands.launch --num_processes=2 train_script.py -c configs/train/animate_step1.yaml
```
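For the "change your directory in config file" step, the edit usually amounts to pointing a data-path key at the downloaded files. A hypothetical fragment; the key names here are illustrative and not taken from the released configs/train/animate_step1.yaml:

```yaml
# Hypothetical fragment -- check the released yaml for the real key names.
data:
  root_dir: /path/to/your/downloaded/data
```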
April 8th, 2024
- Release animation
- download pretrained models and datasets from Baidu
- place the downloaded files in your directory
- set your directory in the config file
- run the following commands
- I use the UBC fashion dataset to train step 2; I release the step 2 training & testing scripts
- The step 1 training script will be released later
```shell
python test_script.py -c configs/test/animate_step2.yaml
```
```shell
python -m accelerate.commands.launch --main_process_port=28500 --num_processes=2 train_script.py -c configs/train/animate_step2.yaml
```
April 2nd, 2024
March 26th, 2024
- Release TryOn
```shell
python test_script.py -c configs/test/viton.yaml
```
```shell
python train_script.py -c configs/train/viton.yaml
```
- more results
- includes inference and training code
- provides pretrained model files
- provides the dataset
- the core ideas largely come from Animate Anyone, and I made some modifications
- reproduce the model structure using Hugging Face diffusers
- remove the pose guider and cross attention from the UNet because I found them unhelpful
- a different cross-attention structure that lets you input a condition image of any size
- I do not reproduce temporal attention
- I use the HR-VITON dataset to train the virtual try-on model
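The any-size condition property falls out of treating the condition image as a variable-length token sequence on the key/value side of cross attention: the output shape depends only on the query tokens, so the condition's spatial size is free. A minimal single-head NumPy sketch of that idea; this is not the repo's actual module, and all names and dimensions are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, cond_image, Wq, Wk, Wv):
    """Single-head cross attention (illustrative sketch).

    latent_tokens: (N, d)    -- UNet latent tokens (queries)
    cond_image:    (H, W, c) -- condition features of ANY spatial size
    """
    # Flatten the condition image into a variable-length token sequence.
    cond_tokens = cond_image.reshape(-1, cond_image.shape[-1])  # (H*W, c)
    Q = latent_tokens @ Wq                                      # (N, dk)
    K = cond_tokens @ Wk                                        # (H*W, dk)
    V = cond_tokens @ Wv                                        # (H*W, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))              # (N, H*W)
    # Output shape is (N, d) regardless of H and W.
    return attn @ V
```

Because K and V are recomputed from whatever token sequence arrives, the same weights handle a 10x12 or a 3x5 condition feature map without any resizing.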