We propose One-D-Piece, a novel image tokenizer supporting quality-controllable image compression.
| Dataset | Model | Link | FID |
|---|---|---|---|
| ImageNet | One-D-Piece-L-256 Tokenizer | checkpoint | 1.08 (reconstruction) |
| ImageNet | One-D-Piece-B-256 Tokenizer | checkpoint | 1.11 (reconstruction) |
| ImageNet | One-D-Piece-S-256 Tokenizer | checkpoint | 1.48 (reconstruction) |
| ImageNet | One-D-Piece-L-256 Generator | checkpoint | 2.35 (generation) |
| ImageNet | One-D-Piece-B-256 Generator | checkpoint | 2.70 (generation) |
| ImageNet | One-D-Piece-S-256 Generator | checkpoint | 2.67 (generation) |
To run tokenizer/detokenizer inference only, use the following script.
python3 scripts/inference_tokenizer.py --image assets/ILSVRC2012_val_00010240.png
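Because One-D-Piece encodes an image into a single 1D token sequence ordered by importance (trained with Tail Token Drop), reconstruction quality can be traded for length simply by truncating the tail of the sequence. The sketch below illustrates the idea; `OneDPieceTokenizer` and its `encode`/`decode` methods are hypothetical stand-ins, see `scripts/inference_tokenizer.py` for the actual interface.

```python
# Conceptual sketch of quality-controllable reconstruction via tail token drop.
# NOTE: OneDPieceTokenizer, encode, and decode are hypothetical names; the
# actual interface lives in scripts/inference_tokenizer.py.
from PIL import Image

tokenizer = OneDPieceTokenizer.from_pretrained("one-d-piece-s-256")  # hypothetical loader
image = Image.open("assets/ILSVRC2012_val_00010240.png")

tokens = tokenizer.encode(image)  # 1D sequence of up to 256 tokens
for length in (32, 128, 256):
    # Decoding from a shorter prefix yields a coarser reconstruction.
    recon = tokenizer.decode(tokens[:, :length])
    recon.save(f"recon_len{length}.png")
```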
To run generator inference, use the following script.
python3 scripts/inference_generator.py
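For reference, class-conditional sampling with the MaskGIT-style generator conceptually looks like the sketch below; `OneDPieceGenerator` and the `sample` signature are hypothetical, see `scripts/inference_generator.py` for the actual options.

```python
# Hypothetical sketch of class-conditional generation; the real entry point
# is scripts/inference_generator.py.
import torch

generator = OneDPieceGenerator.from_pretrained("one-d-piece-s-256-generator")  # hypothetical
class_ids = torch.tensor([207, 360])  # ImageNet class indices to condition on
tokens = generator.sample(class_ids)  # hypothetical signature
images = tokenizer.decode(tokens)     # detokenize with the paired tokenizer (see above)
```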
For training on the ImageNet dataset, you first have to prepare WebDataset shards as follows. You may also need to run `huggingface-cli login` beforehand.
python3 data/convert_imagenet_to_wds.py \
--output_dir ./imagenet_sharded
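Once the shards are written, they can be sanity-checked with the webdataset library. The shard filename and the `jpg`/`cls` keys below are assumptions; inspect one shard with `tar tf` if they differ.

```python
# Quick sanity check of the generated shards (filename and sample keys are
# assumptions; inspect a shard with `tar tf` to confirm).
import webdataset as wds

dataset = (
    wds.WebDataset("imagenet_sharded/imagenet-train-000000.tar")
    .decode("pil")
    .to_tuple("jpg", "cls")
)
image, label = next(iter(dataset))
print(image.size, label)
```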
To train the tokenizer, you also need the pretrained MaskGIT-VQGAN weights for Stage 1 training.
wget https://huggingface.co/fun-research/TiTok/resolve/main/maskgit-vqgan-imagenet-f16-256.bin
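The downloaded file should load as a plain PyTorch checkpoint, which allows a quick integrity check (an assumption; adjust if the checkpoint is wrapped differently):

```python
import torch

# Verify the pretrained MaskGIT-VQGAN weights load as a state dict
# (an assumption; adjust if the checkpoint is wrapped differently).
state_dict = torch.load("maskgit-vqgan-imagenet-f16-256.bin", map_location="cpu")
print(f"{len(state_dict)} tensors loaded")
```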
We provide a Slurm batch script for training One-D-Piece models.
sbatch train_tokenizer.sh configs/training/stage1/one-d-piece_s256.yaml
To train the generator model, use the following script.
sbatch train_generator.sh configs/training/generator/maskgit_one-d-piece_s256.yaml
Evaluation consists of several steps.
First, you have to prepare the reconstructed images as follows. This generates generated/one-d-piece-s-256_len-128/images.npy, which contains the reconstructed images for the ImageNet-1K validation split (truncated to 128 tokens here via --length=128).
WANDB_MODE=offline accelerate launch \
--mixed_precision=bf16 \
--num_machines=1 \
--num_processes=1 \
--machine_rank=0 \
--main_process_ip=127.0.0.1 \
--main_process_port=9999 \
--same_network \
scripts/reconstruct_tokenizer.py \
--config configs/eval/one-d-piece_s256.yaml \
--length=128 \
--output_dir generated/one-d-piece-s-256_len-128
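The resulting `images.npy` can be inspected directly; the (N, H, W, 3) uint8 layout assumed below is a guess, so check the printed shape.

```python
import numpy as np

# Inspect the reconstructed ImageNet-1K validation images produced above.
images = np.load("generated/one-d-piece-s-256_len-128/images.npy")
print(images.shape, images.dtype)  # e.g. (50000, 256, 256, 3) uint8 -- an assumption
```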
Second, run the evaluation script on the reconstructed images.
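If you want a quick independent check in addition to the repository's evaluation script, reconstruction FID can be computed with torchmetrics as sketched below. The `real_images.npy` dump of validation images is hypothetical, and the (N, H, W, 3) uint8 layout is assumed as above.

```python
# Independent rFID check with torchmetrics (not the repository's evaluation
# script). Assumes both arrays are (N, H, W, 3) uint8.
import numpy as np
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

real = np.load("real_images.npy")  # hypothetical dump of the validation split
fake = np.load("generated/one-d-piece-s-256_len-128/images.npy")

fid = FrechetInceptionDistance(feature=2048)
for batch, is_real in ((real, True), (fake, False)):
    for i in range(0, len(batch), 64):
        imgs = torch.from_numpy(batch[i:i + 64]).permute(0, 3, 1, 2)  # NHWC -> NCHW
        fid.update(imgs, real=is_real)
print(float(fid.compute()))
```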
For token contribution analysis, we used the following procedure.
python3 scripts/generate_token_contribution_data.py \
--config configs/eval/one-d-piece_s256.yaml \
--output_dir analysis/one-d-piece-s-256
After that, you can use scripts/visualize_token_contribution.py to generate heatmap and grid visualizations.
python3 scripts/visualize_token_contribution.py --input analysis/one-d-piece-s-256
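Conceptually, the contribution of the i-th token can be estimated as the pixel-wise change between reconstructions from the first i and the first i-1 tokens. The sketch below illustrates this, reusing the hypothetical `encode`/`decode` interface from the inference example above.

```python
# Conceptual token-contribution measure: how much does adding token i change
# the reconstruction? (encode/decode are hypothetical, as above.)
import numpy as np

tokens = tokenizer.encode(image)
prev = np.asarray(tokenizer.decode(tokens[:, :1]), dtype=np.float32)
for i in range(2, tokens.shape[1] + 1):
    cur = np.asarray(tokenizer.decode(tokens[:, :i]), dtype=np.float32)
    print(i, np.abs(cur - prev).mean())  # mean per-pixel contribution of token i
    prev = cur
```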
For first-token analysis, we used the following procedure.
First, generate the token sequences with the following command.
WANDB_MODE=offline accelerate launch \
--mixed_precision=bf16 \
--num_machines=1 \
--num_processes=1 \
--machine_rank=0 \
--main_process_ip=127.0.0.1 \
--main_process_port=9999 \
--same_network \
scripts/reconstruct_tokenizer.py \
--config configs/eval/one-d-piece_s256.yaml \
--output_dir analysis/one-d-piece-s-256 \
--tokens
After that, you can use scripts/visualize_first_token_clustering.py to generate first-token clusters.
python3 scripts/visualize_first_token_clustering.py --data analysis/one-d-piece-s-256/tokens.npz --prefix 1208
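The same grouping can be reproduced with numpy alone; the `tokens` key name below is a guess, so check `np.load(...).files` for the actual keys.

```python
import numpy as np

# Group validation images by their first token id ("tokens" key name is an
# assumption; check np.load(...).files for the actual keys).
data = np.load("analysis/one-d-piece-s-256/tokens.npz")
tokens = data["tokens"]  # (N, L) integer token ids
ids, counts = np.unique(tokens[:, 0], return_counts=True)
order = np.argsort(-counts)[:10]
for tid, c in zip(ids[order], counts[order]):
    print(f"first token {tid}: {c} images")
```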
This project is licensed under the Apache License 2.0.
It is based on the bytedance/1d-tokenizer developed by Bytedance Ltd., which is also licensed under the Apache License 2.0.
We have built upon their work by introducing additional features and modifications tailored to our specific use cases. We acknowledge and appreciate their contribution as the foundation of our development.
@misc{onedpiece,
title = {One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression},
author = {Keita Miwa and Kento Sasaki and Hidehisa Arai and Tsubasa Takahashi and Yu Yamaguchi},
year = {2025},
eprint = {2501.10064},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2501.10064},
}