This guide provides detailed steps to configure and set up the Multiflow project.
Before proceeding, ensure you have Conda installed on your system. If not, follow these steps to install it:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh # Make sure Conda is added to your PATH
source ~/.bashrc
conda --version # To check if it's installed
conda activate modelsmith
pip install -r machine_learning_core/multiflow/deps/requirements.txt
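Before installing the requirements, it can help to confirm that Conda is reachable and that the environment exists. The following is a minimal sketch, not part of the project: the helper names `have_cmd` and `conda_env_exists` are hypothetical, and it assumes the `modelsmith` environment has already been created by some other means.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not shipped with Multiflow.

# Return 0 if the given command is available on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Return 0 if a Conda environment with the given name exists.
# `conda env list` prints the environment name in the first column.
conda_env_exists() {
    conda env list | awk '{print $1}' | grep -qx "$1"
}

if ! have_cmd conda; then
    echo "conda not found on PATH; run 'source ~/.bashrc' or reinstall Miniconda" >&2
elif ! conda_env_exists modelsmith; then
    echo "Conda environment 'modelsmith' not found" >&2
else
    echo "conda and the 'modelsmith' environment are available"
fi
```

If the first message appears, opening a new shell or re-running `source ~/.bashrc` is usually enough for the installer's PATH change to take effect.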
Run the provided script to download the required annotation files:
./machine_learning_core/multiflow/download_annots.sh
Download the COCO images from the official website. You need the train2014, val2014, and test2015 images. Follow these step-by-step instructions:
- Download and extract the train2014 dataset:
# Download the dataset
wget http://images.cocodataset.org/zips/train2014.zip -O coco_train2014.zip
# If the download is interrupted, you can resume it with:
curl -C - -o coco_train2014.zip http://images.cocodataset.org/zips/train2014.zip
# Extract the dataset
python3 -m zipfile -e coco_train2014.zip machine_learning_core/multiflow/images/coco
# Remove the zip file to save space
rm coco_train2014.zip
- Download and extract the val2014 dataset:
# Download the dataset
wget http://images.cocodataset.org/zips/val2014.zip -O coco_val2014.zip
# If download is interrupted, use:
curl -C - -o coco_val2014.zip http://images.cocodataset.org/zips/val2014.zip
# Extract the dataset
python3 -m zipfile -e coco_val2014.zip machine_learning_core/multiflow/images/coco
# Remove the zip file to save space
rm coco_val2014.zip
- Download and extract the test2015 dataset:
# Download the dataset
wget http://images.cocodataset.org/zips/test2015.zip -O coco_test2015.zip
# If download is interrupted, use:
curl -C - -o coco_test2015.zip http://images.cocodataset.org/zips/test2015.zip
# Extract the dataset
python3 -m zipfile -e coco_test2015.zip machine_learning_core/multiflow/images/coco
# Remove the zip file to save space
rm coco_test2015.zip
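The three COCO steps differ only in the split name, so they can be folded into one loop. This is a sketch under stated assumptions rather than a script from the project: `coco_url` and `fetch_coco_split` are hypothetical helper names, and `curl -C -` is used for both the first download and any resume, in place of the separate `wget` call.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; URLs and paths are the ones used in the steps above.
COCO_DIR="machine_learning_core/multiflow/images/coco"

# Print the download URL for a COCO split such as train2014.
coco_url() {
    printf 'http://images.cocodataset.org/zips/%s.zip\n' "$1"
}

# Download one split (resuming a partial file if present), extract it, and delete the zip.
fetch_coco_split() {
    local split="$1"
    local zip="coco_${split}.zip"
    # -C - continues from the end of an existing partial file.
    curl -C - -o "$zip" "$(coco_url "$split")" || return 1
    python3 -m zipfile -e "$zip" "$COCO_DIR" || return 1
    rm "$zip"
}

# Usage (downloads several large archives):
#   for split in train2014 val2014 test2015; do
#       fetch_coco_split "$split" || break
#   done
```

Stopping at the first failure keeps a partial zip on disk, so re-running the same loop resumes that download instead of starting over.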
- Create the directory for the Visual Genome (VG) images:
mkdir -p machine_learning_core/multiflow/images/vg
- Download and extract the VG images:
# Download part 1
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip -O vg_images_part1.zip
# If download is interrupted, use:
curl -C - -o vg_images_part1.zip https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip
# Extract part 1 directly to the target directory
python3 machine_learning_core/multiflow/extract.py vg_images_part1.zip machine_learning_core/multiflow/images/vg
rm vg_images_part1.zip
# Download part 2
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip -O vg_images_part2.zip
# If download is interrupted, use:
curl -C - -o vg_images_part2.zip https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
# Extract part 2 directly to the target directory
python3 machine_learning_core/multiflow/extract.py vg_images_part2.zip machine_learning_core/multiflow/images/vg
rm vg_images_part2.zip
- Create the directory for the CC3M images:
mkdir -p machine_learning_core/multiflow/images/cc3m
- Download and extract the archive:
wget https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K/resolve/main/images.zip -O cc3m_images.zip
# If download is interrupted, use:
curl -C - -o cc3m_images.zip https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K/resolve/main/images.zip
python3 -m zipfile -e cc3m_images.zip machine_learning_core/multiflow/images/cc3m
rm cc3m_images.zip
- Download the CC3M annotation data inside machine_learning_core/multiflow/data/pretrain:
wget https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K/resolve/main/metadata.json -O machine_learning_core/multiflow/data/pretrain/metadata.json
- Transform the metadata:
python machine_learning_core/multiflow/transform_metadata.py machine_learning_core/multiflow/data/pretrain/metadata.json machine_learning_core/multiflow/data/pretrain/cc3m_pretrain.json cc3m
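Once the data steps are done, a quick check that the expected paths exist can catch a skipped or failed step early. The sketch below is illustrative only: `missing_paths` is a hypothetical helper, and it checks just the locations named in this guide, not the contents of the files or the annotation files fetched by download_annots.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; it is not part of the Multiflow repository.

# Print every expected dataset path (relative to the given base directory) that is missing.
# Returns 0 when nothing is missing and 1 otherwise.
missing_paths() {
    local base="$1"
    local rc=0
    local rel
    for rel in images/coco images/vg images/cc3m data/pretrain/cc3m_pretrain.json; do
        if [ ! -e "$base/$rel" ]; then
            echo "$rel"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, from the repository root:
#   missing_paths machine_learning_core/multiflow || echo "some data is missing" >&2
```

An empty output means all four locations are present; any path that is printed points to the step that needs to be repeated.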
By following these steps, you will successfully configure and set up the Multiflow project.
You can check that Multiflow runs correctly with the following commands:
conda activate modelsmith
cd machine_learning_core/multiflow
python3 prune.py --model xvlm --pruner multiflow