KoroKoro is an automated pipeline for converting 2D videos into detailed 3D models. Given a roughly 30-second video captured around an object, it combines several deep learning techniques to produce a fully interactive 3D reconstruction of that object.
View live demo here
Given an input video, 40 frames are extracted (the default, configurable in `extract_frames`). These frames are processed by the `process_data` method of the `DataProcessing` class to generate a NeRF-compatible dataset, including a `transforms.json` file that stores the camera intrinsics and a per-frame camera pose.
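For illustration, here is a minimal sketch of evenly-spaced frame extraction with OpenCV. The actual `extract_frames` implementation may differ; the output directory and file naming here are illustrative:

```python
import os
import cv2

def extract_frames(video_path: str, num_frames: int = 40, out_dir: str = "frames"):
    """Sample `num_frames` evenly spaced frames from the input video."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * step)  # jump to the i-th sample
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{i:04d}.png"), frame)
    cap.release()
```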
Given a frame, if the object of interest is among the 80 COCO classes, YOLOv8 predicts its bounding box coordinates; otherwise GroundingDINO handles the bounding box prediction, taking a natural-language prompt: the description/title of the object, which is set in `config/config.yaml`.
Given a frame and the xy coordinates of the bounding box around the object, SegmentAnythingv2 creates an accurate mask of the object. The mask is then used to extract only the object, leaving the rest of the frame empty.
See the algorithm below:

```
if object in coco_classes:
    detect_with_yolov8()
    if successful():
        segment_with_sam2()
    else:
        detect_with_groundingdino()
        if successful():
            segment_with_sam2()
else:
    detect_with_groundingdino()
    if successful():
        segment_with_sam2()
```
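A sketch of this detect-then-segment chain in Python, using the real `ultralytics` API for YOLOv8; the GroundingDINO call is a hypothetical placeholder, since the repo's actual wrapper differs:

```python
import numpy as np
from ultralytics import YOLO

yolo = YOLO("yolov8x.pt")  # pretrained on the 80 COCO classes

def detect_with_groundingdino(frame, prompt):
    """Hypothetical wrapper around GroundingDINO's open-vocabulary detector."""
    raise NotImplementedError("wire up GroundingDINO here")

def detect_bbox(frame: np.ndarray, title: str, coco_classes: set):
    """Return an xyxy bounding box, falling back to GroundingDINO."""
    if title in coco_classes:
        boxes = yolo(frame)[0].boxes
        if len(boxes):
            return boxes.xyxy[0].cpu().numpy()
    # Fallback: GroundingDINO takes the natural-language title as its prompt.
    return detect_with_groundingdino(frame, prompt=title)

def extract_object(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the masked pixels; the background becomes empty (black)."""
    return frame * mask[..., None].astype(frame.dtype)  # mask: HxW of 0/1
```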
Processed inputs from the previous steps are fed to Nerfstudio's implementation of Gaussian Splatting, `splatfacto`. The resulting splats are finally exported to a `.ply` file.
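For reference, the equivalent Nerfstudio CLI calls look roughly like this; the pipeline wraps these steps, so you normally don't run them by hand, and the paths below are illustrative:

```bash
# Train a Gaussian Splatting model on the processed dataset
ns-train splatfacto --data path/to/processed_data

# Export the trained splats to a .ply file
ns-export gaussian-splat --load-config outputs/.../config.yml --output-dir exports/
```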
- Conda / Miniconda
NB: Tested on GPU compute with an A10 (24 GB) and a Google Colab T4 (16 GB)
sudo apt-get install libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6
curl -sL \
"https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > \
"Miniconda3.sh"
bash Miniconda3.sh
# ⚠️ Might be different on your computer; please take note of the Miniconda installation directory
source /root/miniconda3/bin/activate
git clone https://github.com/Daheer/KoroKoro.git
cd KoroKoro
# This will setup the environment
bash setup.sh
# Activate the environment
conda activate korokoro
Configure the `category`, `title` & `video_output` fields in `config/config.yaml`:

- `category`: MS COCO class name if available, e.g. `book`; otherwise set to `others`
- `title`: natural-language description of the object, e.g. `blue backpack`
- `video_output`: path to the input video
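For example, a config for a non-COCO object might look like the snippet below (the field names come from above; the video path and exact file layout are illustrative, so check the shipped `config/config.yaml`):

```yaml
category: others                      # not one of the 80 COCO classes
title: blue backpack                  # natural-language prompt for GroundingDINO
video_output: data/blue_backpack.mp4  # path to the input video
```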
python KoroKoro/pipeline/local.py
Simply run the commands below; they will fetch products from the queue in Supabase and generate 3D models
python KoroKoro/pipeline/stage_01.py
python KoroKoro/pipeline/stage_02.py
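Under the hood this relies on a Supabase table acting as a queue. A minimal sketch with the `supabase-py` client is shown below; the table and column names here are hypothetical:

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

# Fetch queued products (table/column names are hypothetical)
queued = (
    supabase.table("products")
    .select("*")
    .eq("status", "queued")
    .execute()
)
for product in queued.data:
    print(product["id"], product["title"])
```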
- Install xterm: `!pip install colab-xterm`
- Load the xterm extension: `%load_ext colabxterm`
- Launch the xterm terminal: `%xterm`
- Continue from the start of the installation instructions above
```
📦KoroKoro
├─ .gitignore
├─ Dockerfile
├─ KoroKoro
│  ├─ __init__.py
│  ├─ components
│  │  ├─ __init__.py
│  │  ├─ data_ingestion.py
│  │  ├─ data_processing.py
│  │  ├─ data_transformation.py
│  │  ├─ initialization.py
│  │  ├─ model_trainer.py
│  │  └─ post_processing.py
│  ├─ config
│  │  ├─ __init__.py
│  │  └─ configuration.py
│  ├─ entity
│  │  └─ __init__.py
│  ├─ logger.py
│  ├─ pipeline
│  │  ├─ __init__.py
│  │  ├─ local.py
│  │  ├─ stage_01.py
│  │  └─ stage_02.py
│  └─ utils
│     ├─ __init__.py
│     └─ constants.py
├─ GroundingDINO
│  ├─ groundingdino
│  │  ├─ __init__.py
│  │  ├─ config
│  │  ├─ datasets
│  │  ├─ models
│  │  └─ util
│  └─ LICENSE
├─ config
│  └─ config.yaml
├─ docker-compose.yml
├─ README.md
├─ requirements.txt
├─ setup.py
└─ setup.sh
```
| Stage | KoroKoro Version 1 | KoroKoro Version 2 |
|---|---|---|
| Setup Time | 45 minutes | 15 minutes |
| Processing Time | 25 minutes | 5 minutes |
| Training Time | 5 minutes | 5 minutes |
There are areas where this project can be improved, including:

- Incorporating Trellis
- Lighter-weight .obj files -> right now, the resulting obj models are heavy (> 100 MB), and I have to use sharding to save them in Supabase's storage bucket, which limits file uploads to 50 MB (see the sketch after this list)
- Using Segment Anything to improve segmentation
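A minimal sketch of the sharding workaround mentioned above, splitting a model file into chunks under the 50 MB upload limit; the chunk size and shard naming are illustrative:

```python
from pathlib import Path

CHUNK_SIZE = 45 * 1024 * 1024  # stay safely under Supabase's 50 MB limit

def shard_file(path: str) -> list:
    """Split a large model file into numbered shards for upload."""
    data = Path(path).read_bytes()
    shards = []
    for i in range(0, len(data), CHUNK_SIZE):
        shard = Path(f"{path}.part{i // CHUNK_SIZE:03d}")
        shard.write_bytes(data[i : i + CHUNK_SIZE])
        shards.append(shard)
    return shards
```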
Please reach out to me at [email protected]; I'd be happy to walk you through the project, including the Supabase database configuration.