3D-Model-Generation-from-Prompt

Description: Task Assignment of Mann Bhanushali for AI Research Intern - 3D Model Generation for Pi Reality

Approaches:

I started by learning how to convert an image into a 3D model, since there are many VLM models that can generate good images from prompts. I wanted to avoid using VLMs and therefore turned to the technology they rely on: diffusion models. After that, my approach was first to try fully pre-built models that do the task directly, and then to break the task down into steps in order to learn the different foundational methodologies and gain more control over the outcome. This led me to NeRF-based and, later, depth-image-based approaches.

1. Shap-E:

Shap-E is an open-source text-to-3D model that generates 3D assets from text descriptions. It uses a latent diffusion model over implicit neural representations to turn a text prompt into a 3D model.

For this approach, I used the Shap-E GitHub repository directly to generate the 3D models. A Streamlit interface is provided in shapee_streamlit.py for testing, and a Jupyter Notebook variant is also provided.
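
For reference, the core text-to-3D call in the notebook and the Streamlit app roughly follows the pattern from the official Shap-E example notebooks. The sketch below is illustrative rather than the exact code in this repository; the prompt, sampling parameters, and output file name are assumptions:

    import torch
    from shap_e.diffusion.sample import sample_latents
    from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
    from shap_e.models.download import load_model, load_config
    from shap_e.util.notebooks import decode_latent_mesh

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Pre-trained decoder and text-conditional latent diffusion model
    xm = load_model("transmitter", device=device)
    model = load_model("text300M", device=device)
    diffusion = diffusion_from_config(load_config("diffusion"))

    # Sample a latent conditioned on the text prompt
    latents = sample_latents(
        batch_size=1,
        model=model,
        diffusion=diffusion,
        guidance_scale=15.0,
        model_kwargs=dict(texts=["a red office chair"]),  # illustrative prompt
        progress=True,
        clip_denoised=True,
        use_fp16=True,
        use_karras=True,
        karras_steps=64,
        sigma_min=1e-3,
        sigma_max=160,
        s_churn=0,
    )

    # Decode the latent into a mesh and save it as an OBJ file
    mesh = decode_latent_mesh(xm, latents[0]).tri_mesh()
    with open("chair.obj", "w") as f:
        mesh.write_obj(f)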

Steps:

  1. Clone the Shap-E GitHub repository: git clone https://github.com/openai/shap-e.git
  2. Go into the directory: cd shap-e
  3. Run pip install -e .
  4. Go to the src/ folder and run the Streamlit app with streamlit run shapee_streamlit.py, or open shapee_trial.ipynb to generate the model. In the Jupyter Notebook, you can change the prompt within the code itself.

Status: Works considerably well!

2. Stable Diffusion Combined with Instant-NGP (NeRF):

Text-to-image models like Stable Diffusion have achieved great success. Building on this, I used a 2D diffusion model (Stable Diffusion) to create 3D representations by optimizing a NeRF (Neural Radiance Field) on the generated images.

I used Stable Diffusion to create 2D images of an object described by a prompt from four different angles (front, left, right, and back). From the generated images, I used Instant-NGP to create a NeRF model for visualization.
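
The multi-view generation step can be sketched with the Hugging Face diffusers library. This is a minimal sketch under assumptions: the model ID, view phrasing, and output file names are illustrative and may differ from what stablediffcombo.ipynb actually does:

    import torch
    from diffusers import StableDiffusionPipeline

    # Load Stable Diffusion 2.1 (v1.4 can be swapped in for lower compute)
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a wooden chair"  # illustrative prompt
    views = ["front view", "left side view", "right side view", "back view"]

    for view in views:
        # Ask for a specific viewpoint of the same object; consistency across
        # views is exactly where this approach tends to struggle
        image = pipe(f"{prompt}, {view}, plain white background").images[0]
        image.save(f"{view.replace(' ', '_')}.png")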

Problems:

  1. Depending on the version of Stable Diffusion used, the time to generate the model varies widely. Version 1.4 took less time but produced poor results, while version 2.1 required more compute power but worked a lot better. All the models were run on a laptop RTX 3070 GPU.
  2. This methodology did not yield good results because Stable Diffusion could not create accurate images from the prompts. In particular, it struggled to generate consistent multi-view images of the same object, even when the original object was passed as a reference.

Steps:

  1. Run the stablediffcombo.ipynb file.
  2. Once the camera matrices and the images are generated, assemble them into the input format Instant-NGP expects (see the sketch after this list) and create the NeRF model.
  3. To do so, run git clone https://github.com/NVlabs/instant-ngp.git
  4. cd instant-ngp and build it with CMake (cmake . -B build, then cmake --build build --config RelWithDebInfo -j).
  5. ./instant-ngp <path_to_images_folder>
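
Instant-NGP reads the camera poses from a transforms.json file stored alongside the images (NeRF-synthetic style). The sketch below shows one way to write such a file; the poses and field of view here are placeholders, and the real values come from the camera matrices generated by the notebook:

    import json
    import numpy as np

    # 4x4 camera-to-world matrices produced alongside the four rendered views
    # (identity matrices used here purely as placeholders)
    poses = {name: np.eye(4) for name in ["front", "left", "right", "back"]}

    transforms = {
        "camera_angle_x": 0.69,  # horizontal field of view in radians (placeholder)
        "frames": [
            {"file_path": f"./images/{name}", "transform_matrix": pose.tolist()}
            for name, pose in poses.items()
        ],
    }

    with open("transforms.json", "w") as f:
        json.dump(transforms, f, indent=2)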

Status: Does not work well on my GPU. There is room for improvement given better compute resources.

3. Depth-to-3D Modelling:

This method takes the depth map produced by a neural network from a single image and builds a single-view 3D model from the depth data.

Here, I used the MiDaS model to create a depth map from an RGB image. The depth map was then turned into a 3D model with Trimesh, which was displayed using the PyVista library.
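
A minimal sketch of this pipeline is shown below, assuming the small MiDaS variant and an illustrative input file name; the exact notebook code may differ. It loads MiDaS from torch.hub, converts the depth map into a height-field mesh with Trimesh, and displays it with PyVista:

    import cv2
    import numpy as np
    import torch
    import trimesh
    import pyvista as pv

    # Load MiDaS (small variant for speed) and its matching input transform
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    midas.eval()
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = midas_transforms.small_transform

    img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        prediction = midas(transform(img))
        depth = torch.nn.functional.interpolate(
            prediction.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().cpu().numpy()

    # Downsample the depth map so the height-field mesh stays small
    depth = cv2.resize(depth, (160, 120))
    h, w = depth.shape

    # One vertex per pixel (x, y, depth) and two triangles per grid cell
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    vertices = np.column_stack([xs.ravel(), ys.ravel(), depth.ravel()])
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x
            faces.append([i, i + 1, i + w])
            faces.append([i + 1, i + w + 1, i + w])
    mesh = trimesh.Trimesh(vertices=vertices, faces=np.array(faces))

    # PyVista can wrap a Trimesh object directly for interactive viewing
    pv.wrap(mesh).plot()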

Problems:

  1. This method is very crude and is of limited use until all views of the object can be stitched together so that a complete 3D model of the object in the image can be created.
  2. Since a neural network is used for depth estimation, fine internal features are lost as they get pooled into the depth map.
  3. A combination of the depth map and the RGB image needs to be used to get a good, textured 3D model.

Steps:

  1. Run depth_to_3d.ipynb.

Progress:

Shap-E works well enough to get the task done, but the other, more fundamental methods need fine-tuning and more compute resources to perform better. The device used for this task has an 11th-generation Intel i7 processor, an RTX 3070 Laptop GPU, and 16 GB of RAM.

Tools Used:

  1. Python 3.12
  2. CMake
  3. Conda
  4. Stable Diffusion
  5. NeRF (Instant-NGP)
  6. MiDaS (monocular depth estimation)
  7. PyTorch and CUDA 12.1
  8. Streamlit
  9. PyVista and Trimesh
  10. Jupyter Notebooks

Kindly install the requirements from the requirements.txt file using pip install -r requirements.txt.

In case of any compatibility errors, add the --no-cache-dir flag to the above command.

Note for Video:

Unfortunately, I wasn't able to record a video while performing the task, as my device could not handle recording and running the models at the same time. However, I have detailed the steps for each of the techniques I used.
