
End-To-End TensoRF connection #108

Open · wants to merge 27 commits into master

Conversation

SimonDaKappa

This PR covers the implementation of TensoRF as the nerf-worker container in Docker. The nerf-worker is implemented similarly to colmap's sfm-worker, consuming nerf-in to place files in data/inputs and producing nerf-out to send files from data/outputs.

Features:
TensoRF is now the nerf-worker container in docker
Communication between all workers and the web-server via RabbitMQ
web-server now publishes finished sfm jobs to the nerf-worker
nerf-worker now consumes sfm jobs, runs the TensoRF pipeline, and produces the generated model/video
web-server now consumes finished nerf jobs, saves them to MongoDB, and responds to web-app GET requests for the nerf video (a minimal messaging sketch follows this list)
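To make the message flow concrete, here is a minimal sketch of the nerf-in / nerf-out hand-off using pika. The queue names come from this PR; the host, credentials, and message shape are placeholders (the real values live in the .env file and queue_service.py):

```python
import json
import pika

# Placeholder connection settings; the real host/credentials come from the .env file.
params = pika.ConnectionParameters(
    host="rabbitmq",
    credentials=pika.PlainCredentials("guest", "guest"))

def publish_finished_sfm(job: dict) -> None:
    """web-server side: publish a finished sfm job to nerf-in."""
    connection = pika.BlockingConnection(params)
    channel = connection.channel()
    channel.queue_declare(queue="nerf-in")
    channel.basic_publish(exchange="", routing_key="nerf-in", body=json.dumps(job))
    connection.close()

def consume_nerf_in(on_job) -> None:
    """nerf-worker side: hand each sfm job to the TensoRF pipeline, then ack."""
    connection = pika.BlockingConnection(params)
    channel = connection.channel()
    channel.queue_declare(queue="nerf-in")

    def callback(ch, method, properties, body):
        on_job(json.loads(body))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="nerf-in", on_message_callback=callback)
    channel.start_consuming()
```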

SimonDaKappa and others added 15 commits February 23, 2024 15:08
…d application performance

Contains: A LOT of debug code, will be removed in next few commits
Not Tracked: web-app frontend changes

Features:
web-server/queue_service.py: digest_finished_XXXX functions now have access to RabbitMQ
web-server/queue_service.py: digest_finished_sfms now creates a job and publishes it to nerf-in
TensoRF: Now pulled into vidtonerf. This allows a Docker container to be created for the nerf-worker
TensoRF/Dockerfile: simple python-slim instance with basic CV2 dependencies
docker-compose.yaml: Created the nerf-worker service for TensoRF and future backends (gaussian); depends on web-server and rabbitmq, and uses port 5200
TensoRF/main.py: Created a basic Flask app with a single GET endpoint for the rendered video (see the endpoint sketch after this list). Consumes nerf-in and publishes to nerf-out. Runs TensoRF based on the config file defined in the Dockerfile CMD [...]
web-server/controller.py: Finished the endpoint to serve the rendered nerf video to the frontend
web-server/scene_service.py: Unified function return types with what controller.py expects
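As an illustration of the single GET endpoint mentioned above, a minimal Flask sketch; the route, folder layout, and file name are hypothetical, and only port 5200 comes from the docker-compose entry:

```python
from flask import Flask, send_from_directory

app = Flask(__name__)

# Hypothetical route and output layout; the real paths live in TensoRF/main.py.
@app.route("/video/<job_id>", methods=["GET"])
def get_rendered_video(job_id: str):
    # The training/render step is assumed to write the video under data/nerf_data/<job_id>/.
    return send_from_directory(f"data/nerf_data/{job_id}", "render.mp4")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5200)
```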
…ug code and general cleanup

RabbitMQ .env credentials and connection retry mechanism

Modified:
TensoRF/Dockerfile: no longer needs ~1GB of ffmpeg dependencies, by using opencv-headless instead of opencv
TensoRF/main.py: Unified the input and output file structure to data/sfm_data and data/nerf_data, simplified process_nerf_job(), and added a retry mechanism for the RabbitMQ connection (see the retry sketch after this list)
TensoRF/requirements.txt: replaced opencv with the headless version, added shutils for file manipulation and python-dotenv for the environment-variable RabbitMQ connection
colmap/main.py: added the RabbitMQ retry mechanism
web-server/controller.py: debug code removal
web-server/services/queue_service.py: debug code removal
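A sketch of the connection retry mechanism, assuming the credentials are read from the .env file via python-dotenv; the variable names, retry count, and delay are placeholders:

```python
import os
import time

import pika
from dotenv import load_dotenv

load_dotenv()  # pull RabbitMQ credentials from the .env file

def connect_with_retry(retries: int = 10, delay: float = 5.0) -> pika.BlockingConnection:
    """Keep retrying until RabbitMQ is reachable (the broker may come up after the workers)."""
    params = pika.ConnectionParameters(
        host=os.getenv("RABBITMQ_HOST", "rabbitmq"),
        credentials=pika.PlainCredentials(
            os.getenv("RABBITMQ_USER", "guest"),
            os.getenv("RABBITMQ_PASSWORD", "guest")))
    for attempt in range(1, retries + 1):
        try:
            return pika.BlockingConnection(params)
        except pika.exceptions.AMQPConnectionError:
            print(f"RabbitMQ not ready (attempt {attempt}/{retries}), retrying in {delay}s")
            time.sleep(delay)
    raise RuntimeError("Could not connect to RabbitMQ after all retries")
```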
nerf worker gpu docker image and cuda support
Contributor

@rougejaw rougejaw left a comment


At the moment, any significant training of the model results in the process being halted: the heartbeat Pika sends every 300 seconds is blocked by the training process, which kills the connection and the process.

@SimonDaKappa
Author

Pika and GPU fixes for nerf-worker are now in. This is handled in the following way:

OLD:
TensoRF/main.py forks main into
| Flask process
| nerf_worker Pika / TensoRF training process

NEW:
TensoRF/main.py forks main into
| Flask process

Using torch.multiprocessing,
TensoRF/main.py spawns from main into
| nerf_worker Pika process. On nerf-in consume:
|   create a thread to run training and rendering, and use a thread-safe callback to ack and publish the video to nerf-out
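A minimal sketch of that consume path: training runs on a worker thread so the Pika I/O loop keeps answering heartbeats, and the ack/publish is marshalled back to the connection thread via add_callback_threadsafe. run_tensorf is a stand-in for the actual training + render call, and the message shape is a placeholder:

```python
import json
import threading

def run_tensorf(job: dict) -> str:
    """Stand-in for the actual TensoRF training + render call."""
    return f"data/nerf_data/{job['id']}/render.mp4"

def on_nerf_in(ch, method, properties, body):
    """nerf-in callback: start training on a thread so the Pika loop keeps heartbeating."""
    job = json.loads(body)

    def train_and_publish():
        video_path = run_tensorf(job)  # long-running training + rendering

        def ack_and_publish():
            # Runs on the connection's own thread, so using the channel here is safe.
            ch.basic_publish(exchange="", routing_key="nerf-out",
                             body=json.dumps({"id": job["id"], "video": video_path}))
            ch.basic_ack(delivery_tag=method.delivery_tag)

        ch.connection.add_callback_threadsafe(ack_and_publish)

    threading.Thread(target=train_and_publish, daemon=True).start()
```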

Additionally, Aidan has integrated logging into sfm-worker, web-server, and nerf-worker, with some minor tweaks I made to work with my TensoRF changes.

Something to note: colmap seems to output images in a different order than we would get by splitting the video into frames, so the TensoRF training path is very unsmooth. Each frame can rotate +/- 180 degrees around the scene center relative to the previous one, and the final reconstructed video follows the same non-smooth path.

Reasoning:

  1. The nerf Flask server can only send GET requests, not receive them, if its process is spawned. Could
     not figure out why.
  2. Since the research code is baaaad, it loads tensors into CUDA device memory by redeclaring the CUDA
     device every single time. CUDA has a known limitation where forked processes (I assume something to do
     with copy-on-write of CUDA objects not working) cannot initialize the CUDA device more than once.
     PyTorch has a wrapper around python.multiprocessing's spawn that starts a fresh process and handles
     the transfer of tensors into that process. This works with the research code and the heartbeat fix for
     Pika (see the spawn sketch below).
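A toy sketch of the spawn pattern from point 2, using torch.multiprocessing: the worker runs in a freshly spawned process with a clean CUDA context, while the parent stays free for the Flask app (the sleep stands in for app.run(...)):

```python
import time

import torch
import torch.multiprocessing as mp

def nerf_worker(rank: int) -> None:
    # Runs in a freshly spawned process, so CUDA initializes cleanly here
    # regardless of what the parent process has done with the GPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.ones(3, device=device)
    print(f"worker {rank} allocated a tensor on {x.device}")

if __name__ == "__main__":
    # join=False leaves the parent process free to run the Flask app.
    ctx = mp.spawn(nerf_worker, nprocs=1, join=False)
    time.sleep(1)  # stand-in for app.run(host="0.0.0.0", port=5200)
    ctx.join()
```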

…bilties. Add GPU support up to CUDA ISA SM_86 (RTX 3000 series). Merge Aidan's logging
@SimonDaKappa
Author

Also added a .sh script, empty_data.sh, to clear out all the worker/web-server input and output data folders while preserving the file structure.
