V3 and Gaussian Splatting #15

SimonDaKappa · 2024-09-09T23:40:51Z

Hi All, this is the partner PR to frontend, but it's really, really big, so I'm sorry.

Anyhow, here's the writeup. It's not nearly everything; a lot is documented (1) in the code, (2) in the wiki, and (3) in the issues of this repo and the submodule repos.

Hey All! I said I'd work on this project over the summer, but I really underestimated how hard it is to get a job right now, so prepare for way too many changes for a single PR.

TLDR: My feature bloat had feature bloat.

Reasoning

Over the spring semester, we really tried to replace TensoRF with Gaussian splatting, and got demolished by NVIDIA errors for no apparent reason. So, I ended up getting rid of the entire build system and moving to Docker only (which brought a bunch more pain). With Gaussian splatting we get a bunch of shiny new features, mainly real-time scene rendering (I get 150fps on my desktop and 60ish on the lower tier rpi laptop). This gave me the reason I needed to finally learn some frontend tech. Overall, I kept changing something, and that led me to something else I could add.

I've tried my best to not override too much work and only make noncontroversial improvements, but there may be some toes I stepped on. If I did, you have my apologies for that.

Changes

With V3 comes an overall restructure of the web server.

Originally, I wrote the new API and features in the Python server, but I wanted to learn Go, and it had a lot of really nice features for a web server (JSON/BSON auto-marshaling by tagging, easy concurrency, strong HTTP frameworks, better MongoDB integration, and good error handling), so I eventually bit the bullet and tried to do basically a 1-to-1 translation from what I had in Python to Go.

Most of the services you would see in the old webserver are still here, except maybe slightly more decoupled. Refer to the inline comments and the go-web-server wiki for a more in-depth overview of changes. It is almost all composition by dependency injection (although some interfaces could be created to make it better).

controller.py -> WebServer.go: WebServer is a translation of the main HTTP handler. In this version, almost all processing that is not HTTP has been factored out into ClientService. It now has a lot more REST API endpoints and can handle JWT tokens for auth. Incoming requests are now automatically validated, with constraints enforced.
client_service.py -> ClientService.go: ClientService now does most of the heavy lifting for REST API request handling. It delegates to the correct database managers and hands new/finished jobs to the AMPQService. Requested resources are now required to belong to the requesting user.
queue_service.py -> AMPQService.go: AMPQService is an object-oriented version of the old thread-based rabbitmqservice. It is now much more tolerant of connection issues and keeps the same publisher/consumer concurrency as the old system.
scene.py -> New Schemas and Managers: There are now dedicated schema files and managers for each DB collection. In short:
1. Users now have robust password storage. Username and password can be changed.
2. Users now have a list of scenes that they own and have access to.
3. Each scene now has more output types.
4. Scenes now have dynamic training configurations allowing users to have more control over what is trained and generated.
5. Queue lists have been more fully implemented, but maybe should be moved to in-memory only.
6. UUID4s have been replaced with MongoDB ObjectIDs for performance and native support (this also allows easy sorting by timestamp, for a social media implementation, wink wink).
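To make the timestamp point in item 6 concrete: a MongoDB ObjectID is 12 bytes, and its first 4 bytes are a big-endian count of seconds since the Unix epoch. The sketch below uses only the Python standard library to pull that timestamp out of a 24-character hex ID and sort by it. The function names are hypothetical and are not taken from go-web-server; this only illustrates why no separate "created at" field is needed for ordering.

```python
from datetime import datetime, timezone
from typing import List


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character hex ObjectID."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectID is 12 bytes, i.e. 24 hex characters")
    raw = bytes.fromhex(oid_hex)
    # The first 4 bytes are a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def newest_first(oid_hexes: List[str]) -> List[str]:
    """Sort IDs newest first using only the embedded timestamp (hypothetical helper)."""
    return sorted(oid_hexes, key=objectid_timestamp, reverse=True)
```

Note that the embedded timestamp has one-second resolution, so two scenes created within the same second may come back in either order with this helper.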

API Changes

Overall, there has been an expansion of what external API endpoints are available, including progress reports, scene metadata, user endpoints, more output handles, previews, and more.
Also, requests now have more descriptive error messages and accurate HTTP status codes.
See API Changes for more.

Worker Communication

Worker communication should be mostly the same as previously. However, there is now more information communicated between workers to support background colors for better training, dynamic configs, and error tolerance.
See Worker Communication for more.

Worker Services

Since we are a microservice architecture, I wanted to actually have the micro part. All the Dockerfiles should be new and improved, with smaller dependencies, experimental BuildKit caching for much faster image building (from the second build onwards), and multistage builds to reduce container size. This is also a reason to switch to Go, as the webserver container is like 25MB instead of 7GB, so we can eventually have multiple instances running to be load-balanced.

Also, we were previously handling environment variables a little weirdly. Now, if building with the backend docker compose strategy, the .env file should be in the project root, and all the relevant environment variables should be passed to the services. For local runs, the services can still load their own .env files, but in most cases that will be unnecessary.
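The lookup order described above (variables passed in by compose first, a service-local .env only as a fallback) can be sketched roughly as follows. This is a minimal standard-library illustration, not the services' actual loader: the function names and the simple KEY=VALUE parsing are assumptions.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_setting(name: str, env_file: Path = Path(".env")) -> Optional[str]:
    """Prefer the process environment (e.g. set by compose); fall back to a local .env."""
    if name in os.environ:
        return os.environ[name]
    return parse_env_file(env_file).get(name)
```

With this ordering, a value injected by compose always wins, and the local file only matters when a service is started by hand.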

The Elephant in the Room (Gaussian Splatting)

The root cause of all this is the new nerf-worker, which is essentially a wrapper around the Inria Gaussian Splatting code (with some improvements). The nerf-worker now has two main services:

FileServer: Handles output of all worker data on its own process
GaussianJobDispatcher: Handles incoming jobs, dynamic GPU prefetch/training (with round-robin job assignment), output generation, and publishing back to the go-web-server
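For readers unfamiliar with round-robin assignment, the idea is simply to hand each incoming job to the next GPU in a fixed rotation. The sketch below is a generic illustration; the class name, job IDs, and GPU indices are hypothetical, and the real GaussianJobDispatcher may track more state (prefetching, failures, and so on).

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinAssigner:
    """Hand out GPU indices in a fixed rotation, one per incoming job."""

    def __init__(self, gpu_ids: List[int]) -> None:
        if not gpu_ids:
            raise ValueError("at least one GPU id is required")
        # cycle() repeats the GPU list endlessly: 0, 1, 0, 1, ...
        self._rotation = cycle(gpu_ids)
        self.assignments: Dict[str, int] = {}

    def assign(self, job_id: str) -> int:
        """Record and return the GPU chosen for this job."""
        gpu = next(self._rotation)
        self.assignments[job_id] = gpu
        return gpu
```

Round-robin ignores how busy each GPU is, which keeps it simple; a load-aware policy would be a different technique.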

A user can now train a single scene for up to 30000 iterations (~25 minutes on a 3080 Ti), choose their output types, and save up to 5 selected training iterations for output. One thing to note is that .ply files are rather large (50-500MB), so unless we ever get the school to pay for dedicated S3 or remote storage, nerf-worker and go-web-server will eat at your file system.
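The limits above (at most 30000 iterations, at most 5 saved iterations) are the kind of constraint a training config has to pass before a job is queued. Here is a minimal sketch of such a check. The field names and the function are hypothetical, not the actual go-web-server schema; the output type names are the ones listed in the commit notes on this PR.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ITERATIONS = 30_000   # limit stated in the PR description
MAX_SAVE_ITERATIONS = 5   # limit stated in the PR description
OUTPUT_TYPES = {"video", "ply", "splat", "model"}  # named in the commit notes


@dataclass
class TrainingConfig:
    """Hypothetical shape of a user-supplied training config."""
    total_iterations: int
    save_iterations: List[int] = field(default_factory=list)
    output_types: List[str] = field(default_factory=list)


def validate(config: TrainingConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    if not 1 <= config.total_iterations <= MAX_ITERATIONS:
        errors.append(f"total_iterations must be between 1 and {MAX_ITERATIONS}")
    if len(config.save_iterations) > MAX_SAVE_ITERATIONS:
        errors.append(f"at most {MAX_SAVE_ITERATIONS} save_iterations are allowed")
    # A saved iteration must be one the training run actually reaches.
    out_of_range = [i for i in config.save_iterations
                    if not 1 <= i <= config.total_iterations]
    if out_of_range:
        errors.append(f"save_iterations out of range: {out_of_range}")
    unknown = sorted(set(config.output_types) - OUTPUT_TYPES)
    if unknown:
        errors.append(f"unknown output types: {unknown}")
    return errors
```

Collecting every problem instead of stopping at the first one fits the goal of more descriptive error messages mentioned under API Changes.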

You can see the Nerf Worker wiki for more detailed changes.

Service Removal / Submoduling

Originally I wanted to allow the dev to choose between the Flask server and the Go server, as well as between TensoRF and Gaussian Splatting, and I may add that in the future. However, in my opinion go-web-server and Gaussian Splatting have such overwhelming advantages that I have removed the older services in this version. That is a main reason why I created the frontend/backend repositories, so that I do not override a rich commit history for other services and versions.

I did try my best (with a lot of help from ChatGPT) to filter all the vidtonerf commits by file so that I could carry as much of the history as possible from VidToNerf and web-app-react over to backend/frontend, but I could never quite make it work. Instead, I've tried to transfer as many issues as possible from the old repositories and update them all.

Each service is now its own repository and submoduled in the backend repo, allowing for more decoupling of version control, and allowing easier issue identification and personalized wikis. Also, if we ever add GitHub actions, it will be much easier to add to each individual repository.

Some Backend Notes

I have fleshed out a lot of the wiki pages, so please consult them if you have questions. Also, I fully expect nerf-worker to require some finagling before it builds for the team, so please please please reach out to me over Discord for help. The single largest time sink for this project was a month and a half where I tried to get Gaussian splatting into Docker, so I expect some issues. I did document a lot of my process locally and a little on the nerf-worker wiki, but there are guaranteed to be some pieces that I missed.

Frontend

Typescript

I have converted all of web-app-react to TypeScript, to speed up development (and also because I needed it to make the real-time splat scene viewer).

Tech Stack

web-app-react was moved to Web-App-Vite and then frontend. Originally, this was a Create-React-App (CRA) template, but that comes with webpack as the bundler. This would normally be fine, but (A) there are much better ones, and (B) webpack hates Web Worker threads (which were an absolute must for the SplatCloudViewer), so I have moved the application to Vite. You can use either npm or yarn as your node package manager and be fine.

UI/UX

The original React app used react-bootstrap for the UI components, a perfectly fine choice. However, TypeScript has a few hiccups with resolving its types, and the red squigglies were annoying. Also, I am anything but good at UI, so I tried to switch to a more modern component library in Mantine, but the webpage now looks a tad bit soulless. There are a lot of super cool libraries that came out this year like Shadcn, so if design is your forte, please please take the reins and give the page some personality. Also the About page needs some love.

Improvements/Features

There was a lot added to the webpage:

User functionality is more (but not completely) fleshed out. There is now full authentication and user scene histories.
Uploaded videos now allow training config selection and semi real time progress monitoring.
Output type support has been greatly expanded. Users can now do the traditional video view, but can also use the interactive Splat viewer to view the scenes from any camera and position in real time.
Users can now render their own .splat files they've gotten from other applications
Dark/Light Mode Support
Complete API Functionality
A general cleanup and better documentation

Missing Work

About page needs a redesign to reflect the new architecture
Community was never implemented, but no time like the present
User account management needs some love, like implementing a profile page to modify their username and password
Scene history should probably allow each scene preview card to have a trashcan in the top right with confirmation to delete the scene from the backend.

Testing Apologies

I always intended to get unit testing done, but during active development I have probably done over 5000 manual functional/system and unit tests, and just kept plugging along. I can vouch for a large amount of functionality, but cannot guarantee it. It's a total douche-canoe move to leave this to the next team, so I will also be working heavily on testing in the near future.

Relevant PRs

Frontend PR

- Added support for initial POST parameters. Includes training types ("gaussian", "tensorf") and output types ("video", "ply", "splat", "model")
- Added initial user-defined job configs stored in scene record (JobRecord). Contains sfm_config and nerf_config for corresponding workers
- Added from_dict() method in scene.py for dataclasses

SimonDaKappa · 2024-09-09T23:46:25Z

Also, I tried really hard to keep commit history for the workers (sfm is the only one that still exists in this version), but I couldn't get it to transfer to their repos (other than go-web-server, which I started from scratch). So, there won't be any diff data for the submodules and little commit history for them, but I tried to make up for it with a lot of in-code and wiki documentation.

SimonDaKappa · 2024-09-21T18:56:14Z

Just gonna merge, as we are on indefinite hiatus until more RCOS attention is gathered, and I'd like to add some more features on my own time.

SimonDaKappa and others added 27 commits June 11, 2024 12:10

Initial Multi-Stage gsplatting build tests

53e64cf

Trying to expose NVCC in docker build stage for submodule wheels

d4816ea

First Succesfull running gaussian splatting manually in container, se…

14d3407

…e simon for wip docs

Loosening SFM bluriness constraint and additional logging

6922c80

Fixing Loose Bluriness Constraint File Writing, Fix Typo. Note: Const…

dd2427d

…raint is arbitrarily set atm

Initial Gaussian Splatting Worker backend connection

a3beff4

Initial Working e2e connection

041520d

Update nerf scene represenation to handle splat files as output in ad…

762b198

…dition to videos. Fix typos

WIP SceneManager overhaul to remove redundant calls and enable dynami…

fccc834

…c job config/dispatch

Initial V3 Overhaul commit. Needs all testing and further improvement…

e16d050

…s to frontend.

Major work towards V3 Backend. Remove https from flask, will be unifi…

abea76b

…ed with GoWebServer in near future

Rename gaussian_splatting_reduced

cc823cd

Redo of service-separation in this brach. Output handling changes to …

6593814

…nerf-worker. Colmap white-background detection support.

attempt module fix

34865a1

attempt module fix2

73d2ef6

attempt module fix3

385af7a

attempt module fix3

b922e0c

Remove nerf-worker folder

f3cab99

Attempt fix nerf-worker submodule

321bed4

Moving sfm-worker to own repo and submodule

0277229

File restructure for documentation and helpful dev scripts

2d90aa7

Update README.md

3233fd0

Add gaussian splatting, and fix typos.

Update README.md

feaecbe

Update guassian requirements, roadmap, and installation

Update README.md

a009364

Fix bold and update shields

Remove flask webserver from this branch

c9a9441

update modules

d757026

SimonDaKappa self-assigned this Sep 9, 2024

SimonDaKappa requested review from PotatoPalooza and rougejaw September 9, 2024 23:41

SimonDaKappa added the documentation and enhancement labels Sep 9, 2024

SimonDaKappa changed the title ~~Gaussian integration~~ V3 and Gaussian Splatting Sep 9, 2024

update modules

74c4e46

SimonDaKappa merged commit 5c9c94b into main Sep 21, 2024
