V3 and Gaussian Splatting #15

Merged
merged 28 commits into from
Sep 21, 2024

SimonDaKappa
Collaborator

@SimonDaKappa SimonDaKappa commented Sep 9, 2024

Hi all, this is the partner PR to the frontend PR. It's really, really big, so I'm sorry.

Anyhow, here's the writeup. It's not nearly everything; a lot is documented (1) in the code, (2) in the wiki, and (3) in the issues of this repo and the submodule repos.

Hey All! I said I'd work on this project over the summer, but I really underestimated how hard it is to get a job right now, so prepare for way too many changes for a single PR.

TLDR: My feature bloat had feature bloat.

Reasoning

Over the spring semester, we tried hard to replace TensoRF with Gaussian Splatting and got demolished by NVIDIA errors for no apparent reason. So I ended up getting rid of the entire build system and moving to Docker only (which brought a bunch more pain). With Gaussian Splatting we get a bunch of shiny new features, mainly real-time scene rendering (I get 150 fps on my desktop and 60ish on the lower-tier rpi laptop). This gave me the reason I needed to finally learn some frontend tech. Overall, I kept changing something, and that led me to something else I could add.

I've tried my best to not override too much work and only make noncontroversial improvements, but there may be some toes I stepped on. If I did, you have my apologies for that.

Changes

With V3 comes an overall restructure of the web server.

Originally, I wrote the new API and features in the Python server, but I wanted to learn Go, and it has a lot of really nice features for a web server (automatic JSON/BSON marshaling via struct tags, easy concurrency, strong HTTP frameworks, better MongoDB integration, and good error handling), so I eventually bit the bullet and did basically a 1-to-1 translation of what I had in Python to Go.
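For anyone new to Go, here is a minimal sketch of what "marshal by tagging" means. `SceneSummary` and its fields are made up for illustration and are not the actual go-web-server schema. Only the standard library is used, so the `bson` tags are shown but not exercised; the assumption is that the MongoDB driver reads them the same way `encoding/json` reads the `json` tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SceneSummary is a hypothetical record, not a type from go-web-server.
// The `json` tag controls field names in HTTP payloads; the `bson` tag is
// what a MongoDB driver would read when storing the same struct.
type SceneSummary struct {
	ID      string   `json:"id" bson:"_id"`
	Name    string   `json:"name" bson:"name"`
	Status  int      `json:"status" bson:"status"`
	Outputs []string `json:"outputs,omitempty" bson:"outputs,omitempty"`
}

// encodeScene marshals a scene; field naming comes entirely from the tags.
func encodeScene(s SceneSummary) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeScene(SceneSummary{ID: "abc", Name: "lego", Status: 1})
	if err != nil {
		panic(err)
	}
	// Outputs is nil, so omitempty drops it from the payload.
	fmt.Println(out) // {"id":"abc","name":"lego","status":1}
}
```

One struct definition then serves both the REST layer and the database layer, which is most of the appeal.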

Most of the services you would see in the old web server are still here, just slightly more decoupled. Refer to the inline comments and the go-web-server wiki for a more in-depth overview of changes. It is almost all composition by dependency injection (although some interfaces could be created to make it better).
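To make the "composition by dependency injection" point concrete, here is a rough sketch of the pattern with the interfaces I mention. All names here (`SceneStore`, `JobPublisher`, `StartJob`, the in-memory fakes) are hypothetical and only loosely modeled on the services described; the real code may be wired differently.

```go
package main

import (
	"errors"
	"fmt"
)

// SceneStore and JobPublisher are hypothetical interfaces standing in for
// the database managers and the queue service.
type SceneStore interface {
	Owner(sceneID string) (string, bool)
}

type JobPublisher interface {
	PublishJob(sceneID string) error
}

// ClientService receives its dependencies instead of constructing them.
type ClientService struct {
	store SceneStore
	queue JobPublisher
}

func NewClientService(store SceneStore, queue JobPublisher) *ClientService {
	return &ClientService{store: store, queue: queue}
}

var ErrNotOwner = errors.New("scene does not belong to user")

// StartJob publishes a job only if the scene belongs to the requesting user.
func (c *ClientService) StartJob(userID, sceneID string) error {
	owner, ok := c.store.Owner(sceneID)
	if !ok || owner != userID {
		return ErrNotOwner
	}
	return c.queue.PublishJob(sceneID)
}

// In-memory fakes, useful for tests and for running without MongoDB/RabbitMQ.
type memStore map[string]string

func (m memStore) Owner(id string) (string, bool) {
	owner, ok := m[id]
	return owner, ok
}

type recordingQueue struct{ published []string }

func (q *recordingQueue) PublishJob(id string) error {
	q.published = append(q.published, id)
	return nil
}

func main() {
	q := &recordingQueue{}
	svc := NewClientService(memStore{"scene-1": "alice"}, q)
	fmt.Println(svc.StartJob("alice", "scene-1")) // <nil>
	fmt.Println(svc.StartJob("bob", "scene-1"))   // scene does not belong to user
	fmt.Println(q.published)                      // [scene-1]
}
```

With interfaces in place, the service can be unit tested against fakes like these without a running database or broker.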

  • controller.py -> WebServer.go: WebServer is a translation of the main HTTP handler. In this version, almost all non-HTTP processing has been factored out into ClientService. It now has a lot more REST API endpoints and can handle JWT tokens for auth. Incoming requests are now automatically validated, with constraint enforcement.
  • client_service.py -> ClientService.go: ClientService now does most of the heavy lifting for REST API request handling. It delegates to the correct database managers and hands new/finished jobs to the AMPQService. Requested resources must now belong to the user requesting them.
  • queue_service.py -> AMPQService.go: AMPQService is an object-oriented version of the old thread-based rabbitmqservice. It is now much more tolerant of connection issues and keeps the same publisher/consumer concurrency as the old system.
  • scene.py -> New Schemas and Managers: There are now dedicated schema files and managers for each DB collection. In short:
    1. Users now have robust password storage. Username and password can be changed.
    2. Users now have a list of scenes that they own and have access to.
    3. Each scene now has more output types.
    4. Scenes now have dynamic training configurations, giving users more control over what is trained and generated.
    5. Queue lists have been more fully implemented, but maybe should be moved to in-memory only.
    6. UUID4s have been replaced by MongoDB ObjectIDs for performance and native support (this also allows easy sorting by timestamp; social media implementation, wink wink).
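The timestamp sorting in item 6 works because the first 4 bytes of an ObjectID are a big-endian Unix timestamp in seconds. Here is a small standard-library sketch that pulls it out of the hex form; `objectIDTime` is a helper written for this example, not something from the repo.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// objectIDTime extracts the creation time embedded in a 24-character hex
// ObjectID. Hypothetical helper for illustration only.
func objectIDTime(id string) (time.Time, error) {
	raw, err := hex.DecodeString(id)
	if err != nil {
		return time.Time{}, err
	}
	if len(raw) != 12 {
		return time.Time{}, errors.New("ObjectID must be 12 bytes (24 hex characters)")
	}
	// Bytes 0-3: seconds since the Unix epoch, big-endian.
	secs := binary.BigEndian.Uint32(raw[:4])
	return time.Unix(int64(secs), 0).UTC(), nil
}

func main() {
	created, err := objectIDTime("507f1f77bcf86cd799439011")
	if err != nil {
		panic(err)
	}
	fmt.Println(created.Unix()) // 1350508407
}
```

Because the timestamp leads the ID, sorting by `_id` gives creation order at one-second resolution without a separate created-at field.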

API Changes

Overall, there has been an expansion of what external API endpoints are available, including progress reports, scene metadata, user endpoints, more output handles, previews, and more.
Also, responses now carry more descriptive error messages and accurate HTTP status codes.
See API Changes for more.
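As a rough idea of what "descriptive error messages and accurate status codes" looks like in a Go handler, here is a sketch using only `net/http`. The endpoint shape, the `id` query parameter, and the error JSON format are assumptions made for this example; see the API docs for the real routes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiError is a hypothetical error payload, not the server's actual format.
type apiError struct {
	Error string `json:"error"`
}

func writeError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(apiError{Error: msg})
}

// sceneHandler is an illustrative endpoint over a fixed set of known scene IDs.
func sceneHandler(known map[string]bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			writeError(w, http.StatusMethodNotAllowed, "only GET is supported")
			return
		}
		id := r.URL.Query().Get("id")
		switch {
		case id == "":
			writeError(w, http.StatusBadRequest, "missing required query parameter: id")
		case !known[id]:
			writeError(w, http.StatusNotFound, "no scene with id "+id)
		default:
			w.Header().Set("Content-Type", "application/json")
			json.NewEncoder(w).Encode(map[string]string{"id": id})
		}
	}
}

// statusFor runs one in-memory request and returns the response status code.
func statusFor(h http.Handler, method, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code
}

func main() {
	h := sceneHandler(map[string]bool{"abc": true})
	fmt.Println(statusFor(h, http.MethodGet, "/scene?id=abc"))  // 200
	fmt.Println(statusFor(h, http.MethodGet, "/scene"))         // 400
	fmt.Println(statusFor(h, http.MethodGet, "/scene?id=nope")) // 404
	fmt.Println(statusFor(h, http.MethodPost, "/scene?id=abc")) // 405
}
```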

Worker Communication

Worker communication should be mostly the same as before. However, more information is now passed between workers to support background colors for better training, dynamic configs, and error tolerance.
See Worker Communication for more.

Worker Services

Since we are a microservice architecture, I wanted to actually have the micro part. All the Dockerfiles should be new and improved, with smaller dependencies, experimental BuildKit caching for much faster image builds (from the second build onwards), and multistage builds to reduce container size. This is another reason for switching to Go: the web server container is around 25MB instead of 7GB, so we can eventually run multiple load-balanced instances.

Also, we were previously handling environment variables a little weirdly. Now, if you build with the backend Docker Compose setup, the .env file should be in the project root, and all the relevant environment variables are passed to the services. For local runs, the services can still load their own .env files, but in most cases that will be unnecessary.

The Elephant in the Room (Gaussian Splatting)

The root cause of all this is the new nerf-worker, which is essentially a wrapper around the Inria Gaussian Splatting code (with some improvements). The nerf-worker now has two main services:

  • FileServer: Handles output of all worker data on its own process
  • GaussianJobDispatcher: Handles incoming jobs, dynamic gpu prefetch/training (with round robin job assignment), output generation, and publishing back to the go-web-server
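For reference, round-robin job assignment boils down to the sketch below. It is written in Go to match the other examples in this writeup; `gpuScheduler` and its methods are hypothetical names, and the actual dispatcher in nerf-worker also handles prefetch and training, which this leaves out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// gpuScheduler hands out GPU indices in a fixed rotation.
// Hypothetical type for illustration; not the dispatcher's real code.
type gpuScheduler struct {
	mu   sync.Mutex
	gpus []int
	next int
}

func newGPUScheduler(gpus []int) (*gpuScheduler, error) {
	if len(gpus) == 0 {
		return nil, errors.New("at least one GPU is required")
	}
	// Copy the slice so later changes by the caller do not affect scheduling.
	return &gpuScheduler{gpus: append([]int(nil), gpus...)}, nil
}

// assign returns the next GPU in the rotation; safe for concurrent callers.
func (s *gpuScheduler) assign() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	gpu := s.gpus[s.next]
	s.next = (s.next + 1) % len(s.gpus)
	return gpu
}

func main() {
	sched, err := newGPUScheduler([]int{0, 1})
	if err != nil {
		panic(err)
	}
	for _, job := range []string{"a", "b", "c"} {
		fmt.Printf("job %s -> GPU %d\n", job, sched.assign())
	}
	// job a -> GPU 0
	// job b -> GPU 1
	// job c -> GPU 0
}
```

Round robin ignores how busy each GPU is, so it is only a fair split when jobs are roughly the same size.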

A user can now train a single scene for up to 30000 iterations (~25 minutes on a 3080 Ti), choose their output types, and save up to 5 selected training iterations for output. One thing to note is that .ply files are rather large (50-500MB), so unless we ever get the school to pay for dedicated S3 or other remote storage, nerf-worker and go-web-server will eat at your file system.
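The limits above could be enforced with a check along these lines. This is a sketch, not the validation the server actually runs: `trainingConfig` and its field names are made up, and the rule that a saved iteration must fall within the total iteration count is my assumption for the example.

```go
package main

import "fmt"

const (
	maxTotalIterations = 30000 // upper bound on training iterations
	maxSavedIterations = 5     // upper bound on saved training iterations
)

// trainingConfig is a hypothetical request shape for illustration.
type trainingConfig struct {
	TotalIterations int
	SaveIterations  []int
}

// validate returns a descriptive error for the first violated limit.
func (c trainingConfig) validate() error {
	if c.TotalIterations < 1 || c.TotalIterations > maxTotalIterations {
		return fmt.Errorf("total iterations must be between 1 and %d, got %d",
			maxTotalIterations, c.TotalIterations)
	}
	if len(c.SaveIterations) > maxSavedIterations {
		return fmt.Errorf("at most %d save iterations are allowed, got %d",
			maxSavedIterations, len(c.SaveIterations))
	}
	for _, it := range c.SaveIterations {
		// Assumption: a checkpoint cannot be saved past the end of training.
		if it < 1 || it > c.TotalIterations {
			return fmt.Errorf("save iteration %d is outside 1..%d", it, c.TotalIterations)
		}
	}
	return nil
}

func main() {
	ok := trainingConfig{TotalIterations: 30000, SaveIterations: []int{7000, 30000}}
	fmt.Println(ok.validate()) // <nil>

	tooLong := trainingConfig{TotalIterations: 50000}
	fmt.Println(tooLong.validate())
}
```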

You can see the Nerf Worker wiki for more detailed changes.

Service Removal / Submoduling

Originally I wanted to allow the dev to choose between the Flask server and the Go server, as well as between TensoRF and Gaussian Splatting, and I may add that in the future. However, in my opinion go-web-server and Gaussian Splatting have such overwhelming advantages that I have removed the older services in this version. That is a main reason why I created the frontend/backend repositories: so that I do not override a rich commit history for other services and versions.

I did try my best (with a lot of help from ChatGPT) to filter all the vidtonerf commits by file so that I could carry as much of the history as possible from VidToNerf and web-app-react over to backend/frontend, but I could never quite make it work. Instead, I've tried to transfer as many issues as possible from the old repositories and update them all.

Each service is now its own repository, submoduled in the backend repo. This decouples version control and allows easier issue identification and per-service wikis. Also, if we ever add GitHub Actions, it will be much easier to add them to each individual repository.

Some Backend Notes

I have fleshed out a lot of the wiki pages, so please consult them if you have questions. Also, I fully expect nerf-worker to require some finagling before it builds for the team, so please please please reach out to me over Discord for help. The single largest time sink for this project was a month and a half where I tried to get Gaussian Splatting into Docker, so I expect some issues. I did document a lot of my process locally and a little on the nerf-worker wiki, but there are guaranteed to be some pieces that I missed.

Frontend

Typescript

I have converted all of web-app-react to TypeScript to speed up development (and also because I needed it to make the real-time splat scene viewer).

Tech Stack

web-app-react was moved to Web-App-Vite and then to frontend. Originally, this was a Create-React-App (CRA) template, but that comes with Webpack as the bundler. This would normally be fine, but (A) there exist much better ones, and (B) Webpack hates Web Worker threads (which were an absolute must for the SplatCloudViewer), so I have moved the application to Vite. You can use either npm or yarn as your Node package manager and be fine.

UI/UX

The original React app used react-bootstrap for the UI components, a perfectly fine choice. However, TypeScript has a few hiccups resolving its types, and the red squigglies were annoying. Also, I am anything but good at UI, so I tried switching to a more modern component library, Mantine, but the webpage now looks a tad soulless. There are a lot of super cool libraries that came out this year, like Shadcn, so if design is your forte, please please take the reins and give the page some personality. Also, the About page needs some love.

Improvements/Features

A lot was added to the webpage:

  1. User functionality is more (but not completely) fleshed out. There is now full authentication and user scene histories.
  2. Video uploads now allow training config selection and semi-real-time progress monitoring.
  3. Output type support has been greatly expanded. Users can still use the traditional video view, but can also use the interactive Splat viewer to view scenes from any camera and position in real time.
  4. Users can now render their own .splat files from other applications.
  5. Dark/Light Mode Support
  6. Complete API Functionality
  7. A general cleanup and better documentation

Missing Work

  1. About page needs a redesign to reflect the new architecture
  2. Community was never implemented, but no time like the present
  3. User account management needs some love, like implementing a profile page to modify their username and password
  4. Scene history should probably allow each scene preview card to have a trashcan in the top right with confirmation to delete the scene from the backend.

Testing Apologies

I always intended to get unit testing done, but during active development I probably ran over 5000 manual functional/system and unit tests and just kept plugging along. I can vouch for a large amount of the functionality, but cannot guarantee it. It's a total douche-canoe move to leave this to the next team, so I will also be working heavily on testing in the near future.

Relevant PRs

Frontend PR

SimonDaKappa and others added 27 commits June 11, 2024 12:10
 - Added Support for initial post parameters. Includes training types ("gaussian", "tensorf"), output types ("video", "ply", "splat", "model")
 - Added initial user defined job configs stored in scene record (JobRecord). Contains sfm_config and nerf_config for corresponding workers
 - Added from_dict() method in scene.py for dataclasses
…nerf-worker.

Colmap white-background detection support.
Add gaussian splatting, and fix typos.
Update guassian requirements, roadmap, and installation
Fix bold and update shields
@SimonDaKappa
Collaborator Author

Also, I tried really hard to keep the commit history for the workers (sfm is the only one that still exists in this version), but I couldn't get it to transfer to their repos (aside from go-web-server, which I started from scratch). So there won't be any diff data for the submodules and little commit history for them, but I tried to make up for it with a lot of in-code and wiki documentation.

@SimonDaKappa
Collaborator Author

Just gonna merge, as we are on indefinite hiatus until the project gathers more RCOS attention, and I'd like to add some more features on my own time.

@SimonDaKappa SimonDaKappa merged commit 5c9c94b into main Sep 21, 2024
Labels
documentation Improvements or additions to documentation enhancement New feature or request