
3) NeRF for large scenes and 3D mapping: aka Google Live View and Apple Fly Around



Bounded and Unbounded Neural Radiance Fields

Real forward-facing scenes and synthetic bounded scenes

Spatial distortions

If you are trying to reconstruct a scene or object from images, you may wish to consider adding a spatial distortion. When rendering a target view of a scene, the camera will emit a camera ray for each pixel and query the scene at points along this ray. We can choose where to query these points using different samplers.

These samplers have some notion of bounds that define where the ray should start and terminate. If you know that everything in your scene exists within some predefined bounds (e.g. a cube that a room fits in), then the sampler will properly sample the entire space. If, however, the scene is unbounded (e.g. an outdoor scene), defining where to stop sampling is challenging. One option is to increase the far sampling distance to a large value (e.g. 1 km). Alternatively, we can warp space into a fixed volume. Below are the supported distortions.
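One common warp of this kind is the scene contraction from Mip-NeRF 360: points inside the unit ball are kept as-is, and everything outside is squashed into a ball of radius 2, so an unbounded scene fits inside a fixed sampling volume. A minimal NumPy sketch of that contraction (illustrative, not any particular library's implementation):

```python
import numpy as np

def contract(x, eps=1e-6):
    """Mip-NeRF 360-style contraction:
    contract(x) = x                        if ||x|| <= 1
                = (2 - 1/||x||) * x/||x||  otherwise
    """
    x = np.asarray(x, dtype=np.float64)
    n = np.linalg.norm(x, axis=-1, keepdims=True)
    n_safe = np.maximum(n, eps)                       # avoid division by zero
    squashed = (2.0 - 1.0 / n_safe) * (x / n_safe)    # maps far points into radius 2
    return np.where(n <= 1.0, x, squashed)

# e.g. contract([10.0, 0.0, 0.0]) -> [1.9, 0.0, 0.0]
```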

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

https://jonbarron.info/mipnerf360/

Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs, 2022

We explore how to leverage neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures spanning buildings or even multiple city blocks collected primarily from drone data. In contrast to the single object scenes against which NeRFs have been traditionally evaluated, this setting poses multiple challenges including (1) the need to incorporate thousands of images with varying lighting conditions, all of which capture only a small subset of the scene, (2) prohibitively high model capacity and ray sampling requirements beyond what can be naively trained on a single GPU, and (3) an arbitrarily large number of possible viewpoints that make it unfeasible to precompute all relevant information beforehand (as real-time NeRF renderers typically do). https://arxiv.org/abs/2112.10703

https://github.com/cmusatyalab/mega-nerf
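The core idea is spatial partitioning: the capture is split into cells, each modelled by its own NeRF submodule, and every training pixel is routed only to the submodules whose cells its ray passes near, so each submodule sees a manageable slice of the data and fits on a single GPU. A rough sketch of that routing step, with assumed array names and geometry (not the repository's actual code):

```python
import numpy as np

def assign_rays_to_cells(ray_origins, ray_dirs, centroids, t_samples, radius):
    """Boolean mask [num_rays, num_cells]: True where a ray passes within
    `radius` of a cell centroid at some sampled depth t."""
    # sample points along each ray: [rays, samples, 3]
    pts = ray_origins[:, None, :] + t_samples[None, :, None] * ray_dirs[:, None, :]
    # distance from every sample point to every centroid: [rays, samples, cells]
    d = np.linalg.norm(pts[:, :, None, :] - centroids[None, None, :, :], axis=-1)
    return (d < radius).any(axis=1)
```

Each submodule would then be trained only on the rays whose mask column is True, and at render time the submodules covering the query region are evaluated and combined.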

Combining multiple NeRFs into a map

Building NeRF at City Scale, 2021

Instead of input pictures taken a few centimeters apart, this approach can handle pictures whose distances to the scene differ by orders of magnitude, ranging from satellite imagery to pictures taken on the road. NeRF alone fails to use such drastically different pictures to reconstruct the scene. CityNeRF is capable of packing city-scale 3D scenes into a unified model, which preserves high-quality details across scales varying from satellite level to ground level.

Source:

The approach first trains the neural network successively from distant viewpoints to close-up viewpoints, and also trains it on the transitions between these "levels". This was inspired by the "level of detail" techniques currently used by traditional 3D rendering systems. "Joint training on all scales results in blurry texture in close views and incomplete geometry in remote views. Separate training on each scale yields inconsistent geometries and textures between successive scales." So the system starts at the most distant level and incorporates more and more information from the next closer level as it progresses from level to level.
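As an illustration of that curriculum, here is a hypothetical sketch (placeholder names, not the paper's code) of a training schedule that starts at the most distant scale and folds in the next closer scale at each stage:

```python
def progressive_schedule(images_by_scale, steps_per_stage=10_000):
    """Yield (level, active_images) once per training step, coarsest level first.
    images_by_scale[0] holds the most distant views; later entries get closer."""
    active = []
    for level, images in enumerate(images_by_scale):
        active = active + list(images)          # fold in the next closer level
        for _ in range(steps_per_stage):
            yield level, active                 # train on every level seen so far

# usage sketch: for level, pool in progressive_schedule(data): train_step(pool)
```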

It also modifies the neural network itself at each level by adding what the authors call a "block". A block has two separate information flows, one for the more distant level and one for the closer level being trained at that moment. It is designed so that a set of "base" information is determined for the more distant level, and "residual" information (in the form of colors and densities) that modifies the base and adds detail is calculated on top of it.
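A hypothetical PyTorch sketch of such a block, where the base density/colour comes from the branch trained on the more distant level and the block predicts residuals for the closer level (layer widths and names are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class LevelBlock(nn.Module):
    """Residual block: refines the previous (more distant) level's outputs."""

    def __init__(self, feat_dim=256, pos_dim=63, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.res_sigma = nn.Linear(hidden, 1)   # residual density
        self.res_rgb = nn.Linear(hidden, 3)     # residual colour

    def forward(self, base_feat, pos_enc, base_sigma, base_rgb):
        h = self.mlp(torch.cat([base_feat, pos_enc], dim=-1))
        # residuals are added to the base predictions; the usual density/colour
        # activations are applied later in the rendering pipeline
        return base_sigma + self.res_sigma(h), base_rgb + self.res_rgb(h)
```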

As current CityNeRF is built upon static scenes, it cannot handle inconsistency in the training data. We observed that, in Google Earth Studio [1], objects with slender geometry, such as a lightning rod, flicker as the camera pulls away. Artifacts like flickering moiré patterns in the windows of skyscrapers, and differences in detail manifested as distinct square regions on the globe, are also observed in the rendered images that serve as the ground truth. Such defects lead to unstable rendering results around certain regions and bring about inconsistencies. A potential remedy is to treat it as a dynamic scene and associate each view with an appearance code that is jointly optimized, as suggested in [6, 10]. Another potential limitation is computation: the progressive strategy naturally takes a longer training time and hence requires more computational resources.

Urban Radiance Fields

https://urban-radiance-fields.github.io

LIDAR Constrained NeRF on Outdoor Scenes

https://www.yigitarastunali.com/project/lidar-constrainted-nerf-on-outdoor-scenes/

Block-NeRF, 2022

A method that enables large-scale scene reconstruction by representing the environment using multiple compact NeRFs that each fit into memory. At inference time, Block-NeRF seamlessly combines renderings of the relevant NeRFs for the given area. The authors reconstruct the Alamo Square neighborhood in San Francisco using data collected over 3 months, and Block-NeRF can update individual blocks of the environment without retraining on the entire scene (demonstrated in the project videos with ongoing construction).
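As a rough illustration of "seamlessly combines renderings", here is a minimal sketch (assumed helper names, not Waymo's implementation) of inverse-distance-weighted blending between the block NeRFs that cover a query camera; the real system additionally filters blocks by visibility and matches appearance codes between them:

```python
import numpy as np

def blend_block_renders(camera_origin, blocks, render_fn, radius=100.0, power=4.0):
    """blocks: list of (centre_xyz, nerf_model); render_fn(model) -> HxWx3 image."""
    images, weights = [], []
    for centre, model in blocks:
        d = np.linalg.norm(camera_origin - centre)
        if d > radius:                              # ignore blocks too far away to matter
            continue
        images.append(render_fn(model))
        weights.append(max(d, 1e-6) ** -power)      # closer blocks get larger weight
    if not images:
        raise ValueError("no block covers this camera position")
    w = np.array(weights) / np.sum(weights)
    return np.tensordot(w, np.stack(images), axes=1)   # weighted average image
```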

Video results can be found on the project website waymo.com/research/block-nerf.

Google Live View, 2022

Uses pixel streaming. Since Google and Apple are actively competing over who ships first, technical details on both solutions are sparse.

Demo

Apple Fly Around, 2022

Instructions: https://support.apple.com/en-au/guide/iphone/iph81a3f978/ios

NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction, 2022

https://arxiv.org/pdf/2203.11283.pdf

NeRF compositing for avatars

https://www.unite.ai/creating-full-body-deepfakes-by-combining-multiple-nerfs/