# NeRF: Neural Radiance Field

An efficient and comprehensive PyTorch implementation of *NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis* (Mildenhall et al. 2020).

*(Rendered scenes: hotdog, lego, chair, drums, mic, materials, ficus, ship)*


## Installation

This implementation has been tested on Ubuntu 20.04 with Python 3.8 and torch 1.9. Install the required packages first with `pip3 install -r requirements.txt`. You may use pyenv or conda to avoid conflicts with your environment.

Download the Blender Scenes Dataset. Rename it and place it in the repo as `data/blender` (ignored by git by default).

```
data/
└── blender
    ├── chair
    ├── drums
    ├── ficus
    ├── hotdog
    ├── lego
    ├── materials
    ├── mic
    └── ship
```

## Quickstart

### Command Line

| Action       | Command                   |
| ------------ | ------------------------- |
| Train        | `python3 -m nerf.train`   |
| Inference    | `python3 -m nerf.infer`   |
| Distillation | `python3 -m nerf.distill` |
| Benchmark    | `python3 -m nerf.bench`   |

### Reproduction

| Action       | Command          |
| ------------ | ---------------- |
| Train        | `make train`     |
| Distillation | `make distill`   |
| Hybrid       | `make hybrid`    |
| Benchmark    | `make bench_all` |

### Manual

```python
# ==== Imports
import nerf.infer         # Enables inference features (NeRF.infer)
import nerf.train         # Enables training features (NeRF.fit)

from torch.cuda.amp import GradScaler
from torch.nn import MSELoss
from torch.optim import Adam

from nerf.core import BoundedVolumeRaymarcher as BVR, NeRF
from nerf.core import PositionalEncoding as PE
from nerf.core import NeRFScheduler
from nerf.data import BlenderDataset


DEVICE = "cuda:0"

# ==== Setup
dataset = BlenderDataset("./data/blender", scene="hotdog", split="train")

phi_x = PE(3, 6)          # Positional encoding for positions x
phi_d = PE(3, 6)          # Positional encoding for directions d

nerf = NeRF(phi_x, phi_d, width=256, depth=4).to(DEVICE)
raymarcher = BVR(tn=2., tf=6., samples_c=64, samples_f=64)

optim = Adam(nerf.parameters(), lr=5e-4)
scheduler = NeRFScheduler(optim)   # Warmup/decay schedule (args assumed, see nerf.core)
criterion = MSELoss()
scaler = GradScaler()              # torch.cuda.amp, can be disabled

# ==== Train
history = nerf.fit(
    raymarcher,           # Raymarcher (BVR)
    optim,                # Optimizer (Adam, AdamW, ...)
    scheduler,            # NeRFScheduler
    criterion,            # Criterion (MSELoss, L1Loss, ...)
    scaler,               # GradScaler
    dataset,              # Dataset (BlenderDataset)
)                         # More options available (epochs, batch_size, ...)

# ==== Infer
coarse, fine = nerf, nerf # Coarse and fine models (a single module here)
W = H = 400               # Frame resolution
ro, rd = ..., ...         # Rays origin and direction, Tensors of size (B, 3)

frame = nerf.infer(
    coarse,               # Coarse NeRF Module
    fine,                 # Fine NeRF Module
    raymarcher,           # Raymarcher (BVR)
    ro,                   # Rays Origin (Tensor of size (B, 3))
    rd,                   # Rays Direction (Tensor of size (B, 3))
    W,                    # Frame Width
    H,                    # Frame Height
)                         # More options available (batch_size, ...)
```

## Description

NeRF combines advances from both Computer Graphics and Deep Learning research.

The method encodes a 3D scene as a continuous volume described by density and color at any point of a given bounded volume. During raymarching, the rays query the volume representation model to obtain intersection data. The model is trained end-to-end, using only the ground truth images as an objective signal. A first network, the coarse model, is trained using voxel grid sampling to increase sample efficiency. This first pass is then used to train a second network, the fine network, using importance sampling of the volume.

Each trained network is tied to a single scene. Caching and acceleration structures can be used to decrease rendering time during inference. The same models can also be used to generate a depth map and a 3D mesh of the scene.

### Positional Encoding

**Positional Encoding** In their original work, Mildenhall et al. use positional encoding to let the network learn high-frequency functions, which a classical multilayer perceptron without positional encoding cannot learn: it focuses only on low-frequency reconstruction.

```
v = xy | xyz                      # normalized to [-1; 1]

rgb = lambda v: mlp(v)            # wo/ pe-encoding
rgb = lambda v: mlp(phi(v))       # w/  pe-encoding

phi = lambda v: [
  cos(2 ** 0 * PI * v),
  sin(2 ** 0 * PI * v),
  cos(2 ** 1 * PI * v),
  sin(2 ** 1 * PI * v),
  ...
].T
```
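
For reference, a minimal PyTorch sketch of this mapping (the `positional_encoding` helper below is illustrative, not the repo's `PositionalEncoding` class):

```python
import math

import torch


def positional_encoding(v: torch.Tensor, freqs: int = 6) -> torch.Tensor:
    """Map coordinates v (B, D), normalized to [-1, 1], to (B, 2 * freqs * D)."""
    features = []
    for k in range(freqs):
        features.append(torch.cos(2 ** k * math.pi * v))
        features.append(torch.sin(2 ** k * math.pi * v))
    return torch.cat(features, dim=-1)
```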

**Fourier Features** In *Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains* (Tancik et al. 2020), the NeRF authors showed that encoding positions with a Fourier feature mapping enables multilayer perceptrons to learn high-frequency functions in low-dimensional problem domains.

```
v = xy | xyz                      # normalized to [-1; 1]

rgb = lambda v: mlp(v)            # wo/ ff-encoding
rgb = lambda v: mlp(phi(v))       # w/  ff-encoding

phi = lambda v: [
  a_0 * cos(2 * PI * b_0.T * v),
  a_0 * sin(2 * PI * b_0.T * v),
  a_1 * cos(2 * PI * b_1.T * v),
  a_1 * sin(2 * PI * b_1.T * v),
  ...
].T
```
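
A minimal PyTorch sketch of this mapping, assuming a Gaussian random frequency matrix `b` as in Tancik et al. (names and defaults are illustrative, not the repo's API):

```python
import math

import torch


def fourier_features(v: torch.Tensor, b: torch.Tensor, a: float = 1.0) -> torch.Tensor:
    """Map coordinates v (B, D) to (B, 2 * F) Fourier features.

    b is a (D, F) random frequency matrix, e.g. b = sigma * torch.randn(D, F),
    drawn once at initialization and kept fixed.
    """
    vb = 2.0 * math.pi * v @ b                                     # (B, F)
    return torch.cat([a * torch.cos(vb), a * torch.sin(vb)], dim=-1)
```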

### Implicit Representation

The scene is encoded by fitting a simple multilayer perceptron that predicts density sigma and color RGB given position x and view direction d queries.

**Original Architecture**

```
n = 4

           ReLU    ReLU
phi(x) --> 256 --> 256 --> ReLU(sigma)
  60    |   n   ^   n  |
        |       |      |           ReLU
        -- cat --      --> 256 --> 128 --> Sigmoid(RGB)
                                ^
                                |
                               cat
                                |
                              phi(d)
                                24
```
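
The following PyTorch sketch mirrors the diagram (a hypothetical `TinyNeRF`, not the repo's `nerf.core.NeRF`):

```python
import torch
import torch.nn as nn


class TinyNeRF(nn.Module):
    """Minimal NeRF MLP: phi(x) -> (sigma, features), then features + phi(d) -> RGB."""

    def __init__(self, x_dim: int = 60, d_dim: int = 24,
                 width: int = 256, depth: int = 4) -> None:
        super().__init__()
        # First half: `depth` ReLU layers over phi(x)
        layers = [nn.Linear(x_dim, width), nn.ReLU(True)]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.ReLU(True)]
        self.head = nn.Sequential(*layers)
        # Second half: skip connection concatenates phi(x) back in
        layers = [nn.Linear(width + x_dim, width), nn.ReLU(True)]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.ReLU(True)]
        self.tail = nn.Sequential(*layers)
        self.sigma = nn.Linear(width, 1)              # density head
        self.feature = nn.Linear(width, width)        # features fed to the color head
        self.rgb = nn.Sequential(                     # color head, conditioned on phi(d)
            nn.Linear(width + d_dim, width // 2), nn.ReLU(True),
            nn.Linear(width // 2, 3), nn.Sigmoid(),   # Sigmoid(RGB)
        )

    def forward(self, phi_x: torch.Tensor, phi_d: torch.Tensor):
        h = self.head(phi_x)
        h = self.tail(torch.cat((h, phi_x), dim=-1))
        sigma = torch.relu(self.sigma(h))             # ReLU(sigma)
        rgb = self.rgb(torch.cat((self.feature(h), phi_d), dim=-1))
        return sigma, rgb
```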

### Volume Rendering

Volume raymarching is used to produce the final rendering. A ray is cast from the camera origin through each pixel and sampled N_c times for the coarse model and N_f times for the fine model, within a bounded volume delimited by the near t_n and far t_f camera frustum parameters.

**Rendering Equation**

```
N_c, N_f = 64, 128

alpha_i = 1 - exp(-sigma_i * delta_i)
T_i = cumprod(1 - alpha_i)        # exclusive: product over j < i
w_i = T_i * alpha_i
C_c = sum(w_i * c_i)
```

In this equation, w_i represents a piecewise-constant PDF along the ray, T_i the accumulated transmittance, i.e. the fraction of light not yet blocked before reaching segment t_i, delta_i the segment length dist(t_i-1, t_i), and c_i the color of the ray intersection at t_i.
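
A minimal PyTorch sketch of this compositing step (the `composite` helper is illustrative, not the repo's raymarcher):

```python
import torch


def composite(sigma: torch.Tensor, rgb: torch.Tensor, delta: torch.Tensor):
    """Alpha-composite N samples along each of B rays.

    sigma: (B, N) densities, rgb: (B, N, 3) colors, delta: (B, N) segment lengths.
    Returns the (B, 3) pixel colors and the (B, N) weights w_i.
    """
    alpha = 1.0 - torch.exp(-sigma * delta)                        # (B, N)
    # Exclusive cumulative product: T_i = prod_{j < i} (1 - alpha_j)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat((torch.ones_like(trans[:, :1]), trans[:, :-1]), dim=-1)
    w = trans * alpha                                              # (B, N)
    color = (w[..., None] * rgb).sum(dim=-2)                       # (B, 3)
    return color, w
```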

The weights w_i are reused for inverse transform sampling in the fine pass. A total of N_c + N_f samples is finally used to generate the final render, this time querying the fine model.
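
A sketch of this inverse transform sampling step, assuming segment boundaries `bins` and coarse weights `w` (illustrative names, not the repo's API):

```python
import torch


def sample_pdf(bins: torch.Tensor, w: torch.Tensor, n_f: int) -> torch.Tensor:
    """Draw n_f samples per ray from the piecewise-constant PDF given by w.

    bins: (B, N + 1) segment boundaries, w: (B, N) coarse weights.
    Returns (B, n_f) fine sample positions along each ray.
    """
    pdf = w / w.sum(dim=-1, keepdim=True).clamp(min=1e-10)         # (B, N)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat((torch.zeros_like(cdf[:, :1]), cdf), dim=-1)   # (B, N + 1)
    u = torch.rand(w.shape[0], n_f, device=w.device)               # (B, n_f)
    hi = torch.searchsorted(cdf, u, right=True).clamp(max=w.shape[-1])
    lo = (hi - 1).clamp(min=0)
    cdf_lo, cdf_hi = cdf.gather(-1, lo), cdf.gather(-1, hi)
    bin_lo, bin_hi = bins.gather(-1, lo), bins.gather(-1, hi)
    t = (u - cdf_lo) / (cdf_hi - cdf_lo).clamp(min=1e-10)          # in-bin interpolation
    return bin_lo + t * (bin_hi - bin_lo)
```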

## Implementation

### Details

| Feature | Reference |
| ------- | --------- |
| Fourier Feature Encoding | |
| Positional Encoding | |
| Neural Radiance Field Model | |
| Bounded Volume Raymarcher | |
| Noise for Continuous Representation | |
| Camera Paths (Turnaround, ...) | |
| Interactive Notebook | |
| Reptile Meta-Learning | Tancik et al., Nichol et al. |
| Shifted Softplus for Sigma (sketched below) | Barron et al. |
| Widened Sigmoid for RGB (sketched below) | Barron et al. |
| Fine Network (Differs from Original, No Second Network) | |
| Training Optimizations | Nvidia's PyTorch Performance Tuning Guide |
| Safe Softplus, Sigmoid | Blog Article by Jia Fu Low |
| Gradient Clipping | |
| NeRF/JAX-NeRF Warmup Decay Learning Rate Scheduler | Barron et al. |
| Log Decay Learning Rate Scheduler | |
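
A minimal sketch of the shifted softplus and widened sigmoid activations referenced above; the shift and widening constants are assumed, and the repo's values may differ:

```python
import torch
import torch.nn.functional as F


def shifted_softplus(x: torch.Tensor) -> torch.Tensor:
    """Density activation: softplus shifted by -1 (assumed constant),
    smooth everywhere unlike ReLU(sigma)."""
    return F.softplus(x - 1.0)


def widened_sigmoid(x: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Color activation: sigmoid widened by eps so the network never has to
    saturate to output exact 0 or 1 RGB values."""
    return (1.0 + 2.0 * eps) * torch.sigmoid(x) - eps
```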

## Results

| Scene | Ground Truth | NeRF RGB Map | NeRF Depth Map |
| ----- | ------------ | ------------ | -------------- |
| Chair | chair_gt | chair_rgb_map | chair_depth_map |
| Lego | lego_gt | lego_rgb_map | lego_depth_map |
| HotDog | hotdog_gt | hotdog_rgb_map | hotdog_depth_map |
| Drums | drums_gt | drums_rgb_map | drums_depth_map |
| Mic | mic_gt | mic_rgb_map | mic_depth_map |
| Materials | materials_gt | materials_rgb_map | materials_depth_map |
| Ficus | ficus_gt | ficus_rgb_map | ficus_depth_map |
| Ship | ship_gt | ship_rgb_map | ship_depth_map |

*Using 64 coarse samples and 64 fine samples at 400x400 resolution.*

| Coarse | Fine | Seconds / Frame | FPS | RGB Map | Depth Map |
| ------ | ---- | --------------- | --- | ------- | --------- |
| NeRF | NeRF | 1.91 | 0.52 | vanilla_rgb | vanilla_depth |
| DistillNeRF | NeRF | 1.37 | 0.73 | hybrid_rgb | hybrid_depth |
| DistillNeRF | DistillNeRF | 0.30 | 3.36 | distill_rgb | distill_depth |

## Citation

**Original Work**

```bibtex
@inproceedings{mildenhall2020nerf,
  title={NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
  author={Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
  year={2020},
  booktitle={ECCV},
}
```