HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

CVPR 2024 (Highlight)

Haithem Turki¹,²  Vasu Agrawal¹  Samuel Rota Bulò¹  Lorenzo Porzi¹
Peter Kontschieder¹  Deva Ramanan²  Michael Zollhöfer¹  Christian Richardt¹

¹Meta Reality Labs  ²Carnegie Mellon University

Abstract

Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF against the challenging Eyeful Tower dataset along with other commonly used view synthesis datasets. When comparing to state-of-the-art baselines, including recent rasterization-based approaches, we improve error rates by 15–30% while achieving real-time framerates (at least 36 FPS) at virtual-reality resolutions (2K×2K).
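One way to picture the surface–volume trade-off is as a spatially varying sharpness parameter in a VolSDF-style signed-distance-to-density conversion: a small sharpness value makes a region behave like a hard surface (needing only a few samples per ray), while a larger value keeps it volumetric for thin or semi-opaque content. The snippet below is a minimal PyTorch sketch of that idea only; the function names, the Laplace-CDF form, and the example beta values are illustrative and not taken from the HybridNeRF implementation.

import torch

def laplace_cdf(x, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    half_tail = 0.5 * torch.exp(-x.abs() / beta)      # always well-behaved numerically
    return torch.where(x <= 0, half_tail, 1.0 - half_tail)

def sdf_to_density(sdf, beta):
    """VolSDF-style conversion: density = (1 / beta) * Psi_beta(-sdf).

    A spatially varying beta lets mostly solid regions act as sharp surfaces
    (small beta, few samples per ray) while thin or semi-opaque regions stay
    volumetric (large beta).
    """
    return (1.0 / beta) * laplace_cdf(-sdf, beta)

# Toy example: identical signed distances, two different sharpness settings.
sdf = torch.linspace(-0.2, 0.2, 5)           # signed distances along a ray
beta_surface = torch.full_like(sdf, 1e-3)    # surface-like region
beta_volume = torch.full_like(sdf, 1e-1)     # volumetric region

print(sdf_to_density(sdf, beta_surface))     # near step function at the zero crossing
print(sdf_to_density(sdf, beta_volume))      # smooth, spread-out density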

Overview

Eyeful Tower

We render novel views from the Workshop scene in the Eyeful Tower dataset. Since we train with HDR images, we are able to render the scene at different exposures.
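Rendering at a chosen exposure from an HDR-trained model amounts to scaling the predicted linear radiance before display encoding. The sketch below assumes a photographic-stops exposure control and a simple clamp-plus-gamma display transform; the actual tone mapping used for these renders may differ.

import torch

def apply_exposure(hdr_rgb, exposure_stops, gamma=2.2):
    """Scale linear HDR radiance by 2**stops, then gamma-encode for display.

    Illustrative only: a plain clamp + gamma curve stands in for whatever
    display transform the renders actually use.
    """
    scaled = hdr_rgb * (2.0 ** exposure_stops)            # exposure in photographic stops
    ldr = torch.clamp(scaled, 0.0, 1.0) ** (1.0 / gamma)  # clamp and gamma-encode
    return (ldr * 255.0).to(torch.uint8)

# Example: darken a rendered HDR frame by one stop, brighten it by two stops.
frame = torch.rand(2048, 2048, 3)            # stand-in for a linear HDR render
dark = apply_exposure(frame, -1.0)
bright = apply_exposure(frame, 2.0)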

Eyeful Tower Comparisons

We compare HybridNeRF to prior work (FPS shown for 2K×2K rendering on a single NVIDIA RTX 4090 GPU). MERF, 3DGS, and VolSDF* (our implementation using iNGP acceleration primitives) do not support HDR training and therefore cannot adjust exposure.
Our quality is slightly better than VR-NeRF's while rendering over 10× faster. Our results are significantly better than those of the faster rendering approaches, especially when modeling specular effects.

ScanNet++ Comparisons

We evaluate on ScanNet++, another dataset built from high-resolution captures of indoor scenes. 3DGS struggles with reflections and far-field content. Our method performs the best overall while exceeding the 36 FPS target frame rate for VR.

Mip-NeRF 360 Comparisons

We also evaluate HybridNeRF on the Mip-NeRF 360 dataset, where our approach performs comparably to other state-of-the-art methods. We render novel camera trajectories that show how volume-based methods such as iNGP "cheat" when modeling apparent surfaces. Our method renders faster and generates more plausible surface geometry.

Citation

@InProceedings{turki2024hybridnerf,
  title     = {HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces},
  author    = {Haithem Turki and Vasu Agrawal and Samuel Rota Bulò and Lorenzo Porzi and Peter Kontschieder and Deva Ramanan and Michael Zollh\"{o}fer and Christian Richardt},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}