SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

University of California, Los Angeles

Abstract

The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few-shot settings, similarly to NeRF, 3DGS tends to overfit to the training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. We propose a method for training coherent 3DGS-based radiance fields of 360° scenes from sparse training views. We find that naive depth priors alone are not sufficient, and instead integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset, with substantially lower training and inference cost.

From left to right: Ground Truth, SparseNeRF, MipNeRF360, 3DGS, SparseGS (Ours)

Background

Novel view synthesis has grown significantly in popularity recently thanks to the introduction of implicit 3D scene representations such as NeRF. These techniques enable many downstream applications in fields ranging from robotics to entertainment. However, most of these techniques are limited by the number of training views they require: many need up to 200 views, which can be prohibitively high for real-world usage. Prior work has tackled this problem with sparse-view techniques, but these almost entirely focus on forward-facing scenes and suffer from the long training and inference times of NeRFs.

Objective

We introduce a technique for real-time 360° sparse-view synthesis that leverages 3D Gaussian Splatting. The explicit nature of our scene representation allows us to reduce sparse-view artifacts with techniques that operate directly on the scene representation in an adaptive manner. Combined with depth-based constraints, we are able to render high-quality novel views and depth maps for unbounded scenes.

Pipeline
Our proposed pipeline integrates depth and diffusion constraints, along with a floater pruning technique, to improve few-shot novel view synthesis. During training, we render the alpha-blended depth, denoted d_alpha, and use a Pearson correlation loss to align it with the monocular depth estimate d_pt. We additionally impose a score distillation sampling (SDS) loss at novel viewpoints to encourage naturally-appearing images. At predetermined intervals, we execute the floater pruning step described in Section 3 of our paper. In the pipeline illustration, the components we introduce are highlighted in color, while the foundational 3D Gaussian Splatting pipeline is depicted in grey.
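
Below is a minimal PyTorch sketch of the depth-correlation constraint, assuming the rendered alpha-blended depth d_alpha and the monocular estimate d_pt are available as tensors of the same shape; the function name pearson_depth_loss and the weight lambda_depth are illustrative placeholders, not the exact released implementation.

import torch

def pearson_depth_loss(d_alpha: torch.Tensor, d_pt: torch.Tensor) -> torch.Tensor:
    # Negative Pearson correlation between the rendered depth and the monocular
    # depth estimate. Pearson correlation is invariant to scale and shift, so the
    # monocular depth only needs to agree with the rendered depth up to an affine
    # transform, which sidesteps the scale ambiguity of monocular estimators.
    x = d_alpha.flatten().float()
    y = d_pt.flatten().float()
    x = x - x.mean()
    y = y - y.mean()
    corr = (x * y).sum() / (x.norm() * y.norm() + 1e-8)
    return 1.0 - corr  # 0 when the two depth maps are perfectly correlated

# Illustrative use inside a training step (lambda_depth is a hypothetical weight):
# loss = photometric_loss + lambda_depth * pearson_depth_loss(rendered_depth, mono_depth)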

Key Ideas

  1. Leverage the explicit Gaussian representation to directly remove unwanted sparse-view artifacts such as “floaters” and “background collapse” (see the pruning sketch after this list)
  2. Use off-the-shelf depth estimation models to regularize novel-view outputs
  3. Reconstruct regions with low coverage in the training views using diffusion-model guidance
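
To make the first key idea concrete, here is a minimal sketch of one possible depth-based pruning criterion that operates directly on the explicit Gaussian parameters. The per-Gaussian depths, pixel assignments, and relative threshold below are hypothetical assumptions for illustration; the exact pruning criterion we use is described in Section 3 of the paper.

import torch

def floater_prune_mask(gaussian_depths: torch.Tensor,
                       pixel_ids: torch.Tensor,
                       reference_depth: torch.Tensor,
                       rel_threshold: float = 0.3) -> torch.Tensor:
    # gaussian_depths : (N,) depth of each Gaussian along the ray of the pixel it splats to
    # pixel_ids       : (N,) integer index of the pixel each Gaussian primarily contributes to
    # reference_depth : (H*W,) flattened reference depth for that view (e.g. rendered or monocular)
    # Returns a boolean keep-mask over the N Gaussians.
    ref = reference_depth[pixel_ids]                      # per-Gaussian reference depth
    relative_gap = (ref - gaussian_depths) / ref.clamp(min=1e-6)
    keep_mask = relative_gap < rel_threshold              # drop Gaussians floating far in front of the surface
    return keep_mask

# The mask can then be applied to every per-Gaussian tensor (means, scales, rotations,
# opacities, SH coefficients) to delete the flagged floaters in one step.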

Floater Removal Example

More Pictures

From left to right: Ground Truth, MipNeRF360, SparseGS (Ours)

More Videos (Updating)

Table: Comparison against sparse-view NeRF baselines, Mip-NeRF 360, and base 3DGS on the MipNeRF-360 dataset

Model          PSNR ↑    SSIM ↑    LPIPS ↓   Runtime* (h)   Render FPS
SparseNeRF     11.5638   0.3206    0.6984    4              1/120
Base 3DGS      15.3840   0.4415    0.5061    0.5            30
Mip-NeRF 360   17.1044   0.4660    0.5750    3              1/120
RegNeRF        11.7379   0.2266    0.6892    4              1/120
ViP-NeRF       11.1622   0.2291    0.7132    4              1/120
Ours           16.6898   0.4899    0.4849    0.75           30

We use 12 training images for each scene. *Runtimes are measured on a single RTX 3090.

Citation

@article{xiong2023sparsegs,
  author    = {Xiong, Haolin and Muttukuru, Sairisheek and Upadhyay, Rishi and Chari, Pradyumna and Kadambi, Achuta},
  title     = {SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting},
  journal   = {arXiv preprint},
  year      = {2023},
}