D2GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
Meixi Song1,2
Xin Lin3
Dizhe Zhang1
Haodong Li3
Xiangtai Li4
Bo Du5
Lu Qi1,5
1Insta360 Research
2Tsinghua University
3University of California San Diego
4Nanyang Technological University
5Wuhan University

Teaser comparison: CoR-GS | DropGaussian | Ours (D²GS)
Recent advances in 3D Gaussian Splatting (3DGS) enable real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. However, performance degradation and instability remain significant under sparse-view conditions. In this work, we identify two key failure modes in this setting: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage. To address these challenges, we propose a unified framework, D2GS, comprising two key components: a Depth-and-Density Guided Dropout strategy that suppresses overfitting by adaptively masking redundant Gaussians based on density and depth, and a Distance-Aware Fidelity Enhancement module that improves reconstruction quality in underfitted far-field areas through targeted supervision. Moreover, we introduce a new evaluation metric to quantify the stability of learned Gaussian distributions, providing insight into the robustness of sparse-view 3DGS. Extensive experiments on multiple datasets demonstrate that our method significantly improves both visual quality and robustness under sparse-view conditions. The source code and trained models will be made publicly available.
Qualitative Evaluation | LLFF dataset

Qualitative comparison on the LLFF dataset with 3 and 6 input views. Comparisons were conducted with 3DGS, CoR-GS, and DropGaussian.

Qualitative Evaluation | MipNeRF360 dataset

Qualitative comparison on the MipNeRF360 dataset with 12 input views. Comparisons were conducted with 3DGS, CoR-GS, and DropGaussian.

Quantitative Evaluation | LLFF dataset

Performance comparisons of sparse-view synthesis on LLFF dataset.

Quantitative Evaluation | MipNeRF360 dataset

Performance comparisons of sparse-view synthesis on MipNeRF360 dataset.

Methodology

Pipeline of the method

The proposed D2GS consists of two key components, a Depth-and-Density Guided Dropout (DD-Drop) mechanism and a Distance-Aware Fidelity Enhancement (DAFE) module, which together improve the stability and spatial completeness of scene reconstruction under sparse-view settings. DD-Drop assigns each Gaussian a dropout score based on local density and camera distance; a high score indicates a region prone to overfitting. High-scoring Gaussians are dropped with higher probability to suppress aliasing and improve rendering fidelity. DAFE, in turn, counters underfitting by boosting supervision in distant regions using depth priors.
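To make the DD-Drop idea concrete, the following is a minimal sketch of a density-and-distance dropout mask. The exact scoring rule is not given on this page, so everything here is an assumption for illustration: the function name `dd_drop_mask`, the neighbor `radius`, the equal weighting of the two cues, and the `max_drop` cap are all hypothetical and are not the authors' formulation.

```python
import numpy as np

def dd_drop_mask(means, cam_center, radius=0.5, max_drop=0.5, rng=None):
    """Return (keep_mask, score) for K Gaussian centers of shape (K, 3).

    Hypothetical scoring rule: Gaussians that are close to the camera and
    sit in dense neighborhoods receive a higher dropout score.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Distance from each Gaussian center to the camera (a proxy for depth).
    depth = np.linalg.norm(means - cam_center, axis=1)
    # Local density: number of other centers within `radius`.
    # This is O(K^2) and only suitable for a small illustrative example.
    pairwise = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    density = ((pairwise < radius).sum(axis=1) - 1).astype(float)

    def unit(x):
        # Min-max normalize to [0, 1]; constant inputs map to zeros.
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x, dtype=float)

    # Assumed combination: equal weight on density and on camera proximity.
    score = 0.5 * unit(density) + 0.5 * (1.0 - unit(depth))
    drop_prob = max_drop * score
    keep = rng.random(len(means)) >= drop_prob
    return keep, score
```

In a training loop, a mask like `keep` would be resampled every iteration so that no Gaussian is removed permanently; whether D2GS does exactly this is not stated here.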

Inter-Model Robustness | LLFF dataset

Repeated training using the same algorithm and configuration can produce results with considerable variance, leading to large discrepancies in rendering quality. This highlights the importance of quantifying the divergence among independently trained models under identical settings to assess model robustness. To this end, we propose Inter-Model Robustness (IMR), a novel metric specifically designed for 3DGS, grounded in the theory of 2-Wasserstein Distance and Optimal Transport (OT) over Gaussian point clouds.

Let ${G}_1, {G}_2, \ldots, {G}_N$ denote $N$ independently trained 3DGS models, where each model ${G}_i$ consists of $K_i$ Gaussian primitives:

$$ {G}_i = \{({m}_{i,j}, {s}_{i,j}, {q}_{i,j}, \alpha_{i,j}, {f}_{i,j})\}_{j=1}^{K_i}. $$

To enable robustness analysis, each model is abstracted as a Gaussian mixture distribution:

$$ G_i = \sum_{j=1}^{K_i} w_{i,j} \cdot N(m_{i,j}, \Sigma_{i,j}), \quad w_{i,j} = \frac{\alpha_{i,j}}{\sum_{k=1}^{K_i} \alpha_{i,k}}. $$
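As an illustration of this abstraction, the sketch below turns per-Gaussian parameters into mixture weights and covariances. It assumes the usual 3DGS parameterization, $\Sigma = R\,\mathrm{diag}(s)^2 R^\top$ with $R$ derived from the quaternion $q$, and assumes that scales and opacities are already in their activated form and that quaternions are ordered $(w, x, y, z)$. None of these conventions are stated on this page, and the function names are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def to_mixture(scales, quats, opacities):
    """Return (weights, covariances) for K Gaussians.

    scales: (K, 3), quats: (K, 4), opacities: (K,)
    """
    covs = []
    for s, q in zip(scales, quats):
        rot = quat_to_rotmat(q)
        # Assumed 3DGS convention: Sigma = R diag(s)^2 R^T.
        covs.append(rot @ np.diag(s ** 2) @ rot.T)
    # Opacity-normalized mixture weights, as in the formula above.
    weights = opacities / opacities.sum()
    return weights, np.stack(covs)
```

The weights always sum to one, so each model becomes a proper probability distribution regardless of how many primitives it contains.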

To quantify the difference between two such Gaussian mixtures, we employ the 2-Wasserstein distance. For two Gaussian distributions $\mu_1 = {N}({m}_1, {\Sigma}_1)$ and $\mu_2 = {N}({m}_2, {\Sigma}_2)$, the Wasserstein distance admits a closed form via the Bures metric:

$$ W_2^2(\mu_1, \mu_2) = \|m_1 - m_2\|^2 + \text{tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_2^\frac{1}{2} \Sigma_1 \Sigma_2^\frac{1}{2})^\frac{1}{2}). $$
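The closed form can be evaluated directly for a pair of Gaussians. The sketch below is one way to do so in NumPy; it computes the matrix square roots through a symmetric eigendecomposition, which is an implementation choice made here and not something this page specifies. The helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    # Clip tiny negative eigenvalues caused by round-off.
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def w2_squared(m1, cov1, m2, cov2):
    """Squared 2-Wasserstein distance between N(m1, cov1) and N(m2, cov2)."""
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    mean_term = np.sum((m1 - m2) ** 2)
    shape_term = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(mean_term + shape_term)
```

For isotropic covariances $\Sigma_1 = I$ and $\Sigma_2 = 4I$ in 3D with means one unit apart, the shape term is $3(\sqrt{1} - \sqrt{4})^2 = 3$ and the total is $4$, which makes a convenient sanity check.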

To avoid expensive matrix square roots and improve numerical stability, we approximate the Bures shape term via a first-order Taylor expansion, resulting in the following expression:

$$ \tilde W_2^{2}(\mu_1,\mu_2) =\| m_1- m_2\|^{2} +\frac14\,\operatorname{tr}\!\bigl((\Sigma_1-\Sigma_2)\Sigma_2^{-1}(\Sigma_1-\Sigma_2)\bigr) $$
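A direct transcription of this approximation is sketched below. Using a linear solve in place of an explicit inverse is a choice made for this example; the function name is hypothetical.

```python
import numpy as np

def w2_squared_approx(m1, cov1, m2, cov2):
    """Approximate squared 2-Wasserstein distance without matrix square roots."""
    diff = cov1 - cov2
    # np.linalg.solve(cov2, diff) computes cov2^{-1} @ diff without
    # forming the inverse explicitly.
    shape_term = 0.25 * np.trace(diff @ np.linalg.solve(cov2, diff))
    mean_term = np.sum((m1 - m2) ** 2)
    return float(mean_term + shape_term)
```

Two properties follow from the expression as written. First, when the covariances are close the value tracks the exact shape term well: for $\Sigma_1 = 1.1I$ and $\Sigma_2 = I$ in 3D it gives $0.0075$, against an exact value of about $0.00715$. Second, because only $\Sigma_2$ is inverted, the expression is not symmetric in its two arguments, so the argument order matters when the covariances differ substantially.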

Let ${G}_1$ and ${G}_2$ denote two 3DGS models. The corresponding mixture Wasserstein distance is then formulated as an OT problem over the Gaussian components:

$$ \mathrm{MW}_2^2(G_1, G_2) = \min_{\gamma \ge 0} \sum_{i=1}^{K_1} \sum_{j=1}^{K_2} \gamma_{ij} \tilde W_2^2(\mu_{1,i}, \mu_{2,j}), \quad \text{s.t.} \quad \sum_j \gamma_{ij} = w_{1,i}, \quad \sum_i \gamma_{ij} = w_{2,j}. $$

This formulation performs a soft, structure-aware alignment through the optimal transport plan $\gamma \in \mathbb{R}^{K_1 \times K_2}$, eliminating the need for explicit correspondence. To compute the distance at scale, we introduce entropic regularization and solve the relaxed problem using the Sinkhorn algorithm:

$$ \mathrm{MW}_{2,\varepsilon}^2(G_1, G_2) = \min_{\gamma} \sum_{i,j} \gamma_{ij} C_{ij} + \varepsilon \sum_{i,j} \gamma_{ij} \log \gamma_{ij}, \quad C_{ij} = \tilde W_2^2(\mu_{1,i}, \mu_{2,j}). $$
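The Sinkhorn iterations for this entropic problem can be sketched as follows, with the cost matrix `cost` holding the pairwise component distances $\tilde W_2^2$ and the marginals given by the mixture weights. This is a plain, non-log-domain version for small problems; the values of `eps` and `n_iters` are illustrative assumptions, and the function returns the transport cost $\sum_{i,j}\gamma_{ij} C_{ij}$ of the regularized plan rather than the full regularized objective.

```python
import numpy as np

def sinkhorn_cost(w1, w2, cost, eps=0.05, n_iters=500):
    """Entropic OT between weight vectors w1 (K1,) and w2 (K2,).

    Returns (transport_cost, plan), where plan has shape (K1, K2).
    """
    # Gibbs kernel; for large cost / eps this underflows, in which case
    # a log-domain implementation would be needed.
    kernel = np.exp(-cost / eps)
    u = np.ones_like(w1)
    v = np.ones_like(w2)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = w1 / (kernel @ v)
        v = w2 / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost)), plan
```

With real 3DGS models the number of components is large, so a practical implementation would likely need GPU batching, cost rescaling, or a log-domain update; the sketch above only shows the structure of the iteration.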

Let $S_{ij} = \text{MW}_2^2({G}_i, {G}_j)$ denote the pairwise distances among $N$ independently trained models. Finally, we define the Inter-Model Robustness (IMR) metric as the logarithm of the ratio of the second moment to the first moment of the pairwise distances:

$$ \text{IMR}= \ln\left(\frac{\sum_{1\leq i<j \leq N} S_{ij}^2}{\sum_{1\leq i<j\leq N} S_{ij}}\right) $$
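Given a table of pairwise distances, the metric reduces to a few lines. The sketch below uses only the standard library; the function name `imr` and the nested-list input layout are assumptions for illustration, and the numbers in the usage note are toy values, not results from the paper.

```python
import math
from itertools import combinations

def imr(pairwise):
    """Compute IMR from a square table where pairwise[i][j] holds S_ij.

    Only entries with i < j are read, matching the sums in the formula.
    """
    n = len(pairwise)
    values = [pairwise[i][j] for i, j in combinations(range(n), 2)]
    second_moment = sum(s * s for s in values)
    first_moment = sum(values)
    return math.log(second_moment / first_moment)
```

For three models with toy distances $S_{12} = 1$, $S_{13} = 2$, $S_{23} = 3$, the ratio is $14/6$ and the result is $\ln(14/6)$. When all pairwise distances equal a constant $c$, the ratio reduces to $c$ and the metric equals $\ln c$.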