UniSHARP
Universal Sharp Monocular View Synthesis
Depth Teaser
UniSHARP performs monocular novel view synthesis across diverse camera types. Given a single image from a perspective, wide-FoV, fisheye, or panoramic camera, UniSHARP predicts a 3D Gaussian point cloud and renders high-quality novel views.
Panoramic Performance
Dynamic Comparison
Panoramic Camera
SHARP
Inference is run separately on each of the six cubemap faces, and the outputs are stitched into a single equirectangular panorama.
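The cubemap protocol for the SHARP baseline needs a mapping from every equirectangular (ERP) pixel back to one cubemap face. Below is a minimal sketch of that stitching step. It assumes the per-face outputs are images, uses nearest-neighbour sampling, and adopts an x-right, y-up, z-forward axis convention with a hypothetical face order (`FACE_NAMES`). The baseline's actual resampling, blending, and face orientation may differ.

```python
import numpy as np

# Hypothetical face order and axes: x right, y up, z forward (an assumption).
FACE_NAMES = ("+X", "-X", "+Y", "-Y", "+Z", "-Z")


def erp_directions(height: int, width: int) -> np.ndarray:
    """Unit ray direction through the center of every ERP pixel, shape (H, W, 3)."""
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both (H, W)
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], axis=-1
    )


def direction_to_face(d: np.ndarray):
    """Cube face index hit by each direction plus face-local coords (a, b) in [-1, 1]."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    # The dominant axis of a unit vector is never 0; the floor only avoids
    # divide-by-zero warnings for entries that the masks below discard.
    ax, ay, az = (np.maximum(np.abs(c), 1e-12) for c in (x, y, z))
    use_x = (ax >= ay) & (ax >= az)
    use_y = ~use_x & (ay >= az)
    use_z = ~use_x & ~use_y
    face = np.zeros(d.shape[:-1], dtype=np.int64)
    a = np.zeros(d.shape[:-1])
    b = np.zeros(d.shape[:-1])
    cases = [  # (mask, face index, horizontal coord, vertical coord)
        (use_x & (x > 0), 0, -z / ax, -y / ax),
        (use_x & (x <= 0), 1, z / ax, -y / ax),
        (use_y & (y > 0), 2, x / ay, z / ay),
        (use_y & (y <= 0), 3, x / ay, -z / ay),
        (use_z & (z > 0), 4, x / az, -y / az),
        (use_z & (z <= 0), 5, -x / az, -y / az),
    ]
    for mask, idx, a_face, b_face in cases:
        face[mask] = idx
        a[mask] = a_face[mask]
        b[mask] = b_face[mask]
    return face, a, b


def stitch_cubemap_to_erp(faces: np.ndarray, height: int, width: int) -> np.ndarray:
    """faces: (6, N, N, C) per-face outputs in FACE_NAMES order -> (H, W, C) panorama."""
    n = faces.shape[1]
    face, a, b = direction_to_face(erp_directions(height, width))
    # Nearest-neighbour lookup: map [-1, 1] onto pixel indices 0 .. n - 1.
    col = np.clip(np.floor((a + 1.0) / 2.0 * n), 0, n - 1).astype(np.int64)
    row = np.clip(np.floor((b + 1.0) / 2.0 * n), 0, n - 1).astype(np.int64)
    return faces[face, row, col]
```

Each ERP pixel reads from exactly one face, so seams between faces appear wherever neighbouring faces were predicted inconsistently.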

Perspective Camera
Qualitative Evaluation
Perspective
Panoramic
Fisheye
Methodology
UniSHARP pipeline for universal-camera monocular novel view synthesis. Given a single source image, UniSHARP estimates ray-distance geometry and multi-scale features, initializes two-layer Gaussians in ray-distance space, predicts feature-conditioned Gaussian residuals, and renders target views with a unified Gaussian representation across perspective, wide-FoV, fisheye, and panoramic cameras.
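Working in ray-distance space means one unprojection, p = c + t·d̂ (camera center c, distance t along the unit ray d̂), serves every camera model once per-pixel rays are known. The sketch below shows that idea for ideal pinhole, equidistant-fisheye, and equirectangular cameras in an assumed x-right, y-down, z-forward frame. All function names are hypothetical, and the paper's actual camera parametrization and two-layer initialization are not reproduced here.

```python
import numpy as np


def perspective_rays(u, v, fx, fy, cx, cy):
    """Unit rays for an ideal pinhole camera."""
    d = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u, dtype=float)], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def equidistant_fisheye_rays(u, v, f, cx, cy):
    """Unit rays for an equidistant fisheye, where image radius r = f * theta.

    For a 130-degree fisheye (as in OmniRooms-Wide), valid pixels satisfy
    theta <= 65 degrees, i.e. r <= f * np.deg2rad(65).
    """
    mx, my = (u - cx) / f, (v - cy) / f
    theta = np.hypot(mx, my)  # angle from the optical axis, in radians
    # sin(theta) / theta -> 1 as theta -> 0, so the principal point maps to +z.
    scale = np.where(theta > 0, np.sin(theta) / np.maximum(theta, 1e-12), 1.0)
    return np.stack([mx * scale, my * scale, np.cos(theta)], axis=-1)


def equirectangular_rays(u, v, width, height):
    """Unit rays for an ERP panorama (longitude spans width, latitude spans height)."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.stack(
        [np.cos(lat) * np.sin(lon), -np.sin(lat), np.cos(lat) * np.cos(lon)], axis=-1
    )


def unproject(rays, ray_distance, camera_center=None):
    """Ray-distance unprojection p = c + t * d, identical for every camera model."""
    center = np.zeros(3) if camera_center is None else np.asarray(camera_center, dtype=float)
    return center + np.asarray(ray_distance, dtype=float)[..., None] * rays
```

Predicting distance along the ray, rather than z-depth, is what keeps `unproject` camera-agnostic: z-depth is undefined for rays at or beyond 90 degrees from the optical axis, which wide fisheye and panoramic images contain.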
Benchmark and Dataset
Composition of the proposed FoV-stratified benchmark for universal-camera monocular novel view synthesis. Validation pairs are grouped by effective FoV and projection type, and sample counts denote the number of evaluated source–target pairs.
Details
Evaluation protocol for single-source monocular NVS. We restrict target views to locally reachable positions: source–target overlap > 60%, camera-center distance < 0.5 m, and image-index gap < 10. This focuses evaluation on geometry and disocclusion under meaningful motion rather than on unconstrained long-range hallucination. The protocol serves as a unified testbed for universal-camera NVS, measuring how rendering quality scales across perspective, wide-FoV, fisheye, and 360° projections. We use a single-source, multi-target setting: the first frame of each sequence is the source, and the next ten frames are the targets.
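The three reachability thresholds translate directly into a predicate over a source–target pair. The sketch below assumes the per-pair statistics are already computed; `PairStats` and `is_locally_reachable` are hypothetical names, and how overlap is actually measured is not specified in the protocol description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairStats:
    """Hypothetical precomputed statistics for one source-target pair."""
    overlap: float  # assumed: fraction of the source view visible in the target, in [0, 1]
    center_distance_m: float  # distance between the two camera centers, in meters
    index_gap: int  # |target frame index - source frame index|


def is_locally_reachable(
    pair: PairStats,
    min_overlap: float = 0.60,
    max_center_distance_m: float = 0.5,
    max_index_gap: int = 10,
) -> bool:
    """True if the pair passes all three thresholds (strict inequalities, as stated)."""
    return (
        pair.overlap > min_overlap
        and pair.center_distance_m < max_center_distance_m
        and pair.index_gap < max_index_gap
    )
```

Filtering the candidate targets of one source frame is then a list comprehension over their `PairStats`.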
OmniRooms
OmniRooms is a panoramic simulation dataset well suited to 3D reconstruction, particularly 3D Gaussian Splatting (3DGS). It consists of 16 large indoor scenes (each containing multiple rooms) and 300k RGB images with corresponding depth, covering both small and large pose changes. OmniRooms is collected in AirSim; OmniRooms-Wide is derived by projecting these panoramas into 130° equidistant fisheye views.
For each anchor point on a 0.5 m voxel grid, we render one central (source) camera at the anchor and 29 additional cameras randomly sampled within an axis-aligned cube of edge length 30 cm centered on it, giving 30 images per group. To isolate translation-induced synthesis, all cameras share a fixed orientation. Each frame is rendered as a 1024 × 2048 (height × width) equirectangular projection (ERP) image.
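The camera layout of one OmniRooms group can be sketched as follows. This assumes "randomly sampled" means uniform sampling inside the cube and ignores collision checks against scene geometry; `anchor_grid`, `sample_group`, and the scene bounds are hypothetical, not the dataset's actual generation code. Only positions are returned because all cameras in a group share one orientation.

```python
import numpy as np


def anchor_grid(bounds_min, bounds_max, spacing=0.5):
    """Anchor points on a regular voxel grid with the given spacing (meters)."""
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(bounds_min, bounds_max)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=-1)


def sample_group(anchor, num_extra=29, cube_edge=0.30, rng=None):
    """Central camera at the anchor plus num_extra cameras drawn uniformly inside
    an axis-aligned cube of edge cube_edge (meters) centered on the anchor."""
    rng = np.random.default_rng() if rng is None else rng
    half = cube_edge / 2.0
    offsets = rng.uniform(-half, half, size=(num_extra, 3))
    # Row 0 is the central (source) camera; rows 1..num_extra are the others.
    return np.vstack([anchor[None, :], anchor[None, :] + offsets])
```

With the defaults, each group holds 1 + 29 = 30 camera positions, each of which would be rendered as one ERP frame.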
16 Scenes (Example)
One Group (30 Images)

Citation

        @article{song2026unisharp,
          title={UniSHARP: Universal Sharp Monocular View Synthesis},
          author={Song, Meixi and Zhang, Dizhe and Ren, Hao and Zhang, Ruiyang and Du, Bo and Yang, Ming-Hsuan and Qi, Lu},
          journal={arXiv},
          year={2026}
        }
        