Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
Abstract
Existing diffusion models excel at generating diverse content, but they remain confined to fixed image shapes and lack the ability to flexibly control spatial attributes such as viewpoint, field of view (FOV), and resolution. To fill this gap, we propose Arbitrary-Shaped Image Generation (ASIG), the first generative framework that enables precise spatial attribute control while supporting high-quality synthesis across diverse image shapes (e.g., perspective, panoramic, and fisheye). ASIG introduces two key innovations: (1) a mesh-based spherical latent diffusion model that generates a complete scene representation, together with a seam-enforcement denoising strategy that maintains semantic and spatial consistency across viewpoints; and (2) a spherical neural field that samples arbitrary regions from the scene representation conditioned on coordinates, enabling distortion-free generation at flexible resolutions. In this way, ASIG provides precise control over spatial attributes within a unified framework and achieves high-quality generation across diverse image shapes. Experiments demonstrate clear improvements over prior methods specifically designed for individual shapes.