QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
Abstract
Deploying semantic segmentation in driving and robotics requires both real-time inference and robustness to domain shifts, formalized as Real-Time Domain-Generalized Semantic Segmentation (RT-DGSS), which has not been fully addressed. Existing methods often treat real-time(RT) inference and domain generalization (DG) separately, with DG improving robustness but lacking real-time performance, and real-time models being brittle under distribution shifts. To address the RT-DGSS problem, we propose QPrompt-R1, a real-time Query-Prompt architecture built on a ViT backbone. QPrompt-R1 injects a small set of learnable queries only at the final transformer block, performing a single query–image alignment step and eliminating decoder overhead. To further enhance alignment without test-time cost, we introduce a Group Relative Query Alignment (GRQA) objective, which uses group-relative supervision within each group to align queries with features, improving domain generalization through group-relative rewards. QPrompt-R1 achieves 54 FPS, delivering strong performance in synthetic-to-real transfer, real-to-real generalization, and robustness under adverse conditions. Additionally, GRQA is plug-and-play, improving state-of-the-art DGSS methods like REIN (+1.2) and SoMA (+0.5) without inference-time overhead.