PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
Abstract
Unsupervised multimodal anomaly detection (MAD), which identifies defects by jointly analyzing RGB images and 3D data, is crucial for quality control in manufacturing. However, existing MAD methods struggle when only a few normal samples are available: cross-modal alignment models fail to learn stable correspondences from scarce training data, and memory-based approaches misclassify any unseen normal variation as anomalous. To address this few-shot challenge, we introduce PIRN (Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection), a prototype-based intra-modal reconstruction framework with explicit cross-modal knowledge transfer. PIRN features three key innovations: (1) Balanced Prototype Assignment (BPA) formulates token-to-prototype routing as a balanced optimal-transport problem, guaranteeing uniform utilisation of all prototypes and preventing codebook collapse. (2) Adaptive Prototype Refinement (APR) treats prototypes as adaptive memory and updates them on the fly with gated GRU cells driven by optimally matched image context, expanding coverage to unseen yet normal variations while suppressing anomalies. (3) Multi-modal Normality Communication (MNC) exchanges complementary normal cues across modalities via gated cross-attention, enabling each modality to reconstruct its feature map not only from its own prototypes but also from high-level normal patterns provided by the other modality. Extensive experiments on standard benchmarks demonstrate that PIRN significantly outperforms prior methods, achieving new state-of-the-art results, especially in challenging few-shot scenarios.
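To make the balanced optimal-transport view of token-to-prototype routing concrete, the following is a minimal sketch and not the PIRN implementation. It assumes entropic regularisation solved with Sinkhorn iterations, which the abstract does not specify; the function name `sinkhorn_balanced_assignment`, the parameters `epsilon` and `n_iters`, and the toy similarity scores are all hypothetical.

```python
import math


def sinkhorn_balanced_assignment(scores, epsilon=0.1, n_iters=200):
    """Return a soft token-to-prototype transport plan.

    scores[i][k] is a (hypothetical) similarity between token i and prototype k.
    The returned plan has row sums of about 1/N (each token carries equal mass)
    and column sums of about 1/K (each prototype receives equal mass).
    """
    n_tokens = len(scores)
    n_protos = len(scores[0])

    # Gibbs kernel of the entropic OT problem; subtracting the global maximum
    # keeps the exponentials from overflowing. Very small epsilon could still
    # underflow entries to zero, which this sketch does not guard against.
    max_score = max(max(row) for row in scores)
    kernel = [[math.exp((s - max_score) / epsilon) for s in row] for row in scores]

    row_target = 1.0 / n_tokens   # uniform mass per token
    col_target = 1.0 / n_protos   # uniform mass per prototype (the "balanced" part)

    u = [1.0] * n_tokens
    v = [1.0] * n_protos
    for _ in range(n_iters):
        # Rescale rows so that each token sends out row_target mass.
        u = [row_target / sum(kernel[i][k] * v[k] for k in range(n_protos))
             for i in range(n_tokens)]
        # Rescale columns so that each prototype receives col_target mass.
        v = [col_target / sum(kernel[i][k] * u[i] for i in range(n_tokens))
             for k in range(n_protos)]

    return [[u[i] * kernel[i][k] * v[k] for k in range(n_protos)]
            for i in range(n_tokens)]


# Toy example: every token prefers prototype 0 under a plain argmax on scores.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
plan = sinkhorn_balanced_assignment(scores)
column_mass = [sum(row[k] for row in plan) for k in range(2)]
assignments = [max(range(2), key=lambda k: row[k]) for row in plan]
```

In this toy case a plain argmax over `scores` would route every token to prototype 0, leaving prototype 1 unused. The column constraint forces each prototype to receive half of the total mass, which is the mechanism the abstract credits with preventing codebook collapse.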