Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
Abstract
Multimodal Misinformation Detection (MMD) in realistic, mixed-source scenarios requires robust reasoning capabilities to handle social complexity and diverse types of forgeries. While MLLM-based agents are increasingly used for the MMD task due to their powerful reasoning abilities, they suffer from a critical trade-off: on one hand, single-agent methods provide only limited, single-view analysis; on the other hand, multi-agent methods introduce high computational costs and significant optimization difficulties. To address this gap, we propose a novel Multi-Social-Agent Self-Distillation framework that internalizes collective social reasoning capabilities into a unified model. Our framework consists of two core stages: (1) we simulate multi-perspective judgments from a diverse society of MLLM agents and synthesize their collective feedback into high-quality Social Chain-of-Thought (SCoT) data; (2) building on this, we propose Social Correction Value-Driven Preference Optimization (SCPO), a new alignment algorithm that leverages the degree of social misjudgment as a verifiable signal to dynamically focus training on the most challenging samples. Extensive experiments on the challenging MFC-Bench and MMFakeBench benchmarks demonstrate the effectiveness of our framework. Our 7B Qwen2-VL-based model significantly outperforms various MLLM baselines and multi-agent methods, and even matches or surpasses proprietary models such as GPT-4o and Claude, facilitating advanced multimodal misinformation reasoning and detection via thinking as society.