Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
Abstract
The rapid advancement of video generation models has enabled the creation of highly realistic synthetic media, raising significant societal concerns regarding the spread of misinformation. Current detection methods, however, suffer from critical limitations. They often rely on preprocessing operations such as fixed-resolution resizing and cropping, which discard subtle, high-frequency forgery artifacts and introduce distortion and further information loss. Furthermore, these methods are frequently trained and evaluated on outdated datasets that fail to capture the sophistication of modern generative models. To address these challenges, we introduce two key contributions: a new large-scale dataset and benchmark, and a novel detection framework. We present a comprehensive dataset of over 140K videos from 15 state-of-the-art open-source and leading commercial generators. Specifically, we curate the Magic Videos Testset, featuring ultra-realistic videos produced by six of the latest generators through a meticulous generation and filtering pipeline. In addition, we propose a novel detection framework built on the Qwen2.5-VL Vision Transformer, which processes videos at their native spatial resolution and temporal duration. This native-scale approach preserves high-frequency details and spatiotemporal inconsistencies that are often lost during conventional preprocessing. Extensive experiments show that our method achieves state-of-the-art performance across multiple benchmarks. Our work underscores the importance of native-scale processing and establishes a robust new baseline for AI-generated video detection.
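To make the contrast between conventional fixed-resolution preprocessing and native-scale processing concrete, the following minimal PyTorch sketch illustrates the two pipelines. It is an illustration under our own assumptions (a generic 14-pixel patch size and simple bottom/right padding), not the paper's implementation or Qwen2.5-VL's actual preprocessing code.

```python
# Minimal sketch (not the paper's code): fixed-resolution preprocessing vs.
# native-scale patchification for a ViT-style encoder. Patch size and padding
# strategy are illustrative assumptions.
import torch
import torch.nn.functional as F

def fixed_resize(frames: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Conventional preprocessing: resize every frame to a fixed square
    resolution, discarding high-frequency detail and the original aspect ratio."""
    # frames: (T, C, H, W) -> (T, C, size, size)
    return F.interpolate(frames, size=(size, size), mode="bilinear", align_corners=False)

def native_scale_patches(frames: torch.Tensor, patch_size: int = 14) -> torch.Tensor:
    """Native-scale preprocessing: keep the original resolution, pad H/W up to
    multiples of the patch size, and flatten into a variable-length patch
    sequence, as a dynamic-resolution ViT would consume."""
    t, c, h, w = frames.shape
    pad_h = (patch_size - h % patch_size) % patch_size
    pad_w = (patch_size - w % patch_size) % patch_size
    frames = F.pad(frames, (0, pad_w, 0, pad_h))  # pad right and bottom only
    # (T, C, H', W') -> (T * H'/p * W'/p, C * p * p) patch tokens
    patches = frames.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, c * patch_size * patch_size)
    return patches

if __name__ == "__main__":
    video = torch.rand(16, 3, 540, 960)   # 16 frames at their native 540p resolution
    print(fixed_resize(video).shape)       # torch.Size([16, 3, 224, 224])
    print(native_scale_patches(video).shape)  # variable-length token sequence
```

The fixed-resize path collapses every video to the same token budget regardless of its resolution, whereas the native-scale path yields a token sequence whose length grows with the input, preserving the high-frequency content on which forgery artifacts depend.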