Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
A Theory for Conditional Generative Modeling on Multiple Data Sources
Rongzhen Wang · Yan Zhang · Chenyu Zheng · Chongxuan Li · Guoqiang Wu
Keywords: [ distribution estimation ] [ multiple data sources ] [ generative model ] [ MLE ]
The success of large generative models has driven a paradigm shift, leveraging massive multi-source data to enhance model capabilities. However, the interaction among these sources remains theoretically underexplored. This paper takes a first step toward a rigorous analysis of multi-source training in conditional generative modeling, where each condition represents a distinct data source. Specifically, we establish a general distribution estimation error bound in average total variation distance for conditional maximum likelihood estimation (MLE) based on the bracketing number. Our result shows that when the source distributions are similar and the model is sufficiently expressive, multi-source training guarantees a sharper bound than single-source training. We further instantiate the general theory on conditional Gaussian estimation as an illustrative example. The result highlights that the number of sources and the similarity among source distributions amplify the advantage of multi-source training. Simulations and real-world experiments validate our findings. We hope this work inspires further theoretical understanding of multi-source training in generative modeling.
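As a rough illustration of the claimed phenomenon (not the paper's construction or bound), the hypothetical sketch below simulates a conditional Gaussian setup in which per-source means are close to a shared value. The parameters K (number of sources), n (samples per source), and sim (spread of source means, standing in for source similarity) are assumptions introduced here for illustration; mean estimation error is used as a simple proxy for the average total variation distance discussed in the abstract.

# Minimal sketch (illustrative only): single-source vs pooled multi-source MLE
# for conditional Gaussians whose per-source means are close to a shared value.
import numpy as np

rng = np.random.default_rng(0)

K = 10        # number of sources (conditions) -- assumed for illustration
n = 50        # samples per source -- assumed for illustration
sim = 0.05    # spread of per-source means around a shared mean ("similarity")
trials = 200

single_err, multi_err = [], []
for _ in range(trials):
    true_means = 1.0 + sim * rng.standard_normal(K)        # similar source means
    data = [m + rng.standard_normal(n) for m in true_means]

    # Single-source MLE: each source's mean estimated from its own n samples.
    est_single = np.array([x.mean() for x in data])

    # Multi-source MLE under a fully shared model: pool all K*n samples.
    est_multi = np.full(K, np.concatenate(data).mean())

    single_err.append(np.abs(est_single - true_means).mean())
    multi_err.append(np.abs(est_multi - true_means).mean())

print(f"avg error, single-source: {np.mean(single_err):.4f}")
print(f"avg error, multi-source : {np.mean(multi_err):.4f}")

In this toy setup the pooled estimator benefits from roughly K times more samples when sim is small, while a larger sim introduces bias from pooling dissimilar sources, which is consistent with the abstract's statement that the advantage of multi-source training depends on both the number of sources and their similarity.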