

Virtual presentation / poster accept

Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation

Kai Xu · Georgi Ganev · Emile Joubert · Rees Davison · Olivier Van Acker · Luke Robinson

Keywords: [ Generative models ] [ differential privacy ] [ random graph generation ] [ synthetic data generation ]


Abstract: Synthetic data generation (SDG) has become a popular approach to releasing private datasets. In SDG, a generative model is fitted on the private real data, and samples drawn from the model are released as the protected synthetic data. While real-world datasets usually consist of multiple tables with potential many-to-many relationships (i.e., many-to-many datasets), recent research in SDG mostly focuses on modeling tables independently or only considers generating datasets with special cases of many-to-many relationships, such as one-to-many. In this paper, we first study the challenges of building faithful generative models for many-to-many datasets, identifying limitations of existing methods. We then present a novel factorization for many-to-many generative models, which leads to a scalable generation framework by combining recent results from random graph theory and representation learning. Finally, we extend the framework to establish the notion of $(\epsilon,\delta)$-differential privacy. On a real-world dataset, we demonstrate that our method can generate synthetic datasets while preserving information within and across tables better than its closest competitor.
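
For reference, the abstract invokes $(\epsilon,\delta)$-differential privacy without spelling it out; the standard textbook definition is sketched below. The symbols $\mathcal{M}$ (a randomized mechanism), $D$ and $D'$ (neighboring datasets), and $S$ (a set of possible outputs) are notation assumed here for illustration, not taken from the paper. The abstract does not say which neighboring relation is used for many-to-many datasets, so that is left open.

```latex
% Standard (epsilon, delta)-differential privacy (notation assumed, not the paper's):
% a randomized mechanism M is (epsilon, delta)-DP if, for all neighboring
% datasets D, D' and every measurable set of outputs S,
\[
  \Pr\bigl[\mathcal{M}(D) \in S\bigr]
  \;\le\;
  e^{\epsilon}\,\Pr\bigl[\mathcal{M}(D') \in S\bigr] + \delta .
\]
```

Setting $\delta = 0$ recovers pure $\epsilon$-differential privacy.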
