BAYESIAN INVARIANCE MODELING OF MULTI-ENVIRONMENT DATA
Abstract
Identifying invariant features – those that stably predict the outcome across diverse environments – is crucial for improving model generalization and uncovering causal mechanisms. Previous methods primarily address this problem through hypothesis testing or regularized optimization, but they often lack a principled characterization of the underlying data-generating process and struggle with high-dimensional data. In this work, we develop a Bayesian model that encodes an invariance assumption in the generative process of multi-environment data. Within this framework, we perform posterior inference to estimate the invariant features and establish theoretical guarantees on posterior consistency and contraction rates. To address the challenges of high-dimensional settings, we design a scalable variational inference algorithm. In simulations and a gene-perturbation study, our method achieves higher inference accuracy and better scalability than existing approaches.