Skip to yearly menu bar Skip to main content


Poster

Removing Biases from Molecular Representations via Information Maximization

Chenyu Wang · Sharut Gupta · Caroline Uhler · Tommi Jaakkola

Halle B #179
[ ] [ Project Page ]
Fri 10 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweights samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes.

Live content is unavailable. Log in and register to view live content