Poster

Protein Discovery with Discrete Walk-Jump Sampling

Nathan Frey · Dan Berenberg · Karina Zadorozhny · Joseph Kleinhenz · Julien Lafrance-Vanasse · Isidro Hotzel · Yan Wu · Stephen Ra · Richard Bonneau · Kyunghyun Cho · Andreas Loukas · Vladimir Gligorijevic · Saeed Saremi

Halle B #14
Tue 7 May 1:45 a.m. PDT — 3:45 a.m. PDT
 
Oral presentation: Oral 1A
Tue 7 May 1 a.m. PDT — 1:45 a.m. PDT

Abstract: We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our $\textit{Discrete Walk-Jump Sampling}$ formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the $\textit{distributional conformity score}$ to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100\% of generated samples are successfully expressed and purified and 70\% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
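The walk-jump procedure described in the abstract (Langevin MCMC on the smoothed density, followed by one-step denoising back to the data) can be illustrated with a minimal toy sketch. This is not the authors' implementation. It assumes a one-dimensional data distribution that is uniform on the two discrete states {-1, +1}, for which the score of the smoothed density is known in closed form, so no energy-based model or denoising network is trained. The function names, step size, chain length, and initialization are all hypothetical choices made for illustration.

```python
import numpy as np


def smoothed_score(y, sigma):
    """Score of the sigma-smoothed density for data uniform on {-1, +1}.

    Toy assumption: the smoothed density is an equal mixture of
    N(-1, sigma^2) and N(+1, sigma^2), whose score is
    (tanh(y / sigma^2) - y) / sigma^2. In practice this would be learned.
    """
    return (np.tanh(y / sigma**2) - y) / sigma**2


def walk(y, sigma, step, n_steps, rng):
    """Walk: unadjusted overdamped Langevin MCMC on the smoothed density."""
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step * smoothed_score(y, sigma) + np.sqrt(2.0 * step) * noise
    return y


def jump(y, sigma):
    """Jump: one-step denoising, x_hat = y + sigma^2 * score(y)."""
    return y + sigma**2 * smoothed_score(y, sigma)


def walk_jump_sample(n_samples, sigma=0.5, step=0.01, n_steps=500, seed=0):
    """Run independent chains at a single noise level, then denoise once."""
    rng = np.random.default_rng(seed)
    # Hypothetical initialization: uniform noise plus Gaussian noise.
    y0 = rng.uniform(-1.0, 1.0, n_samples) + sigma * rng.standard_normal(n_samples)
    y = walk(y0, sigma, step, n_steps, rng)
    x_hat = jump(y, sigma)
    # Project the denoised value onto the nearest discrete state.
    return np.where(x_hat >= 0, 1, -1)


if __name__ == "__main__":
    samples = walk_jump_sample(1000)
    print("fraction of +1 states:", (samples == 1).mean())
```

Only one noise level `sigma` appears anywhere in the sketch, mirroring the single-noise-level property the abstract highlights. For protein sequences, the discrete states would instead be one-hot encoded residues and the score would come from a trained model.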
