Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design

Residue-level text conditioning for protein language model mutation effect prediction

Dan Berenberg · Nate Gruver · Alan Amin · Peter Mørch Groth · Tianlai Chen · Harsh Srivastava · Pascal Notin · Debora Marks · Andrew Gordon Wilson · Kyunghyun Cho · Richard Bonneau


Abstract:

To augment protein sequence models with language, we introduce Conditioning on Residue-level Annotations from TExt (CRATE), a fine-tuning method that fuses two models using feature-wise linear modulation. We fine-tune protein language models at a large scale, first constructing a dataset (CRATE-train) that joins annotations from InterPro and UniProtKB with sequences from UniRef90, yielding approximately 105 million sequences, each with at least three annotations and nearly 100% sequence coverage on average. Applying CRATE to mutation effect prediction improves performance on the ProteinGym benchmark over prior methods. Leveraging these improvements, we show that CRATE can be used to select the annotations with the largest positive impact on mutation effect prediction and to estimate deep mutational scan (DMS) scores across multiple assay selection types.
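The abstract names feature-wise linear modulation (FiLM) as the fusion mechanism but does not spell out CRATE's architecture. Below is a minimal PyTorch sketch of a generic FiLM layer that conditions residue-level sequence features on residue-level annotation embeddings; `FiLMLayer`, `seq_dim`, and `cond_dim` are illustrative names, not CRATE's actual interface.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Generic feature-wise linear modulation (FiLM): scale and shift
    per-residue sequence features using projections of a conditioning
    (e.g., text/annotation) embedding. A sketch, not CRATE's actual code."""

    def __init__(self, seq_dim: int, cond_dim: int):
        super().__init__()
        # gamma (scale) and beta (shift) are predicted from the condition
        self.to_gamma = nn.Linear(cond_dim, seq_dim)
        self.to_beta = nn.Linear(cond_dim, seq_dim)

    def forward(self, seq_feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # seq_feats: (batch, length, seq_dim) residue-level PLM features
        # cond:      (batch, length, cond_dim) residue-level annotation embeddings
        gamma = self.to_gamma(cond)
        beta = self.to_beta(cond)
        return gamma * seq_feats + beta

# Usage with hypothetical dimensions:
film = FiLMLayer(seq_dim=1280, cond_dim=512)
seq = torch.randn(2, 100, 1280)   # features from a protein language model
ann = torch.randn(2, 100, 512)    # embeddings of residue-level annotations
out = film(seq, ann)              # modulated features, shape (2, 100, 1280)
```

Because the modulation is applied per residue, each position's features can be conditioned on the annotations covering that position, which is consistent with the residue-level conditioning the method's name describes.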
