Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design

Sequence-based protein models for the prediction of mutations across priority viruses

Sarah Gurev · Noor Youssef · Navami Jain · Debora Marks


Abstract:

Viruses pose a significant threat to human health. Advances in machine learning for predicting mutation effects have enhanced viral surveillance and enabled the proactive design of vaccines and therapeutics, but the accuracy of these methods across priority viruses remains unclear. We curate 51 standardized viral deep mutational scanning assays to systematically evaluate the performance of three alignment-based models, three protein language models (PLMs), and two structure-aware PLMs with different training databases. We find that these models perform markedly differently on viral proteins than on non-viral proteins. For viral proteins, the best-performing alignment-based model, EVE, performs on par with SaProt-PDB, the best-performing PLM, though with predictable differences in which model is better for a particular virus. We define confidence metrics for both alignment-based models and PLMs that indicate when additional sequence or structural data may be needed for accurate predictions and that guide model selection when no data are available for evaluation. We perform the first large-scale modeling across 40 WHO priority pathogens, many of which are under-surveilled, and find that most have sufficient sequence or structural information for effective modeling, highlighting the potential of these approaches for pandemic preparedness.
