Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design
OPUS-GO: Unlocking Residue-level Insights from Sequence-level Annotations Using Biological Language Models
Gang Xu · Ying Lv · Ruoxi Zhang · Xinyuan Xia · Qinghua Wang · Jianpeng Ma
Accurate annotation of proteins is crucial for understanding their structural and functional properties. Existing biological language model (BLM)-based methods, however, often prioritize sequence-level classification accuracy while neglecting residue-level interpretability, because sequence-level annotations are far easier to obtain than residue-level ones. To address this, we introduce OPUS-GO, a method that improves sequence-level predictions while also providing residue-level insights by pinpointing the critical residues associated with each functional label. By applying a modified Multiple Instance Learning (MIL) strategy to BLM representations, OPUS-GO outperforms baseline methods in both sequence-level and residue-level classification accuracy across a range of downstream protein-sequence tasks, including Gene Ontology (GO) term prediction. Furthermore, the identified residues can serve as promising “prompts” for molecular design models such as ESM-3, enabling the generation of sequences with the desired functionality.
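The core MIL idea can be illustrated with a small sketch. It treats a protein as a "bag" and its residues as "instances": each residue gets a score, the scores are pooled into one sequence-level prediction, and the residues that dominate the pooled score are read off as the residue-level explanation. This is a generic top-k MIL pooling example, not OPUS-GO's modified MIL strategy, whose details are not given here. The per-residue vectors stand in for BLM embeddings, and the linear head (`weights`, `bias`), the pooling size `k`, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def residue_logits(embeddings: Sequence[Sequence[float]],
                   weights: Sequence[float], bias: float) -> List[float]:
    """Score each residue (instance) with a hypothetical linear head."""
    return [sum(e * w for e, w in zip(emb, weights)) + bias for emb in embeddings]


def topk_mean_pool(logits: Sequence[float], k: int) -> float:
    """Pool instance logits into one bag (sequence) logit: mean of the top-k."""
    k = max(1, min(k, len(logits)))
    return sum(sorted(logits, reverse=True)[:k]) / k


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_with_residues(embeddings: Sequence[Sequence[float]],
                          weights: Sequence[float], bias: float,
                          k: int = 2) -> Tuple[float, List[int]]:
    """Return the sequence-level probability and the top-k residue indices."""
    logits = residue_logits(embeddings, weights, bias)
    prob = stable_sigmoid(topk_mean_pool(logits, k))
    k = max(1, min(k, len(logits)))
    # The residues that contribute to the pooled logit are the "critical" ones.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return prob, sorted(top)


if __name__ == "__main__":
    # Toy 4-residue protein with 2-dimensional stand-in embeddings.
    emb = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, 1.0]]
    prob, residues = predict_with_residues(emb, weights=[1.0, 1.0], bias=-1.0, k=2)
    print(f"P(label) = {prob:.3f}, critical residues = {residues}")
```

During training, only the sequence-level label would supervise the pooled probability; because top-k pooling routes gradients through the highest-scoring residues, the per-residue scores learn to localize the label without any residue-level annotation. Max pooling or attention pooling are common alternatives to the top-k mean used here.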