Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design
A data guided approach to building an ML ready protein expression dataset
Catherine Baranowski · Han Spinner · Peter Kelly
Recombinant protein expression is central to academic exploration as well as biotechnology’s advancement of human health, climate applications and the bioeconomy in general. However, not all proteins can be expressed in all organisms, and the field lacks a predictive model of soluble protein expression that could replace laborious experimental trial-and-error. This project aims to design and test an openly available and extensible experimental platform and standardized data ontology for collecting soluble recombinant protein expression data across organisms. The resulting public dataset will be used in building predictive models of protein expression. Here we share preliminary assay feasibility data in our first expression host organism, Escherichia coli.
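A standardized data ontology is only useful if every lab's measurements can be checked and serialized the same way before they reach a shared dataset. The sketch below shows what a single cross-organism expression record might look like in Python. It is a minimal illustration under stated assumptions. The field names (`protein_id`, `sequence`, `host_organism`, `soluble_fraction`), the normalized [0, 1] solubility readout, and the validation rules are all hypothetical. They are not the authors' actual ontology.

```python
from dataclasses import dataclass, asdict
import json

# Canonical 20 amino-acid alphabet used for a basic sequence sanity check (assumption).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class ExpressionRecord:
    """One soluble-expression measurement (hypothetical schema, not the project's ontology)."""
    protein_id: str          # hypothetical identifier for the expressed construct
    sequence: str            # amino-acid sequence of the expressed protein
    host_organism: str       # e.g. "Escherichia coli"
    soluble_fraction: float  # hypothetical normalized soluble signal in [0, 1]

    def __post_init__(self):
        # Reject empty sequences or sequences with non-canonical residue letters.
        if not self.sequence or not set(self.sequence) <= AMINO_ACIDS:
            raise ValueError(f"invalid sequence for {self.protein_id}")
        # Keep the assumed normalized readout within range.
        if not 0.0 <= self.soluble_fraction <= 1.0:
            raise ValueError("soluble_fraction must be in [0, 1]")

    def to_json(self) -> str:
        # Sorted keys give a stable serialization, so records from different sources compare cleanly.
        return json.dumps(asdict(self), sort_keys=True)


record = ExpressionRecord("P001", "MKTAYIAK", "Escherichia coli", 0.42)
print(record.to_json())
```

Validating at construction time means malformed entries are rejected before they enter a shared dataset, which matters when data arrive from many labs and host organisms.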