Poster in Workshop: Machine Learning for Genomics Explorations (MLGenX)

Exploring the potential of genetic variation and zygosity in DNA language models

Ali Saadat · Jacques Fellay


Abstract:

Advances in DNA language models (DNA-LMs) have improved phenotype prediction from DNA sequences, yet the roles of zygosity and genetic variation (GV) remain underexplored. In this study, we quantify their effects on gene expression prediction, taken as an example of a variation-sensitive phenotype. We show that baseline models benefit from zygosity- and GV-aware encoding, whereas DNA-LMs struggle to make use of this information. These findings underscore the need to integrate biologically meaningful features such as zygosity and GV into DNA-LM pretraining to better capture genetic diversity and improve variant interpretation.
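
To make "zygosity- and GV-aware encoding" concrete, the sketch below shows one common way such an input could be built: a per-position allele-dosage vector over A, C, G, T, where a heterozygous variant splits its weight between the reference and alternate base and a homozygous alternate variant puts all weight on the alternate base. This is an illustrative assumption, not the encoding used by the authors; the function name `encode_zygosity_aware` and the `variants` input format are hypothetical.

```python
BASES = "ACGT"


def one_hot(base):
    """Return a 4-element one-hot vector for a single nucleotide."""
    return [1.0 if base == b else 0.0 for b in BASES]


def encode_zygosity_aware(reference, variants):
    """Encode a diploid sequence as per-position allele dosages.

    reference: reference sequence over A/C/G/T.
    variants:  hypothetical format, dict mapping a 0-based position to
               (alt_base, alt_allele_count), where alt_allele_count is
               1 for heterozygous and 2 for homozygous alternate.
    """
    # Start from the reference: every position is homozygous reference.
    encoded = [one_hot(base) for base in reference]
    for pos, (alt, count) in variants.items():
        if count not in (1, 2):
            raise ValueError("alt_allele_count must be 1 or 2")
        alt_fraction = count / 2  # 0.5 if heterozygous, 1.0 if homozygous alt
        ref_vec = one_hot(reference[pos])
        alt_vec = one_hot(alt)
        # Mix reference and alternate one-hot vectors by allele dosage.
        encoded[pos] = [
            (1 - alt_fraction) * r + alt_fraction * a
            for r, a in zip(ref_vec, alt_vec)
        ]
    return encoded


# Usage: a heterozygous C>T at position 1 and a homozygous G>A at position 2.
example = encode_zygosity_aware("ACGT", {1: ("T", 1), 2: ("A", 2)})
```

A zygosity-unaware encoding would map both variant positions to a plain one-hot of the alternate base, discarding the distinction between one and two copies of the allele; the dosage vector keeps it.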
