Poster in Workshop: Machine Learning for Genomics Explorations (MLGenX)
Gradient-Based Gene Selection for scRNA-seq Foundation Models
Farhan Khodaee · Rohola Zandie · Pakaphol Thadawasin · Elazer Edelman
Foundation models have emerged as powerful tools for analyzing single-cell RNA sequencing (scRNA-seq) data, leveraging large-scale pretraining to capture complex gene expression patterns. However, selecting informative gene features, both as input to the model and for analysis of its output, remains a critical challenge. Traditional feature selection methods filter on the basis of highly variable genes and analyze them using differential distribution, but they often struggle with scalability and robustness in heterogeneous, high-dimensional datasets. In this study, we explore the limitations of conventional feature selection techniques in the context of foundation models and propose alternative gradient-based attribution techniques on learned feature embeddings to improve feature selection. Through empirical evaluations on benchmark scRNA-seq datasets, we illustrate how our selection strategy can optimize foundation model performance while overcoming the constraints of traditional approaches. Overall, this work highlights the importance of rethinking feature selection paradigms to unlock the full potential of foundation models for interpretable discovery of disease biomarkers and therapeutic targets.
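To make the idea of gradient-based gene attribution concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the model or the attribution rule, so this assumes a toy model in which a cell embedding is the expression-weighted sum of gene embeddings and a sigmoid readout gives a class probability, and it uses gradient-times-input as one common attribution rule. The gene names, embeddings, weights, and expression values are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, gene_emb, w):
    """Toy stand-in for a model: cell embedding h = sum_g x[g] * gene_emb[g],
    output probability = sigmoid(w . h). Purely illustrative."""
    d = len(w)
    h = [sum(x[g] * gene_emb[g][k] for g in range(len(x))) for k in range(d)]
    return sigmoid(sum(w[k] * h[k] for k in range(d)))

def grad_times_input(x, gene_emb, w):
    """Gradient-times-input attribution per gene, using the analytic gradient
    dp/dx[g] = p * (1 - p) * (w . gene_emb[g]) of the toy model above."""
    p = predict(x, gene_emb, w)
    scale = p * (1.0 - p)
    attributions = []
    for g in range(len(x)):
        dp_dx = scale * sum(w[k] * gene_emb[g][k] for k in range(len(w)))
        attributions.append(x[g] * dp_dx)
    return attributions

def rank_genes(cells, gene_emb, w, gene_names):
    """Rank genes by mean absolute attribution across cells (highest first)."""
    totals = [0.0] * len(gene_names)
    for x in cells:
        for g, a in enumerate(grad_times_input(x, gene_emb, w)):
            totals[g] += abs(a)
    scores = [t / len(cells) for t in totals]
    order = sorted(range(len(gene_names)), key=lambda g: scores[g], reverse=True)
    return [(gene_names[g], scores[g]) for g in order]

# Hypothetical toy data: 3 genes, 2-dimensional gene embeddings, 2 cells.
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
gene_emb = [[1.0, 0.0], [0.0, 0.1], [0.5, 0.5]]
w = [2.0, -1.0]
cells = [[1.0, 3.0, 0.5], [2.0, 1.0, 0.0]]

ranking = rank_genes(cells, gene_emb, w, gene_names)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy setting a highly expressed gene (GENE_B in the first cell) can still rank below a gene whose embedding aligns more strongly with the readout, which is the kind of contrast with variance-based filtering that the abstract points to. With a real foundation model, the analytic gradient would be replaced by automatic differentiation through the network.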