

Poster in Workshop: World Models: Understanding, Modelling and Scaling

Reframing LLM Finetuning Through the Lens of Bayesian Optimization

Bojana Ranković · Ryan-Rhys Griffiths · Philippe Schwaller

Keywords: [ Deep Metric Learning ] [ Bayesian optimization ] [ Gaussian processes ] [ Chemical optimization ] [ LLM finetuning ]


Abstract:

Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We tackle this challenge with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization through deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GP hyperparameters, to preserve the benefits of both: LLMs provide a rich and flexible input space for Bayesian optimization, while GPs model the uncertainty across this input space for efficient sampling. Moreover, we uncover an implicit contrastive learning effect in the embedding space that arises from the kernel-based finetuning procedure, i.e., from jointly optimizing LLM and GP parameters through the marginal likelihood objective. Our approach dynamically reorganizes the latent space to better reflect functional relationships without requiring explicit contrastive losses.

Applied to chemical reaction optimization tasks, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% of reactions within only 50 optimization iterations) and, more importantly, improves over strong chemistry-specific features. This substantial improvement emerges from the GP naturally separating high-performing regions in the embedding space while maintaining well-calibrated uncertainty estimates. Extensive empirical evaluation across multiple chemistry benchmarks, from reaction optimization to molecular tasks, demonstrates the generality of our method across both datasets and LLM architectures.

Our analysis reveals that GP-based finetuning creates more favorable geometric properties in the embedding space than supervised approaches, resulting in better-calibrated uncertainties and more efficient exploration. This work provides both insights into what makes embeddings effective for Bayesian optimization and practical advances for sample-efficient optimization of complex chemical spaces.
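A minimal sketch of the core idea described above, not the authors' implementation: wrap an LLM encoder as a deep-kernel feature extractor in GPyTorch and backpropagate the exact GP marginal log-likelihood through both the GP hyperparameters and the encoder weights. The encoder name (`bert-base-uncased`), the Matérn kernel choice, the mean-pooling step, the toy reaction SMILES, and all hyperparameters are placeholder assumptions, not details taken from the paper.

```python
# Sketch: joint LLM + GP finetuning via the exact GP marginal log-likelihood.
# Assumes gpytorch and transformers are installed; all specifics are placeholders.
import torch
import gpytorch
from transformers import AutoModel, AutoTokenizer

class LLMDeepKernelGP(gpytorch.models.ExactGP):
    """GP whose kernel acts on LLM embeddings; LLM and GP params train jointly."""

    def __init__(self, input_ids, attention_mask, train_y, encoder, likelihood):
        super().__init__((input_ids, attention_mask), train_y, likelihood)
        self.encoder = encoder  # LLM backbone, registered so its weights train too
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)  # placeholder kernel choice
        )

    def forward(self, input_ids, attention_mask):
        # Mean-pool the final hidden states into one embedding per sequence.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        z = hidden.mean(dim=1)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )

# --- joint finetuning loop ---
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
encoder = AutoModel.from_pretrained("bert-base-uncased")

reactions = ["CCO.CC(=O)O>>CCOC(C)=O", "c1ccccc1Br.CB(O)O>>c1ccccc1C"]  # toy SMILES
yields = torch.tensor([0.62, 0.35])  # toy observed reaction yields

tokens = tokenizer(reactions, padding=True, return_tensors="pt")
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = LLMDeepKernelGP(tokens["input_ids"], tokens["attention_mask"],
                        yields, encoder, likelihood)

mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
# A single optimizer over model.parameters() covers encoder weights, kernel
# hyperparameters, and the likelihood noise: this is the joint optimization.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train(); likelihood.train()
for _ in range(50):
    optimizer.zero_grad()
    output = model(tokens["input_ids"], tokens["attention_mask"])
    loss = -mll(output, yields)  # negative marginal log-likelihood
    loss.backward()
    optimizer.step()
```

In a Bayesian optimization loop, the posterior of such a model would score candidate reactions with an acquisition function and the model would be refit as new measurements arrive; the implicit contrastive effect described in the abstract refers to how this marginal-likelihood training reorganizes the pooled embeddings, without any explicit contrastive loss term.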
