

Poster in Workshop: AI4MAT-ICLR-2025: AI for Accelerated Materials Design

Benchmarking Text Representations for Crystal Structure Generation with Large Language Models

Shuyi Jia · Aamod Varma · Pranav Manivannan · Dhruva Chayapathy · Victor Fung

Keywords: [ materials discovery ] [ generative models ] [ large language models ]


Abstract:

The discovery of novel materials is essential for scientific and technological advancement but remains a significant challenge due to the vastness of the chemical space. Large language models (LLMs) have shown particular promise as generative models for materials discovery, where novel materials are generated in the form of textual representations of their crystal structures. In this work, we benchmark the performance of several textual representations with different levels of invariance and invertibility for crystal structure generation, covering Cartesian, Z-matrix, distance matrix, and SLICES representations. We find that all representations can be effectively leveraged by LLMs for structure generation. However, we observe that the inclusion of translation and rotation invariances in more complex representations does not necessarily yield better generation performance, contrary to expectations. These findings suggest that established design principles for conventional structure representations do not apply to LLMs. This study establishes the first benchmark for textual representations in crystal structure generation using fine-tuned LLMs, offering a foundation for accelerating materials discovery with language models.
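To make the idea of a textual crystal representation concrete, the sketch below serializes a structure into a simple Cartesian-style string (lattice lengths, lattice angles, then element symbols with fractional coordinates). This format and the function name `structure_to_text` are illustrative assumptions, not the exact prompt format used in the paper; it is only meant to show the kind of text an LLM would be fine-tuned on.

```python
# Hypothetical sketch: a Cartesian-style text serialization of a crystal
# structure, in the spirit of the representations benchmarked in the paper.
# The exact layout (lattice lengths, angles, then element + fractional
# coordinates) is an assumption for illustration.

def structure_to_text(lattice_abc, lattice_angles, sites):
    """Serialize a crystal structure into a plain-text string.

    lattice_abc    -- (a, b, c) lattice lengths in Angstroms
    lattice_angles -- (alpha, beta, gamma) in degrees
    sites          -- list of (element, (x, y, z)) fractional coordinates
    """
    lines = []
    lines.append(" ".join(f"{v:.2f}" for v in lattice_abc))
    lines.append(" ".join(f"{v:.1f}" for v in lattice_angles))
    for element, (x, y, z) in sites:
        lines.append(f"{element} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines)

# Example: rock-salt NaCl, primitive basis of two sites
text = structure_to_text(
    (5.64, 5.64, 5.64),
    (90.0, 90.0, 90.0),
    [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
)
print(text)
```

Note that this flat Cartesian encoding is neither translation- nor rotation-invariant; representations such as the distance matrix or SLICES build those invariances in, which is exactly the design axis the benchmark varies.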
