OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
Emily Jin · Andrei Nica · Kin Long Kelvin Lee · Joey Bose · Mikhail Galkin · Santiago Miret · Jarrid Rector-Brooks · Alexander Tong · Michael Bronstein · Frances Arnold · Chenghao Liu
Abstract
Accurately predicting experimentally realizable $3\textrm{D}$ molecular crystal structures from their $2\textrm{D}$ chemical graphs is a long-standing open challenge in computational chemistry called $\textit{crystal structure prediction}$ (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce $\textrm{OXtal}$, a large-scale $100\textrm{M}$-parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale $\textrm{OXtal}$, we abandon explicit equivariant architectures, which impose an inductive bias arising from crystal symmetries, in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, $\textit{Stoichiometric Stochastic Shell Sampling}$ ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization, thus enabling more scalable architectural choices at all-atom resolution. Trained on $600\text{K}$ experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), $\textrm{OXtal}$ achieves orders-of-magnitude improvements over prior $\textit{ab initio}$ ML CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, $\textrm{OXtal}$ reproduces experimental structures with conformer $\mathrm{RMSD}_1<0.5$ Å and attains over 80\% lattice-match success, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
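As a rough illustration of what replacing an equivariant architecture with data augmentation can look like, the sketch below applies a random rigid motion (rotation plus translation) to a set of atomic coordinates and checks that interatomic distances are unchanged. It is a minimal sketch under stated assumptions, not the training pipeline of $\textrm{OXtal}$: the function names `random_rotation` and `augment`, the `max_shift` parameter, and the toy coordinates are all hypothetical, and the abstract does not specify which augmentations are actually used.

```python
import math
import random


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix from a normalized Gaussian quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    # Standard unit-quaternion to rotation-matrix conversion.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def augment(coords, rng, max_shift=1.0):
    """Apply one random rigid motion to every atom (hypothetical augmentation)."""
    rot = random_rotation(rng)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [
        [sum(rot[i][k] * atom[k] for k in range(3)) + shift[i] for i in range(3)]
        for atom in coords
    ]


def pairwise_distances(coords):
    """Return all interatomic distances in a fixed pair order."""
    return [math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]]


# Toy coordinates in Å (illustrative only, not taken from any real structure).
rng = random.Random(0)
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.0], [0.3, 0.8, 2.1]]
augmented = augment(coords, rng)

# A rigid motion must leave every interatomic distance unchanged.
max_dev = max(
    abs(d0 - d1)
    for d0, d1 in zip(pairwise_distances(coords), pairwise_distances(augmented))
)
print(f"largest change in any interatomic distance: {max_dev:.2e}")
```

Presenting a model with many such randomly transformed copies of each structure encourages it to learn the symmetry approximately from data, instead of having it enforced by the architecture.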