LeanForPhysics: Comprehensive Reasoning Framework for University-level Physics in Lean4
Abstract
We present Lean4PHYS, a comprehensive reasoning framework for college-level physics problems in Lean4. Lean4PHYS includes LeanPhysBench, a college-level benchmark for Lean4 formal physics reasoning, which contains 200 hand-crafted and peer-reviewed statements formalized from university textbooks and physics competition problems. To establish a solid foundation for formal reasoning in physics, we also launch PhysLib, a repository that contains fundamental unit systems and theorems essential for formal physics proving, which aims to be community-driven and long-term maintained. Based on the LeanPhysBench and PhysLib we composed in Lean4PHYS, we perform exhaustive experiments of baseline results using major expert Math Lean4 provers and state-of-the-art closed-source models, and provide an analysis of their performance. In the experiment, we identify that most expert provers do not outperform general models as they did in the math domain. This indicates a potential overfitting of the math domain rather than learning formal reasoning for formal provers. We also conduct a comprehensive experiment showing that with PhysLib in the context, LLMs' performance on LeanPhysBench increases by 11.90\% on average, proving the effectiveness of our repository in assisting LLMs to solve the Lean4 physics problem. To the best of our knowledge, we are the first study to provide a physics benchmark in Lean4.