Poster in Workshop: AI4MAT-ICLR-2025: AI for Accelerated Materials Design
MatWheel: Addressing Data Scarcity in Materials Science Through Synthetic Data
Wentao Li · 陈奕哲 · Jiangjie Qiu · Xiaonan Wang
Keywords: [ Data Flywheel ] [ Synthetic Data ] [ Material Property Prediction ] [ Graph Neural Network ] [ Conditional Generative Model ]
Data scarcity and the high cost of annotation are persistent challenges in materials science. Inspired by the potential of synthetic data in other fields such as computer vision, we propose the MatWheel framework, which iteratively trains a material property prediction model on synthetic data generated by a conditional generative model. We explore two scenarios: fully-supervised and semi-supervised learning. Using CGCNN for property prediction and Con-CDVAE as the conditional generative model, we conduct experiments on six data-scarce material property datasets from the Matminer database. Results show that synthetic data has potential in extremely data-scarce scenarios, achieving performance close to or exceeding that of real samples in all six tasks. We also find that pseudo-labels have little impact on generated data quality. Future work will integrate more advanced models and optimize generation conditions to boost the effectiveness of the materials data flywheel.
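The loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a least-squares linear model stands in for CGCNN, a linear-Gaussian sampler stands in for Con-CDVAE, and random feature vectors stand in for crystal structures. The function names (`fit_predictor`, `fit_generator`, `generate`, `flywheel`), the number of rounds, and the way conditions are sampled are all hypothetical choices made for illustration. The ordering of steps (pseudo-label, fit generator, generate, retrain) is one plausible reading of the semi-supervised scenario, not a confirmed description of it.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_predictor(X, y):
    """Least-squares linear model (toy stand-in for a predictor such as CGCNN)."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def fit_generator(X, y):
    """Toy conditional generator (stand-in for Con-CDVAE): a linear map
    from the property value y to features x, plus per-feature Gaussian noise."""
    A = np.column_stack([y, np.ones(len(y))])
    B, *_ = np.linalg.lstsq(A, X, rcond=None)  # B has shape (2, n_features)
    sigma = (X - A @ B).std(axis=0)            # residual spread per feature
    return B, sigma

def generate(gen, y_cond):
    """Sample one synthetic feature vector per conditioning value."""
    B, sigma = gen
    A = np.column_stack([y_cond, np.ones(len(y_cond))])
    return A @ B + rng.normal(size=(len(y_cond), B.shape[1])) * sigma

def flywheel(X_lab, y_lab, X_unlab, n_rounds=3, n_synth=50):
    """Assumed semi-supervised loop: pseudo-label, fit generator,
    generate synthetic samples, retrain the predictor, repeat."""
    w = fit_predictor(X_lab, y_lab)
    for _ in range(n_rounds):
        y_pseudo = predict(w, X_unlab)                    # pseudo-labels
        gen = fit_generator(np.vstack([X_lab, X_unlab]),
                            np.concatenate([y_lab, y_pseudo]))
        y_cond = rng.choice(y_lab, size=n_synth)          # conditions drawn from real labels
        X_syn = generate(gen, y_cond)
        w = fit_predictor(np.vstack([X_lab, X_syn]),      # retrain on real + synthetic
                          np.concatenate([y_lab, y_cond]))
    return w

# Toy data: 20 labeled and 200 unlabeled samples with 4 features each.
true_w = np.array([1.0, -2.0, 0.5, 0.0])
X_lab = rng.normal(size=(20, 4))
y_lab = X_lab @ true_w + 0.1 * rng.normal(size=20)
X_unlab = rng.normal(size=(200, 4))

w = flywheel(X_lab, y_lab, X_unlab)
```

In the fully-supervised scenario the pseudo-labeling step would presumably be dropped and the generator fitted on labeled data only; in this sketch that corresponds to calling `fit_generator(X_lab, y_lab)` directly.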