CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
Abstract
Integrated circuit (IC) designs require transforming high-level specifications into physical layouts, demanding extensive expertise and specialized tools, as well as months of time and numerous iterations. While Machine Learning (ML) has shown promise in various research domains, the lack of large-scale, open datasets limits its application in chip design. To address this limitation, we introduce CircuitNet 3.0, a large-scale, comprehensive, and open-source dataset curated to facilitate the evaluation of ML models on challenging timing and power prediction tasks. Starting with a diverse set of 8,659 validated open-source designs, we employ a systematic framework to generate over 15,000 instances. Through specialized syntax-tree mutation strategies and principled, task-oriented filtering methodology, we enrich each design with multi-modal information spanning multiple design stages, including complete design flow documentation, register-transfer-level (RTL) designs and corresponding netlists, detailed physical layouts, and comprehensive performance metrics. The experimental results convincingly demonstrate that ML models leveraging multi-stage, multi-modal circuit representations significantly improve performance over existing open-source datasets in electronic design automation (EDA) tasks, paving the way for efficient and accessible circuit representation learning. The dataset and codes are available in https://anonymous.4open.science/r/ICLR26-CircuitNet3-272B.