Dataset and Benchmark for Digital Design
The application of Machine Learning (ML) in Electronic Design Automation (EDA) for Very Large-Scale Integration (VLSI) design has garnered significant research attention. Although effective ML models require extensive datasets, most studies are limited to small, internally generated datasets because comprehensive public resources are lacking. In response, we introduce EDALearn, a holistic, open-source dataset and benchmark suite designed specifically for ML tasks in EDA. It presents an end-to-end flow from synthesis to physical implementation, enriching data collection across the various stages. It fosters reproducibility and promotes research into ML transferability across different technology nodes. Accommodating a wide range of VLSI design instances and sizes, our dataset and benchmark aptly represent the complexity of contemporary VLSI designs. Additionally, we provide an in-depth data analysis that enables users to fully comprehend the attributes and distribution of our data, which is essential for creating efficient ML models. Our contributions aim to encourage further advances in the ML-EDA domain.