Advancing AI: Cross-disciplinary Insights into Next-Gen Tools, Tech & Architectures

We are excited to propose a special session on "AI Accelerators" for the upcoming ICCAD conference. Developed in collaboration with experts from both industry, including leaders such as Nvidia, and academia, the session explores the full stack of technologies and methodologies at the forefront of AI acceleration. Its four talks cover three pivotal aspects of AI accelerator technology. First, we will examine the role of large language models (LLMs) in enhancing electronic design automation (EDA) tools for AI accelerator chips, showing how LLMs can transform design methodologies and optimization processes. Second, the session will highlight critical enabling technologies, including photonics and emerging memories, which provide the infrastructure needed to support next-generation computational demands. Third, we will delve into cutting-edge architectures, with a focus on Computation-in-Memory (CIM) and Spiking Neural Networks (SNNs). These architectures are innovative approaches to hardware design that promise significant improvements in performance and efficiency. We expect the session to be highly relevant to ICCAD participants and to appeal to a broad spectrum of the conference's audience, from researchers and engineers to industry leaders and innovators. By integrating insights from both industry and academia, the session supports a rich dialogue and a comprehensive exploration of the state of the art in AI accelerator technology.

- Delft University of Technology, Netherlands
Hardware-Aware Quantization for Accurate Memristor-Based Neural Networks
Memristor-based Computation-in-Memory (CIM) has emerged as a compelling paradigm for designing energy-efficient neural network hardware. However, memristors suffer from conductance variation, which introduces computational errors in CIM hardware and degrades inference accuracy. In this paper, we present a new hardware-aware quantization method that mitigates the impact of conductance variation on CIM-based neural networks. To achieve this, we exploit the inherent truncation of layer outputs that occurs during fixed-point arithmetic in CIM hardware. By tuning the bit-precision of the weights, we align the errors induced by conductance variation with the output bits that are discarded during truncation. The computational errors are thus eliminated before they reach the next layer, which preserves high inference accuracy. Results show that our approach effectively mitigates the impact of conductance variation on CIM hardware inference without incurring any hardware overhead or design changes.
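The underlying idea, that small weight errors can fall entirely within the output bits that truncation discards, can be illustrated with a toy simulation. The sketch below is not the authors' quantization method: it assumes integer weights perturbed by a bounded random error (a stand-in for conductance variation), and the names `mismatch_rate`, `drop_bits`, and `max_weight_error` are hypothetical. It simply compares the truncated output of an ideal and a noisy dot product.

```python
import random


def dot(weights, inputs):
    """Integer dot product, standing in for one CIM column accumulation."""
    return sum(w * x for w, x in zip(weights, inputs))


def truncate(acc, drop_bits):
    """Discard the lowest `drop_bits` bits of the accumulator (floor shift)."""
    return acc >> drop_bits


def mismatch_rate(drop_bits, max_weight_error, trials=2000, n=16, seed=0):
    """Fraction of trials where weight errors change the truncated output.

    Assumption: each weight is off by a uniform integer error in
    [-max_weight_error, +max_weight_error], a crude model of variation.
    """
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        weights = [rng.randint(-128, 127) for _ in range(n)]
        inputs = [rng.randint(0, 15) for _ in range(n)]
        noisy = [w + rng.randint(-max_weight_error, max_weight_error)
                 for w in weights]
        ideal = truncate(dot(weights, inputs), drop_bits)
        actual = truncate(dot(noisy, inputs), drop_bits)
        mismatches += ideal != actual
    return mismatches / trials


if __name__ == "__main__":
    for bits in (0, 4, 8, 10):
        print(bits, mismatch_rate(bits, max_weight_error=1))
```

In this toy model the mismatch rate falls as more low-order bits are discarded, but it does not reach zero: an error that straddles a truncation boundary still changes the output, which is presumably why the weight precision has to be tuned rather than chosen arbitrarily.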
- Arizona State University, USA
LLM-AID: Leveraging Large Language Models for Automated AI Accelerator Design
This paper explores the applications of Large Language Models (LLMs) in AI accelerator design, focusing on high-quality code generation and Electronic Design Automation (EDA) script generation. We investigate the use of LLMs for generating Verilog code, an essential task in Register Transfer Level (RTL) design. The LLMs are trained to understand and generate Verilog, which is fundamental for describing the behavior and structure of AI accelerators. We also examine the generation of the EDA scripts needed to design these circuits so that they meet Power, Performance, and Area (PPA) constraints. Our approach combines Domain-Adaptive Tokenization and Model Fine-Tuning with Retrieval Augmented Generation (RAG), which dynamically enriches the model's context so that it better handles the intricacies of hardware description languages and script generation. The results demonstrate that LLMs fine-tuned with domain-specific knowledge not only achieve high accuracy in code generation but also adapt effectively to generating EDA scripts. This application of LLMs could significantly automate and optimize the workflow in AI accelerator design, leading to faster design cycles and more efficient designs.
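The retrieval step of RAG can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the pipeline described in the talk: retrieval uses simple word overlap (Jaccard similarity) rather than learned embeddings, the `SNIPPETS` corpus is made-up placeholder text, and `retrieve` and `build_prompt` are hypothetical helper names. The assembled prompt is what would be passed to a language model.

```python
def tokenize(text):
    """Lowercase whitespace tokenization into a set of words."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Return the k documents with the highest Jaccard overlap with the query."""
    query_tokens = tokenize(query)

    def score(doc):
        doc_tokens = tokenize(doc)
        return len(query_tokens & doc_tokens) / len(query_tokens | doc_tokens)

    return sorted(documents, key=score, reverse=True)[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved snippets to the task as extra context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nTask: {query}"


# Placeholder knowledge base; a real one would hold design documentation.
SNIPPETS = [
    "a systolic array module instantiates a grid of multiply accumulate units",
    "a synthesis script sets the clock period before compiling the design",
    "a testbench drives random inputs and checks the outputs",
]

if __name__ == "__main__":
    task = "write a synthesis script with a clock period constraint"
    print(build_prompt(task, SNIPPETS, k=1))
```

The point of the design is that the model's weights stay fixed while the context changes per query, so newly added documentation becomes usable without retraining.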
- Drexel University, USA
A Hardware-Embedded Spiking Neural Network for Efficient Small Language Models
Traditional deep learning models for natural language processing (NLP) face constraints when deployed on edge devices because of their high computational requirements. While the scaling of large language models (LLMs) has demonstrably improved performance, it has also significantly increased those requirements. Small language models (SLMs) offer a potential solution, leveraging a reduced parameter space and incorporating syntactic information for improved efficiency. This talk will introduce a novel approach that leverages spiking neural networks (SNNs) to build compact and efficient SLMs. SNNs are bio-inspired neural networks with spatio-temporal encoding that offer better scalability for large-scale systems. We will discuss SNN architectures specifically designed for SLMs and deployed for NLP tasks, and we will show a hardware-software co-design methodology to synthesize such architectures, starting from high-level specifications in Python.
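For readers unfamiliar with spiking models, the basic unit of an SNN can be sketched as a discrete-time leaky integrate-and-fire (LIF) neuron. This is a generic textbook model, not the architecture presented in the talk; the function name `lif_spike_train` and the `leak` and `threshold` values are assumptions chosen for illustration.

```python
def lif_spike_train(input_current, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    The membrane potential decays by `leak` each step, accumulates the
    input, and emits a spike (1) and resets to zero when it reaches
    `threshold`. Information is carried by when the spikes occur.
    """
    membrane = 0.0
    spikes = []
    for current in input_current:
        membrane = leak * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(lif_spike_train([0.6, 0.6, 0.0, 0.6, 0.6]))  # → [0, 1, 0, 0, 1]
```

Because the neuron is silent unless its potential crosses the threshold, downstream computation is only triggered by spikes, which is the usual source of the efficiency argument for SNN hardware.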
- Colorado State University, USA
Shedding Light on LLMs: Harnessing Photonic Neural Networks for Accelerating LLMs
Photonic Neural Networks (PNNs) have emerged as an exciting prospect for the next generation of AI accelerators, offering high-speed operation, low-latency processing, and energy efficiency. By harnessing photonics, PNNs exploit multiple computational dimensions, such as time, wavelength, and space, to increase throughput, reduce latency, and improve energy efficiency. This multidimensional computational framework holds significant promise for improving the efficiency of complex AI applications, notably Large Language Models (LLMs). This presentation will examine the potential of PNNs to accelerate complex AI models, with a specific emphasis on LLMs. We will explore a co-design methodology spanning device engineering, circuit optimization, and architectural innovations within PNNs to foster scalable and energy-conscious designs. We will also discuss the challenges of scaling PNNs further and introduce potential solutions stemming from advances in design automation and exploration techniques tailored to PNNs.