Advancing AI: Cross-disciplinary Insights into Next-Gen Tools, Tech & Architectures

We are excited to propose a special session on "AI Accelerators" for the upcoming ICCAD conference. This session, developed in collaboration with experts from both industry, including leaders such as Nvidia, and academia, is designed to comprehensively explore the full-stack technologies and methodologies at the forefront of AI acceleration. Organized into four talks, it will cover three pivotal aspects of AI accelerator technology. First, we will examine the role of large language models (LLMs) in enhancing electronic design automation (EDA) tools specifically developed for AI accelerator chips; this discussion aims to demonstrate how LLMs can revolutionize design methodologies and optimization processes. Second, the session will highlight critical enabling technologies, including photonics and emerging memories, which provide the infrastructure needed to support next-generation computational demands. Third, we will delve into cutting-edge architectures, with a focus on Computation-in-Memory (CIM) and Spiking Neural Networks (SNNs); these architectures represent innovative approaches to hardware design, promising significant improvements in performance and efficiency. The session will be highly relevant to a broad spectrum of the conference's audience, from researchers and engineers to industry leaders and innovators. Our collaborative approach, integrating insights from both the industrial and academic sectors, ensures a rich dialogue and a comprehensive exploration of the state of the art in AI accelerator technology.

 

  • Hardware-Aware Quantization for Accurate Memristor-Based Neural Networks

    Memristor-based Computation-in-Memory (CIM) has emerged as a compelling paradigm for designing energy-efficient neural network hardware. However, memristors suffer from conductance variation, which introduces computational errors in CIM hardware and degrades inference accuracy. In this paper, we present a new hardware-aware quantization method to mitigate the impact of conductance variation on CIM-based neural networks. To achieve this, we exploit the inherent truncation of layer outputs that occurs during fixed-point arithmetic in CIM hardware. By tuning the bit-precision of weights, we align the conductance variation-induced errors with output bits that are discarded during truncation. Thus, computational errors are eliminated before reaching the next layer, preserving inference accuracy. Results show that our approach effectively mitigates the impact of conductance variation on CIM hardware inference without incurring any hardware overhead or design changes.
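
    To make the truncation-alignment idea concrete, below is a minimal NumPy sketch, not the authors' implementation: it quantizes a weight matrix at several bit-precisions, injects multiplicative conductance variation, truncates the low-order accumulator bits as fixed-point CIM hardware would, and counts how many outputs still differ from the noise-free result. The variation level (sigma), bit widths, and truncation depth are all illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)

        def cim_dot(x, w_q, sigma, trunc_bits):
            # Conductance variation perturbs each stored weight multiplicatively.
            w_noisy = w_q * (1.0 + rng.normal(0.0, sigma, size=w_q.shape))
            acc = x @ w_noisy
            # Fixed-point truncation: the low-order trunc_bits of the
            # accumulator are discarded before reaching the next layer.
            return np.floor(acc / 2**trunc_bits).astype(np.int64)

        x = rng.integers(0, 16, size=64)           # 4-bit input activations
        w_fp = rng.normal(0.0, 1.0, size=(64, 8))  # ideal floating-point weights

        for w_bits in (8, 6, 4):                   # tune the weight bit-precision
            scale = 2**(w_bits - 1) - 1
            w_q = np.round(w_fp / np.abs(w_fp).max() * scale).astype(np.int64)
            ideal = np.floor((x @ w_q) / 2**4)     # noise-free truncated output
            noisy = cim_dot(x, w_q, sigma=0.05, trunc_bits=4)
            print(w_bits, "bit weights:", int(np.sum(ideal != noisy)), "of 8 outputs corrupted")

    At lower weight precisions the absolute error induced by the same relative variation shrinks, so more of it falls below the truncation point and disappears, which is the alignment effect described above.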

  • LLM-AID: Leveraging Large Language Models for Automated AI Accelerator Design

    This paper explores the applications of Large Language Models (LLMs) in AI accelerator design, focusing on high-quality code generation and Electronic Design Automation (EDA) script generation. We investigate the use of LLMs for generating Verilog code, an essential task in Register Transfer Level (RTL) design: LLMs are trained to understand and generate Verilog, which is fundamental for describing the behavior and structure of AI accelerators. Additionally, we delve into the generation of the EDA scripts crucial for implementing these circuits, ensuring they meet essential Power, Performance, and Area (PPA) constraints. Our approach leverages domain-adaptive tokenization and model fine-tuning, and incorporates Retrieval Augmented Generation (RAG) techniques to dynamically enrich the model's context, allowing it to better handle the intricacies of hardware design languages and script generation. The results demonstrate that LLMs fine-tuned with domain-specific knowledge not only achieve high accuracy in code generation but also adapt effectively to generating EDA scripts. This application of LLMs could significantly automate and optimize the AI accelerator design workflow, leading to faster design cycles and more efficient designs.
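
    As a rough illustration of the retrieval-augmented step, the sketch below is a toy stand-in rather than the system described here: it ranks a tiny, hypothetical corpus of Verilog and EDA-script snippets against a task description and assembles the retrieved context into a prompt. Bag-of-words cosine similarity stands in for a learned embedding model, and the final call to the fine-tuned LLM is omitted.

        from collections import Counter
        import math

        # Toy retrieval corpus; in a real flow this would be a vector index
        # built over Verilog designs and EDA tool documentation.
        CORPUS = [
            "always @(posedge clk) begin ... end  // registered pipeline stage",
            "set_max_area 0  ;# Synopsys DC: optimize for minimum area",
            "create_clock -period 2.0 [get_ports clk]  ;# 500 MHz clock constraint",
        ]

        def bow(text):
            # Bag-of-words term frequencies (stand-in for a learned embedding).
            return Counter(text.lower().split())

        def cosine(a, b):
            dot = sum(a[t] * b[t] for t in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        def retrieve(task, k=2):
            q = bow(task)
            return sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

        def build_prompt(task):
            context = "\n".join(retrieve(task))
            return (f"Relevant design snippets:\n{context}\n\n"
                    f"Task: {task}\nGenerate the Verilog or EDA script:")

        # The assembled prompt would be sent to the fine-tuned model.
        print(build_prompt("write a clock constraint for a 500 MHz accelerator core"))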

  • A Hardware-Embedded Spiking Neural Network for Efficient Small Language Models

    Traditional deep learning models for natural language processing (NLP) are difficult to deploy on edge devices because of their high computational requirements. While the scaling of large language models (LLMs) has demonstrably improved performance, it has also significantly increased computational cost. Small language models (SLMs) offer a potential solution, leveraging a reduced parameter space and incorporating syntactic information for improved efficiency. This talk will introduce a novel approach that uses spiking neural networks (SNNs), bio-inspired neural networks whose spatio-temporal encoding offers better scalability for large-scale systems, to build compact and efficient SLMs. We will discuss SNN architectures specifically designed for SLMs and deployed for NLP tasks, and we will show a hardware-software co-design methodology to synthesize such architectures, starting from high-level specifications in Python.
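
    For readers unfamiliar with the building block, here is a generic leaky integrate-and-fire (LIF) layer in NumPy. This is a textbook neuron model, not the co-designed architecture from the talk; the threshold, leak factor, and raster sizes are arbitrary choices for illustration.

        import numpy as np

        def lif_layer(spikes_in, weights, v_th=1.0, leak=0.9):
            # spikes_in: (T, n_in) binary raster; weights: (n_in, n_out).
            T = spikes_in.shape[0]
            n_out = weights.shape[1]
            v = np.zeros(n_out)                        # membrane potentials
            out = np.zeros((T, n_out), dtype=np.uint8)
            for t in range(T):
                v = leak * v + spikes_in[t] @ weights  # leak, then integrate
                fired = v >= v_th                      # fire on threshold crossing
                out[t] = fired
                v[fired] = 0.0                         # reset fired neurons
            return out

        rng = np.random.default_rng(1)
        raster = (rng.random((16, 32)) < 0.2).astype(np.uint8)  # sparse input spikes
        w = rng.normal(0.0, 0.3, size=(32, 8))
        print(lif_layer(raster, w).sum(axis=0))        # spike count per output neuron

    Because activity is binary and sparse, the matrix multiply reduces to accumulating the weight rows of neurons that fired, which is part of what makes SNNs attractive for low-power hardware.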

  • Shedding Light on LLMs: Harnessing Photonic Neural Networks for Accelerating LLMs

    Photonic Neural Networks (PNNs) have emerged as an exciting prospect for fueling the next generation of AI accelerators, offering high-speed operation, low-latency processing, and energy efficiency. By harnessing the power of photonics, PNNs exploit multiple computational dimensions, such as time, wavelength, and space, to increase computational throughput, minimize latency, and improve energy utilization. This multifaceted computational framework holds significant promise for enhancing the efficiency of complex AI applications, notably Large Language Models (LLMs). This presentation will delve into the potential of PNNs for accelerating complex AI models, with a specific emphasis on LLMs. We will explore a co-design methodology spanning device engineering, circuit optimization, and architectural innovation within PNNs to foster scalable and energy-conscious designs. Furthermore, we will discuss the challenges associated with scaling PNNs further, and introduce potential solutions stemming from advancements in design automation and design-space exploration techniques tailored to PNNs.
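
    The wavelength dimension of this parallelism can be illustrated with a purely behavioral toy model, shown below; it is not a device-level simulation. Each input element modulates the optical power on one wavelength channel, weights are applied as transmissions bounded to [0, 1] with a small per-channel variation, and a single photodetector sums the channels incoherently. The sign vector standing in for balanced photodetection and the noise level are both assumptions.

        import numpy as np

        rng = np.random.default_rng(2)

        def wdm_dot(x, w, tx_noise=0.01):
            # Scale weight magnitudes into the physically valid
            # transmission range [0, 1].
            w_max = np.abs(w).max()
            t = np.abs(w) / w_max
            sign = np.sign(w)  # balanced detection recovers negative weights
            # Per-channel fabrication/thermal variation on each transmission.
            t_real = np.clip(t * (1.0 + rng.normal(0.0, tx_noise, t.shape)), 0.0, 1.0)
            # The photodetector sums all wavelength channels incoherently.
            photocurrent = np.sum(sign * x * t_real)
            return photocurrent * w_max  # undo the weight scaling

        x = rng.random(16)            # one input element per wavelength channel
        w = rng.normal(0.0, 1.0, 16)  # one weight per channel
        print("photonic estimate:", wdm_dot(x, w))
        print("exact dot product:", float(x @ w))

    Because all wavelength channels propagate and sum simultaneously, the whole dot product completes in a single optical pass, which illustrates the latency advantage described above.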