• Memristor-based Computation-in-Memory (CIM) has emerged as a compelling paradigm for designing energy-efficient neural network hardware. However, memristors suffer from conductance variation, which introduces computational errors in CIM hardware and degrades inference accuracy. In this paper, we present a new hardware-aware quantization method to mitigate the impact of conductance variation on CIM-based neural networks. To achieve this, we exploit the inherent truncation of layer outputs that occurs during fixed-point arithmetic in CIM hardware. By tuning the bit-precision of weights, we align the errors induced by conductance variation with the output bits that are discarded during truncation. Computational errors are thus eliminated before they reach the next layer, which preserves high inference accuracy. Results show that our approach effectively mitigates the impact of conductance variation on CIM hardware inference without incurring any hardware overhead or design changes.

• This paper explores the applications of Large Language Models (LLMs) in AI accelerator design, focusing on high-quality code generation and Electronic Design Automation (EDA) script generation. We investigate the use of LLMs for generating Verilog code, an essential task in Register Transfer Level (RTL) design. LLMs are trained to understand and generate Verilog code, which is fundamental for describing the behavior and structure of AI accelerators. We also examine the generation of the EDA scripts needed to design these circuits so that they meet Power, Performance, and Area (PPA) constraints. Our approach combines Domain-Adaptive Tokenization and Model Fine-Tuning with Retrieval-Augmented Generation (RAG), which dynamically enriches the model's context so that it better handles the intricacies of hardware description languages and script generation. The results demonstrate that LLMs, when fine-tuned with domain-specific knowledge, not only achieve high accuracy in code generation but also adapt effectively to generating EDA scripts. This application of LLMs could significantly automate and optimize the workflow in AI accelerator design, leading to faster design cycles and more efficient designs.

• Traditional deep learning models for natural language processing (NLP) deployed on edge devices are constrained by their high computational requirements. While scaling large language models (LLMs) has demonstrably improved performance, it has also significantly increased those requirements. Small language models (SLMs) offer a potential solution, leveraging a reduced parameter space and incorporating syntactic information for improved efficiency. This talk will introduce a novel approach that leverages spiking neural networks (SNNs) to build compact and efficient SLMs. SNNs are bio-inspired neural networks with spatio-temporal encoding that offer better scalability for large-scale systems. We will discuss SNN architectures specifically designed for SLMs and deployed for NLP tasks. We will also show a hardware-software co-design methodology to synthesize such architectures, starting from high-level specifications in Python.

• Photonic Neural Networks (PNNs) have emerged as an exciting prospect for the next generation of AI accelerators, offering high-speed operation, low-latency processing, and energy efficiency. By harnessing photonics, PNNs exploit multiple computational dimensions (such as time, wavelength, and space) to increase computational capability, reduce latency, and improve energy efficiency. This multidimensional computational framework holds significant promise for improving the efficiency of complex AI applications, notably Large Language Models (LLMs). This presentation will examine the potential of PNNs for accelerating complex AI models, with a specific emphasis on LLMs. We will explore a co-design methodology spanning device engineering, circuit optimization, and architectural innovations within PNNs to foster scalable and energy-conscious designs. Furthermore, we will discuss the challenges associated with scaling PNNs further and introduce potential solutions stemming from advances in design automation and exploration techniques tailored to PNNs.
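
The truncation effect described in the first abstract (memristor-based CIM) can be illustrated with a toy fixed-point example. This is only a sketch under stated assumptions, not the authors' method: the function names (`mac`, `truncate_output`, `error_is_masked`), the weight and input values, and the modeling of conductance variation as a small additive integer error on the accumulated output are all hypothetical. It shows only that an error confined to the discarded low-order bits does not reach the truncated output; it does not model how weight bit-precision sets the number of discarded bits.

```python
def mac(weights, inputs):
    """Ideal integer multiply-accumulate, as a crossbar column would compute."""
    return sum(w * x for w, x in zip(weights, inputs))


def truncate_output(acc, drop_bits):
    """Discard the drop_bits least significant bits of a fixed-point accumulator."""
    return acc >> drop_bits


def error_is_masked(ideal, noisy, drop_bits):
    """True if ideal and noisy accumulators agree after truncation."""
    return truncate_output(ideal, drop_bits) == truncate_output(noisy, drop_bits)


# Hypothetical quantized weights and inputs (not taken from the abstract).
weights = [5, 3, 7, 2]
inputs = [9, 4, 6, 8]

ideal = mac(weights, inputs)

# Assumption: conductance variation shows up as a small additive error
# on the accumulated output.
noisy = ideal + 3

for drop_bits in range(5):
    print(
        drop_bits,
        truncate_output(ideal, drop_bits),
        truncate_output(noisy, drop_bits),
        error_is_masked(ideal, noisy, drop_bits),
    )
```

For these values the two outputs agree once three or more bits are dropped. With two bits dropped the error (3) is smaller than the truncation step (4), yet the outputs still differ because the error carries into the retained bits, so masking by truncation is not guaranteed for every value in this simple model.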