Hardware-Aware Quantization for Accurate Memristor-Based Neural Networks
Memristor-based Computation-in-Memory (CIM) has emerged as a compelling paradigm for designing energy-efficient neural network hardware. However, memristors suffer from conductance variation, which introduces computational errors in CIM hardware and degrades inference accuracy. In this paper, we present a new hardware-aware quantization method that mitigates the impact of conductance variation on CIM-based neural networks. To achieve this, we exploit the inherent truncation of layer outputs that occurs during fixed-point arithmetic in CIM hardware. By tuning the bit-precision of the weights, we align the conductance variation-induced errors with the output bits that are discarded during truncation. Thus, computational errors are eliminated before reaching the next layer, resulting in high inference accuracy. Results show that our approach effectively mitigates the impact of conductance variation on CIM hardware inference, without incurring any hardware overhead or design changes.
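The core idea can be illustrated with a minimal numerical sketch. The snippet below is not the paper's method; it merely shows, under assumed parameters (a 4-bit truncation width and a hand-picked error magnitude), how a fixed-point truncation step can discard an error that is confined to the low-order output bits:

```python
# Minimal sketch (hypothetical parameters, not the paper's implementation):
# fixed-point truncation can absorb a conductance-variation error that
# stays within the discarded low-order bits of the layer output.

TRUNC_BITS = 4  # number of low-order output bits discarded (assumed)

def truncate(acc: int) -> int:
    """Drop the TRUNC_BITS least-significant bits, as the fixed-point
    datapath does when narrowing the accumulator for the next layer."""
    return acc >> TRUNC_BITS

# Ideal integer dot product of quantized activations and weights.
acts = [3, 7, 2, 5]
weights = [4, 1, 6, 2]
ideal = sum(a * w for a, w in zip(acts, weights))  # 12 + 7 + 12 + 10 = 41

# Conductance variation perturbs the analog MAC result. Here the error is
# small enough that the perturbed low bits do not carry into the kept bits.
error = 6
noisy = ideal + error  # 47

# After truncation, the noisy and ideal outputs are identical,
# so no error propagates to the next layer.
assert truncate(noisy) == truncate(ideal)
print(truncate(noisy), truncate(ideal))
```

Note that this only holds when the error does not push the low-order bits across a carry boundary into the retained bits, which is precisely why the paper tunes the weight bit-precision to keep variation-induced errors within the truncated range.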