On-Chip Training and Inference using Analog CMO/HfOx ReRAM Artificial Synapses
Donato Francesco Falcone a, Victoria Clerico a, Wooseok Choi a, Folkert Horst a, Matteo Galetta a, Antonio La Porta a, Bert Jan Offrein a, Valeria Bragaglia a
a IBM Research Europe - Zurich, Säumerstrasse, 4, Rüschlikon, Switzerland
Proceedings of Neuronics Conference 2025 (Neuronics25)
Tsukuba, Japan, June 17th - 20th, 2025
Organizers: Takashi Tsuchiya, Chu-Chen Chueh, Sabina Spiga and Jung-Yao Chen
Invited Speaker, Donato Francesco Falcone, presentation 022
Publication date: 15th April 2025

The increasing complexity of modern neural network architectures has led to a substantial rise in energy consumption during both training and inference, especially on conventional CMOS hardware based on the von Neumann architecture. The continuous exchange of data between memory and processing units is a major bottleneck, limiting both the efficiency and the speed of artificial neural network (ANN) computations. To address these challenges, specialized neuromorphic architectures leveraging analog memristor crossbar arrays have emerged as a promising alternative, offering improved energy efficiency and computational speed for AI workloads by performing arithmetic and logic operations directly where the data is stored [1].
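To make the crossbar principle concrete, the sketch below illustrates (with assumed, arbitrary device values, not figures from this work) how an array of conductances performs a matrix-vector multiplication in a single analog step via Ohm's and Kirchhoff's laws:

```python
import numpy as np

# Illustrative sketch: in a memristor crossbar, row voltages V are applied
# simultaneously, each device of conductance G[i, j] contributes a current
# G[i, j] * V[j] (Ohm's law), and the currents on each column sum
# (Kirchhoff's law), yielding I = G @ V in one parallel operation.
rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(64, 64))  # assumed device conductances (S)
V = rng.uniform(0.0, 0.2, size=64)          # assumed read voltages (V)

I = G @ V  # column output currents: all 64 dot products at once

# Same result as computing each dot product sequentially in the digital domain
assert np.allclose(I, [np.dot(G[i], V) for i in range(64)])
```

The entire multiplication thus happens where the weights are stored, which is the efficiency argument made above.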

Recent implementations of the analog in-memory computing (AIMC) paradigm have primarily focused on accelerating the inference step (i.e., the forward pass) of digitally trained deep neural networks (DNNs), even though the training phase is orders of magnitude more demanding in time and energy. This is because analog training acceleration imposes even more stringent requirements on memristive devices: in addition to performing inference, the learning phase requires handling error backpropagation, gradient computation, and weight updates. Promising memristive technologies that could address these challenges include redox-based resistive switching memory (ReRAM) [2] and electrochemical random-access memory (ECRAM) [3]. However, a unified analog in-memory technology platform, capable of on-chip training, weight retention, and long-term inference acceleration, has yet to be demonstrated.
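The three training operations named above (forward pass, error backpropagation, and weight update) can be sketched in plain numpy. This is a hypothetical single-layer illustration of the mathematics, with made-up shapes and learning rate, not the on-chip procedure itself; on a crossbar, the transposed multiply and the rank-1 update are what the analog hardware executes in parallel:

```python
import numpy as np

# Hypothetical single-layer example (illustrative shapes and values).
rng = np.random.default_rng(1)
W = rng.normal(0.0, 0.1, size=(8, 4))  # weights, stored as conductances
x = rng.normal(size=4)                 # input activations
t = rng.normal(size=8)                 # target output
eta = 0.01                             # assumed learning rate

y = W @ x             # forward pass: analog MVM through the array
delta = y - t         # output error
grad_x = W.T @ delta  # backward pass: MVM through the transposed array

# Weight update: a rank-1 outer product of error and input, which a
# crossbar can apply to all devices at once (O(1) time complexity).
W -= eta * np.outer(delta, x)
```

Performing the outer-product update directly on the array, rather than reading weights out and back in, is what removes the von Neumann bottleneck from the learning phase.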

This work fills this gap by demonstrating an all-in-one AI accelerator based on Conductive Metal-Oxide (CMO)/HfOx ReRAM technology, enabling the execution of forward and backward passes, along with weight updates and gradient computations, directly on a unified analog in-memory platform with O(1) time complexity [4]. The CMO/HfOx ReRAM devices are integrated in the BEOL of an NMOS transistor platform in a scalable 1T1R array architecture. The highly reproducible forming step demonstrates compatibility with NMOS transistors rated for 3.3 V operation, while the uniform quasi-static cycling characteristics, achieved with voltage amplitudes below ±1.5 V, exhibit a significant conductance window and a low off-state conductance. A multi-bit capability of more than 32 states (5 bits) as well as a record-low programming noise ranging from 10 nS to 100 nS will be presented. Inference performance is validated through matrix-vector multiplication simulations on a 64×64 array, achieving root-mean-square error improvements over the state of the art by factors of 20 and 3 at 1 second and 10 years after programming, respectively. Training accuracy closely matching the software equivalent is achieved across different datasets using the same technology.
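A toy version of such a validation can be sketched as follows. This is an assumed setup for illustration only (additive Gaussian conductance noise, arbitrary target conductances and read voltages), not the paper's simulation methodology or its reported numbers:

```python
import numpy as np

# Toy illustration: perturb a 64x64 conductance array with Gaussian
# programming noise and measure the root-mean-square error of the
# analog MVM output against the ideal (noise-free) result.
rng = np.random.default_rng(2)
G = rng.uniform(1e-6, 1e-4, size=(64, 64))  # assumed target conductances (S)
sigma = 50e-9                               # noise level in the 10-100 nS range
G_prog = G + rng.normal(0.0, sigma, size=G.shape)  # programmed conductances
V = rng.uniform(0.0, 0.2, size=64)          # assumed read voltages (V)

# RMSE between the noisy and ideal MVM outputs
rmse = np.sqrt(np.mean((G_prog @ V - G @ V) ** 2))
```

In this framing, lower programming noise (smaller sigma) and better long-term retention directly translate into a smaller MVM error at read time.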

The CMO/HfOx ReRAM technology lays the foundation for efficient analog systems accelerating both inference and training in deep neural networks.

© FUNDACIO DE LA COMUNITAT VALENCIANA SCITO