In recent years, using orthogonal matrices has been shown to be a promising approach to improving the training, stability, and convergence of recurrent neural networks (RNNs), particularly by controlling gradient magnitudes. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, the authors analyze the gradients in GRU and propose the use of orthogonal matrices to prevent the exploding gradient problem and enhance long-term memory. They study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU (Neumann-Cayley orthogonal GRU, NC-GRU). Detailed experiments on synthetic and real-world tasks show that NC-GRU significantly outperforms GRU and several other RNNs.
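To make the core idea concrete, below is a minimal NumPy sketch of how a scaled Cayley parameterization can be approximated with a truncated Neumann series. It assumes the standard form W = (I + A)^{-1}(I - A)D with a skew-symmetric A and a diagonal D of ±1 entries, and replaces the matrix inverse with a few Neumann terms; the function name, term count, and example values are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def scaled_cayley_neumann(A, D, num_terms=4):
    """Approximate the scaled Cayley transform W = (I + A)^{-1} (I - A) D.

    The inverse is replaced by a truncated Neumann series,
        (I + A)^{-1} ≈ I - A + A^2 - ... + (-A)^num_terms,
    which is valid when the spectral norm of A is below 1.  With A
    skew-symmetric (A = -A^T) and D diagonal with +/-1 entries, W is
    (approximately) orthogonal.
    """
    n = A.shape[0]
    I = np.eye(n)
    inv_approx = I.copy()   # accumulates the Neumann series
    term = I.copy()
    for _ in range(num_terms):
        term = term @ (-A)  # next power of (-A)
        inv_approx += term
    return inv_approx @ (I - A) @ D

# Example: build an (approximately) orthogonal matrix from a small skew-symmetric A
rng = np.random.default_rng(0)
n = 4
B = 0.1 * rng.standard_normal((n, n))
A = B - B.T                                   # skew-symmetric parameter matrix
D = np.diag(np.sign(rng.standard_normal(n)))  # diagonal scaling with +/-1 entries
W = scaled_cayley_neumann(A, D)
print(np.round(W @ W.T, 4))                   # close to the identity matrix
```

The appeal of this construction is that the recurrent weight stays close to orthogonal by parameterization, so gradients are neither amplified nor damped by the recurrent matrix itself, while the Neumann truncation avoids an explicit matrix inverse during training.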