Christos Chatzikonstantinou, Dimitrios Konstantinidis, Kosmas Dimitropoulos and Petros Daras.
Network pruning techniques are widely employed to reduce the memory requirements and increase the inference speed of neural networks. This work proposes a novel pruning method for recurrent neural networks (RNNs) that treats the RNN weight matrices as collections of time-evolving signals. These signals, which represent weight vectors, can be modelled using Linear Dynamical Systems (LDSs). In this way, weight vectors with similar temporal dynamics can be pruned, as they have a limited effect on the performance of the model. Additionally, during the fine-tuning of the pruned model, a novel discrimination-aware variation of L2 regularization is introduced to penalize (i.e., reduce the magnitude of) network weights whose impact on the output of the RNN is minimal.
Finally, an iterative fine-tuning approach is proposed that employs a bigger model to guide an increasingly smaller pruned one, as a steep reduction in the number of network parameters can irreversibly harm the performance of the pruned model. Extensive experimentation with different network architectures demonstrates the potential of the proposed method to create pruned models with significantly improved performance: perplexity improves by at least 0.62% on the PTB dataset and F1-score improves by 1.39% on the SQuAD dataset, in contrast to other state-of-the-art approaches, which only slightly improve or even degrade model performance. (https://www.sciencedirect.com/science/article/abs/pii/S0893608021002641)
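
To make the idea of pruning weight vectors with similar dynamics more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each row of a weight matrix is read as a one-dimensional signal along its index, and it swaps the full LDS model for a much cruder stand-in: a scalar first-order autoregressive fit x[t+1] ≈ a·x[t]. The function names `ar1_coefficient` and `redundant_rows` and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ar1_coefficient(signal):
    """Least-squares fit of x[t+1] ~= a * x[t]; returns the scalar a.

    Assumption: a scalar AR(1) model is used here as a simplified
    stand-in for an LDS descriptor of the signal's dynamics.
    """
    x_prev, x_next = signal[:-1], signal[1:]
    denom = float(x_prev @ x_prev)
    return float(x_prev @ x_next) / denom if denom > 0 else 0.0

def redundant_rows(weights, tol=1e-3):
    """Indices of rows whose AR(1) coefficient is within tol of an
    earlier row that was kept (hypothetical similarity criterion)."""
    kept_coeffs, pruned = [], []
    for i, row in enumerate(weights):
        a = ar1_coefficient(row)
        if any(abs(a - b) <= tol for b in kept_coeffs):
            pruned.append(i)       # dynamics duplicate an earlier row
        else:
            kept_coeffs.append(a)  # first row seen with these dynamics
    return pruned

# Rows 0 and 1 both double at every step, so row 1 is flagged.
weights = np.array([[1.0, 2.0, 4.0, 8.0],
                    [3.0, 6.0, 12.0, 24.0],
                    [1.0, -1.0, 1.0, -1.0]])
print(redundant_rows(weights))
```

In this toy input, two rows differ in scale but share the same growth pattern, which is the kind of redundancy a dynamics-based criterion can detect and a magnitude-based criterion cannot. The paper's actual LDS modelling, similarity measure, and pruning thresholds are not specified in this abstract and are not reproduced here.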