RFR: Using Reinforcement Learning to discover attentional strategies

Background

Recent work on language modelling has shown that attentional filtering produces state-of-the-art performance [1,2]. It appears that selectively remembering context enables a better model to be learned. However, current Transformer architectures rely on deep backpropagation through many layers and across every position of a long input context. This is expensive in both memory and computation, but it is essential for learning to control the attention mechanism.
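For reference, the attention operation in the Transformer [1] is scaled dot-product attention. Here the queries Q, keys K and values V are learned linear projections of the context, and d_k is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

The projections that produce Q and K are trained by backpropagating the prediction loss through every layer and every attended position. That is the cost this project aims to avoid.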

RSM (Recurrent Sparse Memory) is an alternative approach to credit assignment in neural networks. It uses only local and immediate inputs, yet it can exploit delayed and distant information [3]. Recent improvements to RSM have delivered excellent results in language modelling [4]. These results are better than simple recurrent neural networks but still inferior to Transformer and LSTM networks, which selectively store and use historical context.

One of the key features of RSM is “biological plausibility”: it learns in a way that is more likely to approximate learning in biological neurons. Adding attentional filtering while keeping this plausibility requires a way to associate current attentional behaviour with its later consequences for predictability. One biologically plausible way to do this is to use reinforcement learning to estimate the “reward” of particular attentional strategies in specific contexts.
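The following is a minimal illustration of estimating the reward of attentional strategies in specific contexts. It is a sketch under stated assumptions, not the project's method. It keeps a running average reward for each (context, strategy) pair and chooses strategies epsilon-greedily. This is a contextual-bandit simplification that credits only immediate rewards. The class name, the strategy labels "keep"/"skip" and the discrete context keys are hypothetical, and they are not part of RSM or the provided RL code.

```python
import random
from collections import defaultdict


class StrategyValueTable:
    """Hypothetical tabular estimate of the mean reward of each attentional
    strategy in each (discretised) context. Not part of RSM or the provided code."""

    def __init__(self, strategies, epsilon=0.1, seed=0):
        self.strategies = list(strategies)
        self.epsilon = epsilon                # probability of exploring at random
        self.value = defaultdict(float)       # (context, strategy) -> mean reward
        self.count = defaultdict(int)         # (context, strategy) -> times tried
        self.rng = random.Random(seed)

    def choose(self, context):
        # Epsilon-greedy: usually exploit the best estimate, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.value[(context, s)])

    def update(self, context, strategy, reward):
        # Incremental sample mean: V <- V + (r - V) / n
        key = (context, strategy)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


if __name__ == "__main__":
    # Toy setting: in this context, remembering ("keep") improves prediction.
    table = StrategyValueTable(["skip", "keep"], epsilon=0.2, seed=1)
    for _ in range(500):
        s = table.choose("after_noun")
        table.update("after_noun", s, 1.0 if s == "keep" else 0.0)
    table.epsilon = 0.0
    print(table.choose("after_noun"))
```

Once exploration has tried "keep", its higher estimate makes it the greedy choice. Handling the *delayed* consequences that the proposal targets would require discounted returns instead of immediate rewards.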

Aim and Outline

This project aims to combine RSM with attentional filtering. The attentional filtering would aim to reproduce the effects achieved by the Transformer, but without deep backpropagation. Instead of gradients, discounted rewards are propagated backwards in time to control the attentional strategies and to minimize an objective such as next-word prediction perplexity.
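The sketch below shows one way discounted rewards could be propagated backwards in time instead of gradients. It assumes that the per-step reward is the log-probability the predictor assigned to the correct next word. With no discounting (gamma = 1), the return from the first step then equals the total log-likelihood, so maximizing it minimizes perplexity. The gate is a hypothetical Bernoulli "remember this token" decision trained with the textbook REINFORCE estimator. REINFORCE is a swapped-in standard method, not necessarily the RL algorithm the project would use. The `sigmoid(theta * x)` gate with a scalar feature `x` is purely illustrative.

```python
import math


def discounted_returns(rewards, gamma):
    """Propagate rewards backwards in time: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def perplexity(log_probs):
    """Perplexity = exp(-mean log-probability of the correct next words)."""
    return math.exp(-sum(log_probs) / len(log_probs))


def reinforce_gate_update(theta, features, actions, returns, lr):
    """One REINFORCE step for a hypothetical Bernoulli 'remember this token' gate.

    p(remember | x) = sigmoid(theta * x). Only the scalar returns reach the gate;
    nothing is backpropagated through the predictor.
    """
    grad = 0.0
    for x, a, g in zip(features, actions, returns):
        p = 1.0 / (1.0 + math.exp(-theta * x))
        # d/dtheta log pi(a | x) for a Bernoulli-sigmoid policy is (a - p) * x
        grad += (a - p) * x * g
    return theta + lr * grad  # gradient ascent on the expected return


if __name__ == "__main__":
    # Toy episode: rewards are log p(correct next word) reported by the predictor.
    log_probs = [math.log(0.5), math.log(0.1), math.log(0.8)]
    returns = discounted_returns(log_probs, gamma=0.9)
    theta = reinforce_gate_update(0.0, features=[1.0, -1.0, 0.5],
                                  actions=[1, 0, 1], returns=returns, lr=0.01)
    print(perplexity(log_probs), returns, theta)
```

In practice, a baseline such as a running mean of returns would normally be subtracted from each return before the update. Log-probability rewards are always negative, and without a baseline the estimator is very noisy.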

This architecture has immediate and obvious applications to natural language processing, but could also be extended to image understanding by learning to control a series of fixation points within an image to classify its content.
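For the image extension, a hypothetical episode might look like the sketch below. At each step the agent takes a small "glimpse" around its current fixation point, and a policy chooses the next fixation. After a fixed number of glimpses, a classifier guesses the label. The only reward is 1.0 for a correct final guess, so this delayed reward would have to be credited back to the earlier fixation choices through discounted returns. The function names and the `choose_fixation`/`classify` callbacks are placeholders, not part of the provided training environment.

```python
def glimpse(image, row, col, size):
    """Return a size x size patch around fixation (row, col).

    The patch starts size // 2 pixels above and to the left of the fixation;
    pixels outside the image are zero-padded.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    return [
        [image[r][c] if 0 <= r < height and 0 <= c < width else 0
         for c in range(col - half, col - half + size)]
        for r in range(row - half, row - half + size)
    ]


def run_episode(image, label, choose_fixation, classify, num_glimpses=3, size=3):
    """Hypothetical fixation episode with a single delayed reward at the end."""
    fixation = (len(image) // 2, len(image[0]) // 2)  # start at the image centre
    fixations, glimpses = [], []
    for _ in range(num_glimpses):
        fixations.append(fixation)
        glimpses.append(glimpse(image, fixation[0], fixation[1], size))
        fixation = choose_fixation(glimpses)  # policy picks the next fixation point
    reward = 1.0 if classify(glimpses) == label else 0.0
    return fixations, glimpses, reward
```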

To move quickly onto the core research questions, we can provide source code for basic RL algorithms, the RSM memory algorithm, and the training environment. You would be able to integrate these existing components and develop attentional architectures aimed at superior performance. Successful execution of this project would likely result in publishable work.

URLs and References

[1] “Attention Is All You Need” (Transformer), NIPS 2017: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
[2] “Language Models are Unsupervised Multitask Learners” (GPT-2, a Transformer language model): https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
[3] RSM (Recurrent Sparse Memory), arXiv:1905.11589: https://arxiv.org/abs/1905.11589
[4] bRSM, arXiv:1912.01116: https://arxiv.org/abs/1912.01116