RFR: Long-range navigation without perfect information using local reinforcement learning rules

Background

In some Reinforcement Learning (RL) tasks the agent has perfect information – in Chess, for example, the whole board and all pieces are visible. In more complex scenarios, such as 3D computer games, the agent can only see its immediate surroundings, and its observations may be noisy. Other agents’ states may be undetectable entirely, or only implicitly inferable from their actions. In these cases the problem is partially observable (PO).
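To make partial observability concrete, here is a minimal sketch (the helper name and grid encoding are assumptions for illustration, not part of any proposed environment): the agent in a 2D grid world receives only the cells within a small radius of its position, rather than the full map.

```python
import numpy as np

def local_observation(grid, agent_pos, radius=1):
    """Return only the cells within `radius` of the agent: a partial
    observation of the full grid. Cells outside the grid are marked -1."""
    r, c = agent_pos
    padded = np.pad(grid, radius, constant_values=-1)
    # The agent sits at (r + radius, c + radius) in the padded grid, so this
    # slice is the (2*radius+1) x (2*radius+1) window centred on the agent.
    return padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
```

An agent restricted to such windows cannot distinguish visually identical locations without remembering where it has been – which is exactly why memory becomes necessary.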

One approach to PO RL tasks is for the agent to maintain an internal model of the world based on past observations. The Deep Recurrent Q-Network (DRQN) [1] does this, but a simpler alternative may now be possible and may yield higher performance.
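The core idea – an internal state folded forward over past observations, from which action values are read – can be sketched as follows. This is a toy recurrent Q-estimator with random weights standing in for a trained network; all sizes and names are assumptions for illustration, not DRQN's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT = 4, 8, 3  # toy observation, hidden-state and action sizes

# Randomly initialised weights standing in for a trained recurrent network.
W_in = rng.normal(size=(HID, OBS)) * 0.1
W_rec = rng.normal(size=(HID, HID)) * 0.1
W_out = rng.normal(size=(ACT, HID)) * 0.1

def step(h, obs):
    """One recurrent step: fold the new partial observation into the hidden
    state, then read a Q-value for each action from that state."""
    h = np.tanh(W_in @ obs + W_rec @ h)
    q = W_out @ h
    return h, q

h = np.zeros(HID)                        # internal "memory" of the agent
for t in range(5):
    obs = rng.normal(size=OBS)           # stand-in for a partial observation
    h, q = step(h, obs)
    action = int(np.argmax(q))           # greedy action from memory-based Q
```

Training such a network with gradient descent requires backpropagation through time across the whole observation sequence, which motivates the local-learning alternative below.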

Recurrent Sparse Memory (RSM) [2] uses only local and immediate credit assignment, eliminating the need for backpropagation through time and across layers. This makes training an RL agent much faster and more memory-efficient. Recent results show that RSM variants can outperform LSTMs and other recurrent artificial neural networks trained with Back-Propagation Through Time (BPTT) in stochastic, partially observable conditions [3].
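To illustrate what "local and immediate" means in contrast to BPTT – though this sketch is a generic three-factor Hebbian-style rule, not RSM's actual update – each weight below changes using only its own pre- and post-synaptic activity plus a scalar reward signal, with no gradients propagated through time or across layers:

```python
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_OUT = 6, 4                       # toy layer sizes for illustration
W = rng.normal(size=(N_OUT, N_IN)) * 0.1

def local_update(W, x, reward, lr=0.01):
    """Illustrative local, immediate update: the change to W[i, j] depends
    only on x[j], y[i] and the reward, all available at the current step."""
    y = np.maximum(W @ x, 0.0)           # post-synaptic activity (ReLU)
    W = W + lr * reward * np.outer(y, x) # purely local, applied immediately
    return W, y

x = rng.normal(size=N_IN)                # stand-in for the current input
W, y = local_update(W, x, reward=1.0)
```

Because the update uses only quantities present at the current timestep, there is no need to store activation histories or unroll the network, which is where the speed and memory savings come from.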

Aim and Outline

The primary aim of this project is to evaluate the RSM algorithm in an RL navigation context, as a proof-of-concept that local learning rules can be successfully applied to global navigation tasks requiring memory of past states. Evaluation could begin with 2D mazes and extend to 3D environments such as Doom [4,5]. There is plenty of scope for added difficulty, such as time-varying environments, larger maps with increasing long-term memory demands, and interactions with other agents.
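A 2D maze of the kind proposed as a starting point could be as simple as the following sketch (the class, layout and reward scheme are hypothetical, not the provided training environment): the agent observes only its own position, so reaching the goal efficiently requires memory of where it has already been.

```python
import numpy as np

class TinyMaze:
    """Minimal 2D maze: start at START, reach GOAL; the agent observes only
    its own (row, col), making the task partially observable."""
    LAYOUT = np.array([
        [0, 0, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],                     # 1 = wall, 0 = free cell
    ])
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up/down/left/right
    START, GOAL = (0, 0), (0, 3)

    def reset(self):
        self.pos = self.START
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        in_bounds = 0 <= r < self.LAYOUT.shape[0] and 0 <= c < self.LAYOUT.shape[1]
        if in_bounds and self.LAYOUT[r, c] == 0:
            self.pos = (r, c)             # blocked moves leave the agent in place
        done = self.pos == self.GOAL
        reward = 1.0 if done else -0.01   # small step cost, bonus at the goal
        return self.pos, reward, done
```

Scaling the layout up, or changing it over time, directly produces the harder variants mentioned above.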

To move quickly onto the core research questions, we can provide source code for basic RL algorithms, the RSM memory algorithm, and the training environment. The project will involve integration of these existing components and development of more advanced RL algorithms to solve the navigation tasks. Successful execution of this project would likely result in publishable work.

References

[1] DRQN for Doom 3D: https://www.cs.cmu.edu/~dchaplot/papers/aaai17_fps_games.pdf
[2] RSM: https://arxiv.org/abs/1905.11589
[3] bRSM: https://arxiv.org/abs/1912.01116
[4] OpenAI Gym: https://gym.openai.com/
[5] OpenAI Gym / Doom 3D: https://gym.openai.com/envs/DoomCorridor-v0/