“…Many neural network models take the form of a first-order (in weights) recurrent neural network (RNN) and have been taught to learn context-free and context-sensitive counter languages [17,9,5,64,70,56,48,66,8,36,67]. From a theoretical perspective, however, RNNs augmented with an external memory have historically been shown to be more capable of recognizing context-free languages (CFLs), whether with a discrete stack [10,55,61] or, more recently, with various differentiable memory structures [33,26,24,39,73,28,72,25,40,41,3,42]. Despite these positive results, prior work on CFLs was unable to achieve perfect generalization on data beyond the training dataset, highlighting a troubling difficulty in preserving long-term memory.…”
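To make the contrast between a discrete stack and a differentiable memory concrete, below is a minimal sketch of one possible continuous ("soft") stack update, in which push, pop, and no-op outcomes are blended by probabilities so that gradients can flow through the memory operation. This is an illustrative assumption in the general spirit of the differentiable-memory line of work, not the specific mechanism of any cited architecture; the function name, array shapes, and action ordering are invented for the example.

```python
import numpy as np

def soft_stack_update(stack, push_val, actions):
    """One step of a hypothetical continuous ("soft") stack, where push,
    pop, and no-op are mixed by probabilities instead of chosen discretely.

    stack    : (depth, dim) array; row 0 is the top of the stack
    push_val : (dim,) vector the controller proposes to push
    actions  : (3,) probabilities for (push, pop, no-op), summing to 1
    """
    a_push, a_pop, a_noop = actions
    depth, dim = stack.shape

    # Outcome of a full push: push_val on top, everything shifted down one slot.
    pushed = np.vstack([push_val[None, :], stack[:-1]])
    # Outcome of a full pop: everything shifted up one slot, zeros at the bottom.
    popped = np.vstack([stack[1:], np.zeros((1, dim))])

    # The new stack is a convex combination of the three outcomes, so the
    # update is differentiable with respect to the action probabilities.
    return a_push * pushed + a_pop * popped + a_noop * stack


# Tiny usage example: a hard push followed by a hard pop restores the top.
stack = np.zeros((4, 2))
v = np.array([1.0, -1.0])
stack = soft_stack_update(stack, v, np.array([1.0, 0.0, 0.0]))          # push v
stack = soft_stack_update(stack, np.zeros(2), np.array([0.0, 1.0, 0.0]))  # pop
print(stack[0])  # top of stack is back to ~zero
```

The key design point illustrated here is that replacing a discrete push/pop choice with a convex combination keeps the whole memory end-to-end trainable by gradient descent, which is what the differentiable-memory approaches cited above exploit.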