“…RNNs can be viewed as dynamical systems, and many works have used this viewpoint to study them, e.g., [22,23,24,25]. Other related work includes connections to kernel methods, e.g., [26,27,28], linear RNNs [29], saturated RNNs [30,31,32], and echo state networks [33,34]. Several other works study the expressive power of Transformers, a novel class of sequence-to-sequence models [35,36].…”