“…These models have been especially promising for long sequences, which are challenging for architectures such as Transformers [75]; Transformers have required custom approaches to adapt to higher-dimensional data [20,47] or long sequences [13,76]. Deep SSMs have shown state-of-the-art performance in a number of domains, including time series data [30,77,79], audio [27], visual data [53], text [17,50,51], and medical data [70]. A number of methods have also been proposed to simplify the parameterization of the S4 architecture [31,34,68], make the parameterization more numerically stable [27], or improve the initialization [32].…”