“…The Transformer [32] continues to advance natural language processing [4,8,18,27,26,19], computer vision [9,2,31,21], and audio processing [12,1,13,30]. Although it outperforms architectures such as RNNs [7] and CNNs [16,14,11] on many sequence modeling tasks, it lacks length extrapolation capability, which restricts the range of sequence lengths it can handle: sequences at inference time must be no longer than those seen during training.…”
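As a minimal illustration of this constraint (not taken from the paper), consider a model that uses a learned absolute position-embedding table sized to its training length. Positions beyond the table simply have no embedding, so longer inference sequences cannot be encoded at all. The names `TRAIN_MAX_LEN`, `D_MODEL`, and `encode_positions` below are hypothetical, chosen only for this sketch:

```python
import numpy as np

TRAIN_MAX_LEN = 512   # hypothetical maximum sequence length seen in training
D_MODEL = 64          # hypothetical embedding dimension

# Learned absolute position embeddings: one row per training position.
# Positions >= TRAIN_MAX_LEN have no row and therefore cannot be represented.
pos_table = np.random.randn(TRAIN_MAX_LEN, D_MODEL)

def encode_positions(seq_len: int) -> np.ndarray:
    """Look up position embeddings for a sequence of length seq_len."""
    if seq_len > TRAIN_MAX_LEN:
        raise ValueError(
            f"sequence length {seq_len} exceeds training length {TRAIN_MAX_LEN}"
        )
    return pos_table[:seq_len]

encode_positions(512)       # fine: within the training range
try:
    encode_positions(1024)  # fails: no embedding exists for positions >= 512
except ValueError as err:
    print("cannot encode:", err)
```

Sinusoidal embeddings, by contrast, can be computed for arbitrary positions, yet attention quality still tends to degrade beyond the training length, which is why length extrapolation remains an open limitation rather than a mere lookup-table issue.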