“…Some studies also exploited neural models such as LSTMs [29,92,94,135], Bi-LSTMs [29], and GRUs [6,28]. More recent works have adopted transformer-based architectures, owing to the availability of multilingual transformer models such as Multilingual BERT [1,6,48,92,100,132,135], RoBERTa [30,31], XLM [28,132], and XLM-RoBERTa [30,31,48,110]. Interestingly, we also note some works that propose a multichannel architecture based on the multilingual BERT model [20,130], which allows the model to learn the task in several languages sequentially.…”
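To make the multichannel, sequential-training idea concrete, below is a minimal sketch, not the implementation from [20,130]: it assumes a shared multilingual-BERT encoder with one language-specific classification head per "channel", fine-tuned one language at a time. The class name MultiChannelMBert, the helper train_one_language, the head-per-language routing, and the number of labels are all illustrative assumptions.

```python
# A minimal sketch of one possible "multichannel" setup: a shared
# multilingual-BERT encoder with one classification head ("channel")
# per language, fine-tuned sequentially, one language at a time.
# Names and routing are illustrative, not the architecture of [20,130].
import torch
from torch import nn
from transformers import AutoModel

class MultiChannelMBert(nn.Module):
    def __init__(self, languages, num_labels):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        hidden = self.encoder.config.hidden_size
        # One lightweight head per language "channel"; the encoder is shared.
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden, num_labels) for lang in languages}
        )

    def forward(self, input_ids, attention_mask, lang):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.heads[lang](cls)       # route through this language's channel

def train_one_language(model, batches, lang, lr=2e-5):
    """Fine-tune on a single language; calling this once per language
    in turn yields the sequential multilingual training regime."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for input_ids, attention_mask, labels in batches:
        optim.zero_grad()
        logits = model(input_ids, attention_mask, lang)
        loss_fn(logits, labels).backward()
        optim.step()
```

Under these assumptions, sequential learning is simply a loop such as `for lang in ["en", "ar", "hi"]: train_one_language(model, loaders[lang], lang)`; the shared encoder accumulates knowledge across languages while each head specializes on one.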