…Almost all teams used Transformer-based models (especially BERT; Devlin et al., 2018), either to obtain embeddings or as a pretrained model (Yoosuf and Yang, 2019; Hou and Chen, 2019). Other teams often used ensembles combining different features and models, such as LSTM-CRF (Gupta et al., 2019), XGBoost (Tayyar Madabushi et al., 2019), and BiLSTM (Vlad et al., 2019).…

Figure 1: Class distribution in the training data, where A is "Loaded Language", B is "Name Calling or Labeling", C is "Repetition", D is "Doubt", E is "Exaggeration or Minimisation", and F represents the remaining 9 classes.