“…Corpora of texts produced by non-native speakers are an invaluable source of linguistic data for researchers. Learner corpora have served as the basis for a wide range of studies: automatic language scoring (Vajjala, 2018), identification of text complexity (Kurdi, 2020), automatic text classification within different …, the proper word choice task (Makarenkov et al., 2019), semantic collocation correction (Dahlmeier and Ng, 2011), lexical substitution (McCarthy and Navigli, 2009), paraphrase generation (Madnani and Dorr, 2010), grammatical error correction (Ng et al., 2014), and sentence completion (Zweig and Burges, 2011), to name just a few. Many papers have also addressed obtaining document embeddings (Salton and Buckley, 1988; Whissell and Clarke, 2011; Mikolov and Le, 2014), clustering algorithms (Steinhaus, 1956; Ester et al., 1996; Merris, 1994), and keyword extraction techniques (Mihalcea and Tarau, 2004; Rose et al., 2010; Sterckx et al., 2016).…”