“…There exist publicly available annotated NER data sets for general English text, such as CoNLL-2003 (Sang and De Meulder, 2003), WNUT17 (Derczynski et al., 2017), and the Wikipedia gold standard corpus (Balasuriya et al., 2009), as well as for other languages (Neudecker, 2016; Sang and De Meulder, 2003; Santos et al., 2006; Ševčíková et al., 2007). For legal domain-specific data sets, non-annotated legal text is abundant, as detailed in Pontrandolfo (2012). For example, Legal-BERT (Chalkidis et al., 2020) is pre-trained on a corpus of non-annotated documents consisting of legislation, court cases, and contracts from the UK, the US, and the European Union.…”