Corpus methods in legal translation studies

Pontrandolfo, Gianluca

doi:10.4324/9781351031226-2

Cited by 14 publications

(3 citation statements)

References 0 publications


“…There exist publicly available annotated NER data sets for general English text, such as CoNLL-2003 (Sang and De Meulder, 2003), WNUT17 (Derczynski et al, 2017), and the Wikipedia gold standard corpus (Balasuriya et al, 2009), as well as for other languages (Neudecker, 2016; Sang and De Meulder, 2003; Santos et al, 2006; Ševčíková et al, 2007). For legal domain-specific data sets, non annotated legal text is abundant, as detailed in Pontrandolfo (2012). For example, the pre-training of Legal-BERT (Chalkidis et al, 2020) is performed on a corpus of non annotated documents consisting of legislation, court cases, and contracts from the UK, US, and the European Union.…”

Section: Related Work (mentioning)

confidence: 99%

E-NER — An Annotated Named Entity Recognition Corpus of Legal Text

Terence, Lampos, Cox

2022

Proceedings of the Natural Legal Language Processing Workshop 2022


Identifying named entities such as a person, location or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, which can be a time-consuming labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and experimental results reported here indicate that there is a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4%, compared to training and testing on the E-NER collection.


“…However, the availability of legal resources may vary dramatically given the inherently confidential and private nature of some legal documents and the institutional confines within which they are created. There are several useful overviews of contemporary legal corpora [72,85,86,107]. Worth recommending is also SOULL (Sources of Language and Law), an open on-line platform, regularly updated to provide a wealth of information about existing data collections and corpora of legal language [94].…”

Section: Corpora In Legal Discourse (mentioning)

confidence: 99%

“…Other types of corpora have been usually described in terms of dichotomies: general vs. specialized, monolingual vs. bi- or multi-lingual, comparable vs. parallel, diachronic vs. synchronic, etc. (see [85] for an overview of legal corpora in multilingual settings). The choice of a specific type of corpus depends on research goals but discourse analysis is inevitably comparative.…”

Section: Corpora In Legal Discourse (mentioning)

confidence: 99%

Corpus Linguistics in Legal Discourse

Goźdź-Roszkowski

2021

Int J Semiot Law


There are many different ways in which modern Corpus Linguistics can be used to enrich and broaden our understanding of legal discourse. Based on the central principle of co-occurrence and co-selection in language construction, this paper reviews current applications of Corpus Linguistics in the area of legal discourse focusing on issues ranging from phraseology, variation in legal discourse, legal translation, register and genre perspectives on legal discourse, legal discourse in forensic contexts to evaluative language in judicial settings. It revisits the notion of ‘corpus’ and it highlights the relevance of various types of legal corpora and computer tools in legal linguistic research.
