2022
DOI: 10.1007/s10579-021-09574-0

The ParlaMint corpora of parliamentary proceedings

Abstract: This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository …

citations
Cited by 30 publications
(17 citation statements)
references
References 28 publications
“…This approach contrasts with previous work for similar user groups (e.g. [11,18], http://zoek.openraadsinformatie.nl), which typically focus on the data- or technology-driven innovations.…”
Section: Domain Specific Task Models (mentioning)
confidence: 91%
“…The Turkish parliamentary corpus released as part of the ParlaMint project (Erjavec et al, 2021; Erjavec et al, 2022) contains the transcripts of the Turkish parliament (2011–2021), including approximately 43M words from 303 505 speeches delivered at the main proceedings of the parliament. The data also contains speaker information (name, gender, party affiliation) and automatic annotations including morphology, dependency parsing and named entities.…”
Section: Large-scale (Unannotated) Linguistic Data Collections (mentioning)
confidence: 99%
“…The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. More information about the creation of the corpora, the common standard, and specifics of each national corpus can be found in (Erjavec et al 2022).…”
Section: ParlaMint Project Background (mentioning)
confidence: 99%
“…Processing these heterogeneous records is challenging. However, the recent ParlaMint project has produced unified corpora of parliamentary debates in 17 European parliaments, making them widely accessible (Erjavec et al 2022). This broadens the possible scope of analysis from individual countries to joint issues and differences.…”
Section: Introduction (mentioning)
confidence: 99%