2020
DOI: 10.1051/itmconf/20203303006

Creating a learner corpus infrastructure: Experiences from making learner corpora available

Abstract: With language resources being collected in many (including small) projects in learner corpus research, and with considerable amounts of time and effort spent on this activity, making these types of data available in a FAIR way, with standardized and reasoned methods, would contribute substantially to the advancement of the field. Additionally, it would answer current demands for transparency, replicability and reusability. In this article, we discuss some of the challenges when making learner corpora FAIR and repo…

Cited by 1 publication (1 citation statement). References 10 publications (11 reference statements).
“…Datasets and corpora collected from (second) language learners contain private information represented both on the metadata level and, depending on the topic, in the texts. The presence of personal information makes those datasets non-trivial to share with the public in a FAIR⁵ way (Frey et al., 2020; Volodina et al., 2020), to say nothing of a potential to use such data for shared tasks. This is rather unfortunate since collection and preparation of such corpora is an extremely time-consuming and expensive process.…”
Section: Reflections On Access To Learner Data; citation type: mentioning (confidence: 99%)