2013
DOI: 10.1007/978-3-642-40087-2_6

PAN@FIRE: Overview of the Cross-Language !ndian Text Re-Use Detection Competition

Abstract. The development of models for automatic detection of text re-use and plagiarism across languages has received increasing attention in the last years. However, the lack of an evaluation framework composed of annotated datasets has caused these efforts to be isolated. In this paper we present the CL!TR 2011 corpus, the first manually created corpus for the analysis of cross-language text re-use between English and Hindi. The corpus was used during the Cross-Language !ndian Text Re-Use Detection Competi…

Cited by 5 publications (8 citation statements) · References 9 publications
“…Overall, the cross language text reuse detection results obtained on our corpus (best F1 = 0.552) are comparatively lower than for the METER corpus (best F1 = 0.664), which is a gold standard mono-lingual text reuse detection corpus (Clough, Gaizauskas, Piao, & Wilks, 2002). Moreover, the results are also low when compared with CL!TR corpus (best F1 = 0.600), which contains cross language text reuse cases for the English-Hindi language pair (Barrón-Cedeño, Rosso, Devi, Clough, & Stevenson, 2013).¹³ The rationale being that our corpus contains real examples of paraphrased text whereas CL!TR is a simulated corpus, manually created by volunteers in a controlled environment, who were allowed to use online automatic tools to translate the text (from English to Hindi language) and then modify it.…”
Section: Results (mentioning)
confidence: 99%
“…The CL!TR⁴ (Cross-Language Indian Text Reuse) corpus is the first of its kind developed specifically for the analysis of cross-language text reuse detection in the Hindi-English language pair at document level (Barrón-Cedeño, Rosso, Devi, Clough, & Stevenson, 2013). The suspicious documents it contains are in Hindi and the source documents in English language.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the name of cross-fertilization across evaluation forums,⁹ in 2011, we started to be involved in the organization of tracks at FIRE, most of them as PAN tracks at FIRE. Initially, we addressed the problem of text reuse (2011) and similarity search (2012, 2013), both from a cross-language perspective [10,23,24]. In the former two tracks, datasets with texts in English, Gujarati, and Hindi were provided.…”
Section: PAN Lab Tracks at FIRE (mentioning)
confidence: 99%
“…In 2019, in the framework of a track on author profiling and deception detection in Arabic, we organized a task on the identification of age, gender, and language variety from tweets.¹⁰ In this chapter, we will present three author profiling shared tasks we have organized at FIRE, describing the resources that we created and made available to the research community, illustrating the obtained results and highlighting the main achievements.…”
Section: PAN Lab Tracks at FIRE (mentioning)
confidence: 99%
“…Martinez also investigated the cases where Wikipedia is mainly used for copy and paste plagiarism cases [26]. Wikipedia articles are taken as source documents for generating cross-lingual plagiarism detection corpus for Hindi-English language pair [27].…”
Section: Collection of Source Texts (mentioning)
confidence: 99%