Proceedings of the 10th Workshop on Building and Using Comparable Corpora 2017
DOI: 10.18653/v1/w17-2512

Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora

Abstract: This paper presents the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. It recalls the design of the datasets, presents their final construction and statistics, and describes the methods used to evaluate system results. Thirteen runs were submitted to the shared task by four teams, covering three of the four proposed language pairs: French-English (7 runs), German-English (3 runs), and Chinese-English (3 runs). The best F-scores as measured against the gold standard were 0.84 (German-English), 0.80 (…
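The abstract reports F-scores measured against a gold standard. The sketch below illustrates how such a score can be computed. It assumes that a system run and the gold standard are both sets of (non-English sentence ID, English sentence ID) pairs and that a predicted pair counts as correct only on an exact match. The function name `prf` and the sentence IDs are hypothetical and serve only as an illustration; this is not the official BUCC scoring script.

```python
from typing import Iterable, Tuple

# Assumption: a pair is (non-English sentence ID, English sentence ID).
Pair = Tuple[str, str]


def prf(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Hypothetical helper: pair-level precision, recall and F1 against a gold set."""
    pred_set = set(predicted)  # duplicate pairs in a run are counted once
    gold_set = set(gold)
    correct = len(pred_set & gold_set)  # exact-match pairs only
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sentence IDs, for illustration only.
gold = [("de-000001", "en-000007"), ("de-000002", "en-000003"), ("de-000005", "en-000009")]
run = [("de-000001", "en-000007"), ("de-000002", "en-000004")]
p, r, f = prf(run, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.33 F1=0.40
```

Here one of the two predicted pairs is correct (precision 1/2), and one of the three gold pairs is found (recall 1/3). Their harmonic mean is 0.40.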

Cited by 78 publications (83 citation statements). References: 10 publications (18 reference statements).

“…Since 2017, the workshop on Building and Using Comparable Corpora (BUCC) is organizing a shared task to evaluate the performance of approaches to mine for parallel sentences in comparable corpora (Zweigenbaum et al, 2018). Table 1 summarizes the available data, and Table 2 the official results.…”
Section: Experimental Evaluation: BUCC Shared Task on Mining Bitexts (mentioning)
“…It should be noted that our model is specifically tuned for the task of misalignment detection, whereby parallel corpus mining and corpus cleaning (we refer to the WMT shared task on parallel corpus filtering [27,28] and the BUCC shared task on parallel corpus mining [29]), involve other important factors such as selection of sentences based on fluency and diversity and detection of language errors, which possibly have a more profound effect on NMT performance, (e.g. [4]).…”
Section: Discussion (mentioning)
“…Sentence alignment was a popular research topic in the early days of statistical MT, but received less attention once standard sentence-aligned parallel corpora became available. Interest in low-resource MT has led to a resurgence in data gathering methods (Buck and Koehn, 2016; Zweigenbaum et al, 2018; …), but we find limited recent work on bilingual sentence alignment.…”
Section: Introduction (mentioning)