Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2367

Bitextor's participation in WMT'16: shared task on document alignment

Abstract: This paper describes the participation of Prompsit Language Engineering and the Universitat d'Alacant in the shared task on document alignment at the First Conference on Machine Translation (WMT 2016). Two systems were submitted, corresponding to two versions of the tool Bitextor: the latest stable release, version 4.1, and the newest one, version 5.0. The paper describes the main features of each version of the tool and discusses the results obtained on the data sets published for the shared task.

Citations: cited by 15 publications (15 citation statements)
References: 8 publications
“…This paper proposes a collection of features that builds on those defined by Esplà-Gomis et al [13] and can be obtained from any source of bilingual information: in our experiments, online MT systems and an online bilingual concordancer were used. The results obtained on the datasets published for the word-level MT QE shared tasks at WMT15 and WMT16 confirm the good performance of the approach proposed, which is able to reproduce or even improve on the results obtained by Esplà-Gomis et al [13] and Esplà-Gomis et al [51].…”
Section: Discussion (supporting)
confidence: 77%
“…The vectors are then typically matched with cosine similarity (Buck and Koehn, 2016a). The raw vectors may be recentered around the mean vector for a web domain (Germann, 2016) Document alignment quality can be improved with additional features such ratio of shared links, similarity of link URLs, ratio of shared images, binary feature indicating if the documents are linked, DOM structure similarity (Esplà-Gomis et al, 2016), same numbers (Papavassiliou et al, 2016), or same named entities (Lohar et al, 2016). Guo et al (2019) introduce the use of document embeddings, constructed from sentence embeddings, to the document alignment task.…”
Section: Document Alignment (mentioning)
confidence: 99%
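
The matching step mentioned in the statement above (document vectors compared by cosine similarity, optionally recentered around a web domain's mean vector) can be sketched roughly as follows. This is only an illustrative sketch, not the method of Bitextor or of any cited system: the function names `cosine`, `recenter`, and `best_matches` are hypothetical, and it assumes that the source and target document vectors already live in a shared space (for example, after translating one side) and that all of them come from a single web domain.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recenter(vectors):
    """Subtract the mean vector of the collection from every vector."""
    n = len(vectors)
    mean = [sum(column) / n for column in zip(*vectors)]
    return [[x - m for x, m in zip(vec, mean)] for vec in vectors]


def best_matches(source_vecs, target_vecs):
    """For each source document, return the index of the most similar target.

    Assumption: every vector belongs to the same web domain, so a single
    mean vector is used for recentering.
    """
    centered = recenter(source_vecs + target_vecs)
    src = centered[:len(source_vecs)]
    tgt = centered[len(source_vecs):]
    return [max(range(len(tgt)), key=lambda j: cosine(s, tgt[j])) for s in src]
```

A greedy arg-max per source document is used here for brevity; it does not enforce one-to-one alignments, which a real system would likely need to handle.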
“…Previous methods depended on engineering features. (Shi et al, 2006; Esplà-Gomis et al, 2016) used metadata information from web crawls to mine parallel data. Recent methods used crosslingual word embeddings to obtain parallel corpora (Guo et al, 2018; Schwenk, 2018; Bouamor and Sajjad, 2018; Schwenk et al, 2019b,a).…”
Section: Related Work (mentioning)
confidence: 99%