Proceedings of the Eighth Workshop on Building and Using Comparable Corpora 2015
DOI: 10.18653/v1/w15-3412

AUT Document Alignment Framework for BUCC Workshop Shared Task

Abstract: This paper presents a framework for aligning collections of comparable documents. Our feature-based model can take different characteristics of documents into account when evaluating their similarity. The model relies only on the content of the documents, since no links, special tags, or metadata are available. We also apply a filtering mechanism that makes the model applicable to large collections of data. According to the results, our model is able to recognize related documents in the target language with rec…

Cited by 6 publications (7 citation statements)
References 26 publications
“…In essence, it uses a bilingual dictionary for converting the word feature vectors between the languages and for estimating their overlap. The other systems are discussed in detail in the proceedings of BUCC'15 (Morin et al 2015;Zafarian et al 2015), and full evaluation results are available there as well (Sharoff, Zweigenbaum and Rapp 2015). The lina system (Morin et al 2015) is based on matching hapax legomena, i.e.…”
Section: Comparison Of Methods Used By Participating Systems (mentioning)
confidence: 99%
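
The dictionary-based approach described in the statement above (converting word feature vectors between languages and estimating their overlap) can be sketched roughly as follows. This is only an illustration, not the code of any participating system: the toy dictionary, the sample documents, the function names `translate_vector` and `cosine`, and the choice of cosine similarity as the overlap measure are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def translate_vector(src_counts, dictionary):
    """Map source-language word counts onto target-language words.

    `dictionary` is a hypothetical bilingual lexicon: source word -> list of
    target words. Words missing from the lexicon are simply dropped.
    """
    translated = Counter()
    for word, count in src_counts.items():
        for target in dictionary.get(word, ()):
            translated[target] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed overlap measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy data, invented for illustration only.
toy_dictionary = {"haus": ["house"], "katze": ["cat"]}
german_doc = Counter("haus katze haus".split())
english_doc = Counter("house cat house".split())

score = cosine(translate_vector(german_doc, toy_dictionary), english_doc)
print(f"overlap score: {score:.3f}")
```

Dropping out-of-dictionary words keeps the sketch short; a real system would have to decide how to treat them.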
“…German-English. The aut system (Zafarian et al 2015) uses the most complicated setup by combining several steps. First, documents in different languages are mapped into the same space using a feature transformation matrix.…”
Section: Comparison Of Methods Used By Participating Systems (mentioning)
confidence: 99%
“…In essence, it uses a bilingual dictionary for converting the word feature vectors between the languages and estimating their overlap. The other systems are discussed in details in the current proceedings (Morin et al, 2015;Zafarian et al, 2015). The LINA system (Morin et al, 2015) is based on matching hapax legomena, i.e., words occurring only once.…”
Section: Methods Used (mentioning)
confidence: 99%
“…In addition to using hapax legomena, the quality of linking in one language pair, e.g., French-English, is also assessed by using information available in pages in another language pair, e.g., German-English. The AUT system (Zafarian et al, 2015) uses the most complicated setup by combining several steps. First, documents in different languages are mapped into the same space using a feature transformation matrix.…” [Table 3: Evaluation results for German]
Section: Methods Used (mentioning)
confidence: 99%
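
The first step attributed to the AUT system in the statements above, mapping documents in different languages into the same space with a feature transformation matrix, might look like the following minimal sketch. The citation statements do not say how the matrix is built or how candidates are scored, so the matrix values, the vocabulary sizes, the document names, and the use of cosine similarity for ranking are all hypothetical.

```python
from math import sqrt


def project(vec, matrix):
    """Multiply a source-space row vector by a (source x target) matrix."""
    n_target = len(matrix[0])
    return [
        sum(vec[i] * matrix[i][j] for i in range(len(vec)))
        for j in range(n_target)
    ]


def cosine(a, b):
    """Cosine similarity between two dense vectors (assumed ranking measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical matrix: 3 source-language features -> 2 target-language features.
transform = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

source_doc = [2.0, 0.0, 2.0]                             # invented source feature vector
target_docs = {"en_a": [3.0, 1.0], "en_b": [0.0, 4.0]}   # invented target documents

projected = project(source_doc, transform)
best = max(target_docs, key=lambda name: cosine(projected, target_docs[name]))
print(best, projected)
```

Once both sides live in the same feature space, any standard vector similarity can rank candidate target documents for each source document.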