Computer Science & Information Technology (CS & IT) 2020
DOI: 10.5121/csit.2020.101521

Parallel Data Extraction using Word Embeddings

Abstract: Building a robust machine translation (MT) system requires a sufficiently large parallel corpus as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora without using any MT system or any parallel corpus. Instead, we use cross-lingual information retrieval (CLIR), averaged word embeddings, text similarity, and a bilingual dictionary, which saves significant time and effort because no MT system is involved. We conduct experiments o…
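To make the ingredients named in the abstract concrete, here is a minimal sketch of one way a bilingual dictionary, averaged word embeddings, and a text similarity score could be combined to pick the closest candidate sentence. It is not the paper's pipeline: the toy vectors, the French-to-English dictionary, and the helper names (`translate`, `average_embedding`, `cosine`, `best_match`) are all assumptions made for illustration, and the CLIR retrieval step is replaced by a hand-written candidate list.

```python
import math

# Toy 3-dimensional target-language embeddings (made-up values, illustration only).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "sleeps": [0.1, 0.8, 0.1],
    "dog": [0.8, 0.2, 0.1],
    "market": [0.0, 0.1, 0.9],
    "opens": [0.1, 0.3, 0.8],
}

# Toy bilingual dictionary (French -> English); hypothetical entries.
DICTIONARY = {"chat": "cat", "dort": "sleeps", "marché": "market", "ouvre": "opens"}


def translate(tokens):
    """Map source tokens word by word through the dictionary, dropping unknown words."""
    return [DICTIONARY[t] for t in tokens if t in DICTIONARY]


def average_embedding(tokens):
    """Return the mean vector of all tokens that have an embedding, or None if none do."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(source_tokens, candidates):
    """Score each candidate sentence against the dictionary-translated source sentence."""
    src = average_embedding(translate(source_tokens))
    if src is None:
        return None, 0.0
    scored = []
    for cand in candidates:
        vec = average_embedding(cand)
        if vec is not None:
            scored.append((cosine(src, vec), cand))
    if not scored:
        return None, 0.0
    score, cand = max(scored, key=lambda pair: pair[0])
    return cand, score


# Candidates would normally come from a retrieval step; here they are hard-coded.
candidates = [["dog", "sleeps"], ["market", "opens"]]
match, score = best_match(["chat", "dort"], candidates)
print(match, round(score, 3))
```

In this toy setup the sentence about a sleeping animal scores far higher than the one about a market, which is the behavior an averaged-embedding similarity is meant to capture. A real system would load pretrained embeddings and decide how to handle out-of-dictionary words, neither of which this sketch attempts.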

Cited by 0 publications; references 14 publications (11 reference statements).