Machine translation (MT) is the automatic translation of the source language to its target language by a computer system. In the current paper, we propose an approach of using recurrent neural networks (RNNs) over traditional statistical MT (SMT). We compare the performance of the phrase table of SMT to the performance of the proposed RNN and in turn improve the quality of the MT output. This work has been done as a part of the shared task problem provided by the MTIL2017. We have constructed the traditional MT model using Moses toolkit and have additionally enriched the language model using external data sets. Thereafter, we have ranked the phrase tables using an RNN encoder-decoder module created originally as a part of the GroundHog project of LISA lab.
Large aligned corpora are required for any computer aided translation system to become effective. In this scenario, bilingual document alignment has gained utmost importance in recent days. We attempt a simple yet effective approach to align URLs (Uniform Resource Locator) within two documents in two languages as a part of WMT2016 Bilingual Document Alignment Shared Task. Our approach includes the processing of URLs and their embedded texts, which serves as the main matching criterion. In order to align the text initially, we have used GaleChurch algorithm, dictionary based translation and Cosine Similarity that in turn helps us to achieve better results in the alignment task.
A Statistical Machine Translation (SMT) system is always trained using large parallel corpus to produce effective translation. Not only is the corpus scarce, it also involves a lot of manual labor and cost. Parallel corpus can be prepared by employing comparable corpora where a pair of corpora is in two different languages pointing to the same domain. In the present work, we try to build a parallel corpus for French-English language pair from a given comparable corpus. The data and the problem set are provided as part of the shared task organized by BUCC 2017. We have proposed a system that first translates the sentences by heavily relying on Moses and then group the sentences based on sentence length similarity. Finally, the one to one sentence selection was done based on Cosine Similarity algorithm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.