Morphologically rich languages generally require large amounts of parallel data to adequately estimate parameters in a statistical machine translation (SMT) system. However, creating large collections of parallel data is time-consuming and expensive. In this paper, we explore two strategies for circumventing the sparsity caused by the lack of large parallel corpora. First, we explore the use of distributed representations in a recurrent neural network (RNN) based language model with different morphological features. Second, we explore the use of lexical resources such as WordNet to overcome the sparsity of content words.
This paper investigates the use of semantic similarity scores as a feature in a phrase-based machine translation system. We propose using partial least squares regression to learn bilingual word embeddings using compositional distributional semantics. The model outperforms the baseline system, as shown by an increase in BLEU score. We also show the effect of varying the vector dimension and context window for two different approaches to learning word vectors.
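As a rough illustration of how a semantic similarity score could serve as a phrase-pair feature, the sketch below composes word vectors into phrase vectors and scores a source/target pair by cosine similarity. It rests on several assumptions not stated in the abstract: composition is plain vector addition, both vocabularies already live in a shared embedding space (the partial least squares mapping is not shown), and the function names `compose`, `cosine`, and `similarity_feature` as well as the toy vectors are hypothetical.

```python
import math


def compose(words, vectors):
    """Additive composition (an assumption): sum the vectors of known words."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            continue  # out-of-vocabulary words contribute nothing
        total = [t + v for t, v in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_feature(src_words, tgt_words, src_vectors, tgt_vectors):
    """Score a phrase pair, assuming both embeddings share one space."""
    return cosine(compose(src_words, src_vectors),
                  compose(tgt_words, tgt_vectors))


# Toy two-dimensional embeddings, invented purely for illustration.
src_vectors = {"bada": [0.0, 1.0], "ghar": [1.0, 0.0]}
tgt_vectors = {"big": [0.0, 1.0], "house": [1.0, 0.0], "river": [-1.0, 0.0]}

score = similarity_feature(["bada", "ghar"], ["big", "house"],
                           src_vectors, tgt_vectors)
print(score)
```

In a phrase-based system, a score of this kind would typically be attached to each phrase-table entry as one additional feature, with its weight tuned alongside the other model features.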
We present a study of the effect of transliteration on human readability. We explore this effect through changes in eye-gaze patterns recorded with an eye tracker during the experiments. We chose Hindi and English, written in the Devanagari and Latin scripts respectively. Participants are shown transliterated words and asked to speak each word while their eye movements are recorded. The eye-tracking data is later analyzed for eye-fixation trends. Quantitative analysis of fixation count, fixation duration, and visit count is performed over the areas of interest.
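The three measures named above can be computed from a time-ordered list of fixations, as in the minimal sketch below. The data layout is an assumption, not the authors' pipeline: each fixation is taken to be a point with a duration, each area of interest (AOI) a rectangle in screen coordinates, and a "visit" a maximal run of consecutive fixations inside the AOI. The names `Fixation`, `AOI`, and `aoi_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position (assumed screen pixels)
    y: float            # vertical gaze position (assumed screen pixels)
    duration_ms: float  # fixation duration in milliseconds


@dataclass
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        """True when the fixation point lies inside the rectangle."""
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def aoi_metrics(fixations, aoi):
    """Fixation count, total fixation duration, and visit count for one AOI.

    `fixations` must be in temporal order; a visit starts whenever a
    fixation falls inside the AOI and the previous one did not.
    """
    count = 0
    total_ms = 0.0
    visits = 0
    inside_previous = False
    for fixation in fixations:
        inside = aoi.contains(fixation)
        if inside:
            count += 1
            total_ms += fixation.duration_ms
            if not inside_previous:
                visits += 1
        inside_previous = inside
    return {"fixation_count": count,
            "total_fixation_ms": total_ms,
            "visit_count": visits}


# Invented sample: the gaze enters the AOI, leaves once, then returns.
sample = [Fixation(10, 10, 200), Fixation(12, 11, 150),
          Fixation(200, 200, 100), Fixation(15, 10, 300)]
word_aoi = AOI("word", 0, 0, 50, 50)
print(aoi_metrics(sample, word_aoi))
```

Counting visits from runs of consecutive fixations keeps the sketch independent of any particular eye-tracker export format; a real analysis would use whatever AOI and visit definitions the recording software applies.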