State-of-the-art English to Persian Statistical Machine Translation system

Mansouri, Amin; Faili, Heshaam

doi:10.1109/aisp.2012.6313739

Cited by 12 publications

(1 citation statement)

References 16 publications


“…Table 2 shows the statistics of the Hamshahri corpus. The 20M parallel corpus is constructed from four different parallel corpora: Roman parallel corpus (Mansouri and Faili 2012), Iran Telecommunication Research Center parallel corpus (Jabbari et al. 2012), European Language Resources Association English–Persian parallel corpus (Mosavi Miangah 2009), and a part of the Mizan parallel corpus (footnote b). This corpus consists of 1,109,584 aligned sentences, with about 20,000,000 words on each side.…”

Section: Data Sets and Experimental Results (mentioning)

confidence: 99%

A learning to rank approach for cross-language information retrieval exploiting multiple translation resources

2019

Self Cite


Cross-language information retrieval (CLIR), finding information in one language in response to queries expressed in another, has attracted much attention due to the explosive growth of multilingual information on the World Wide Web. One important issue in CLIR is how to apply monolingual information retrieval (IR) methods in cross-lingual environments. Recently, the learning to rank (LTR) approach has been successfully employed in various IR tasks. In this paper, we use LTR for CLIR. To adapt monolingual LTR techniques to CLIR and overcome the language barrier, we map monolingual IR features to CLIR features using translation information extracted from different translation resources. The performance of CLIR is highly dependent on the size and quality of the available bilingual resources, and using them effectively is especially important for low-resource language pairs. We therefore further propose an LTR-based method for combining translation resources in CLIR and study its effectiveness using different translation resources. Our results show that LTR can successfully combine different translation resources to improve CLIR performance. In the best scenario, the LTR-based combination method improves on the single-resource CLIR method by 6% in terms of Mean Average Precision (MAP).
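The feature-mapping and resource-combination ideas in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: it assumes each translation resource is a hypothetical table of translation probabilities P(target | source), maps a monolingual term-frequency feature to a cross-lingual one by weighting each candidate translation's count by its probability, produces one such feature per resource, and combines them with a linear scoring function. The weights are set by hand here; an actual LTR algorithm would learn them from relevance judgments. Mean Average Precision, the metric behind the reported 6% figure, is computed in its standard ranked-retrieval form.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical resource format: source term -> {target term: P(target | source)}.
TranslationTable = Dict[str, Dict[str, float]]


def translated_tf(query_term: str, doc_tokens: Sequence[str], table: TranslationTable) -> float:
    """Expected frequency of a source-language query term in a target-language
    document: each candidate translation's count weighted by its probability."""
    counts = Counter(doc_tokens)
    return sum(p * counts[t] for t, p in table.get(query_term, {}).items())


def clir_features(query: Sequence[str], doc_tokens: Sequence[str],
                  tables: Sequence[TranslationTable]) -> List[float]:
    """One translated-TF feature per translation resource, summed over query terms."""
    return [sum(translated_tf(q, doc_tokens, tab) for q in query) for tab in tables]


def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear ranking function; in LTR these weights would be learned."""
    return sum(f * w for f, w in zip(features, weights))


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which relevant documents appear."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Two hypothetical resources translating the English term "book".
    dictionary = {"book": {"کتاب": 1.0}}
    parallel_corpus = {"book": {"کتاب": 0.8, "دفتر": 0.2}}
    docs = {
        "d1": ["کتاب", "دفتر", "کتاب"],
        "d2": ["دفتر"],
        "d3": ["کتاب"],
    }
    weights = [0.5, 0.5]  # hand-set for illustration only
    scores = {d: linear_score(clir_features(["book"], toks, [dictionary, parallel_corpus]), weights)
              for d, toks in docs.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(ranking)
    print(mean_average_precision([(ranking, {"d1", "d3"})]))
```

With these toy inputs the combined scorer ranks d1, d3, d2, placing both relevant documents first. Keeping one feature per resource, rather than merging the tables up front, is what lets a learned weight vector decide how much to trust each resource.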



Modeling Persian Verb Morphology to Improve English-Persian Machine Translation

Mahmoudi; Faili; Arabsorkhi

2013

Advances in Artificial Intelligence and Its Applications

Self Cite


Nowadays, dialogue systems are used in many fields of industry and research. Successful examples include Apple Siri, Google Assistant, and IBM Watson. Task-oriented dialogue systems are a category of these that are used for specific tasks, such as booking plane tickets or making restaurant reservations. Shopping is one of the most popular application areas for these systems: the bot replaces the human salesperson and interacts with customers by speaking. Training the models behind these systems requires annotated data. In this paper, we developed a dataset of Persian-language dialogues through crowdsourcing and annotated these dialogues to train a model. The dataset contains 1,061 dialogues with nearly 22k utterances across 15 different domains. It is the largest Persian dataset in this field and is freely available for future research. We also propose baseline models for two natural language understanding (NLU) tasks: intent classification and entity extraction. These models obtain F1 scores of around 91% for intent classification and around 93% for entity extraction, which can serve as baselines for future research.
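The abstract does not state how its F1 scores are computed. A common convention for entity extraction, assumed here rather than taken from the paper, is exact-match span F1 pooled over all utterances: a predicted entity counts as correct only if its utterance, boundaries, and type all match a gold entity. The span representation and entity types below are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (utterance_id, start_token, end_token, entity_type).
Span = Tuple[int, int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: harmonic mean of precision and recall over entity spans."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 0, 2, "product"), (0, 5, 6, "color")}
    predicted = {(0, 0, 2, "product"), (0, 3, 4, "size")}
    print(span_f1(gold, predicted))  # one of two correct on each side
```

Here precision and recall are both 0.5, so F1 is 0.5. Intent classification F1 would instead be computed per utterance label, typically micro- or macro-averaged over intents.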


Query-dependent learning to rank for cross-lingual information retrieval

Ghanbari; Shakery

2018

Knowl Inf Syst

