Foreword

Assalamu 3alaykum wa nín hǎo! Welcome to the Second Arabic Natural Language Processing Workshop, held at ACL 2015 in Beijing, China.

A number of Arabic NLP (or Arabic NLP-related) workshops and conferences have taken place, both in the Arab World and in association with international conferences. The Arabic NLP workshop at ACL 2015 follows in the footsteps of these previous efforts to provide a forum for researchers to share and discuss their ongoing work. As in the first Arabic NLP workshop held at EMNLP 2014 in Doha, Qatar, this workshop includes a shared task on Automatic Arabic Error Correction, which was designed in the tradition of high-profile NLP shared tasks such as CoNLL's grammar/error detection and numerous machine translation campaigns by NIST/WMT/MEDAR, among others.

We received 23 main workshop submissions and selected 15 (65%) for presentation in the workshop. Nine papers will be presented orally and six as part of a poster session. The presentation mode is independent of the ranking of the papers. The papers cover a diverse set of topics, from designing orthography conventions and annotation tools to speech recognition and deep learning for sentiment analysis.

The shared task was a success, with eight teams from six countries participating. The shared task system description (short) papers are included in the proceedings to document the shared task systems, but were not reviewed with the rest of the papers of the main workshop. These papers will be presented as posters. A long paper describing the shared task will be presented orally.

The quantity and quality of the contributions to the main workshop, as well as the shared task, are strong indicators that there is a continued need for this kind of dedicated Arabic NLP workshop.

We would like to acknowledge all the hard work of the submitting authors and thank the reviewers for their diligent work and for the valuable feedback they provided.
We are also thankful for the work of the shared task committee, the website committee, and the publication co-chairs. It has been an honor to serve as program co-chairs. We hope that the reader of these proceedings will find them stimulating and beneficial.

Abstract

Different names may be popular in different countries. Hence, person names may give a clue to a person's country of origin. Along with other features, mapping names to countries can be helpful in a variety of applications, such as country tagging of Twitter users. This paper describes the collection of Arabic Twitter user names that are either written in Arabic or transliterated into Latin characters, along with their stated geographical locations. To classify previously unseen names, we trained naive Bayes and Support Vector Machine (SVM) multi-class classifiers using primarily bag-of-words features. We are able to map Arabic user names to specific Arab countries with 79% accuracy and to specific regions (Gulf, Egypt, Levant, Maghreb, and others) with 94% accuracy. As for transliterated Arabic names, the accuracy per country and per region was 67...
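The name classification setup described in this abstract can be sketched roughly as follows. This is a minimal illustration only, assuming a multinomial naive Bayes model with add-one smoothing over whitespace-separated name tokens; the class name, the placeholder tokens (`aaa`, `bbb`, ...), and the labels (`region_1`, `region_2`) are all hypothetical and are not data, features, or code from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesNameClassifier:
    """Multinomial naive Bayes over bag-of-words name tokens (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # additive smoothing constant
        self.class_counts = Counter()             # label -> number of training names
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    @staticmethod
    def tokenize(name):
        # Bag of words: lowercase the name and split on whitespace.
        return name.lower().split()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for token in self.tokenize(name):
                self.token_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, name):
        total_names = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_names in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_names / total_names)
            total_tokens = sum(self.token_counts[label].values())
            for token in self.tokenize(name):
                count = self.token_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_tokens + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical placeholder tokens and labels, not real names or countries.
names = ["aaa bbb", "aaa ccc", "ddd eee", "ddd fff"]
labels = ["region_1", "region_1", "region_2", "region_2"]
clf = NaiveBayesNameClassifier().fit(names, labels)
prediction = clf.predict("aaa zzz")  # "zzz" is unseen and handled by smoothing
```

Smoothing matters here because user names often contain tokens never seen in training; without it, a single unseen token would drive the class likelihood to zero.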
In this work, we present Qatar Computing Research Institute's live speech translation system. Our system works with both Arabic and English. It is designed using an array of modern web technologies to capture speech in real time, and to transcribe and translate it using state-of-the-art Automatic Speech Recognition (ASR) and Machine Translation (MT) systems. The platform is designed to be useful in a wide variety of situations, such as lectures, talks, and meetings. It is often the case in the Middle East that audiences in talks understand either Arabic or English alone. This system enables the speaker to talk in either language, and the audience to understand what is being spoken even if they are not bilingual.

The system consists of three primary modules: i) a Web application, ii) an ASR system, and iii) a statistical/neural MT system. The three modules are optimized to work jointly and process the speech at a real-time factor close to one, which means that the systems are optimized to keep up with the speaker and provide the results with a short delay, comparable to what we observe in (human) interpretation. The real-time factor for the entire pipeline is 1.18.

The Web application is based on the standard HTML5 WebAudio application programming interface. It captures speech input from a microphone on the user's device and transmits it to the backend servers for processing. The servers send back the transcriptions and translations of the speech, which are then displayed to the user. Our platform features a way to instantly broadcast live sessions so that anyone can see the transcriptions and translations of a session in real time without being physically present at the speaker's location.

The ASR system is based on KALDI, a state-of-the-art toolkit for speech recognition. We use a combination of time delay neural networks (TDNN) and long short-term memory (LSTM) neural networks to ensure real-time transcription of the incoming speech while maintaining high-quality output.
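To make the real-time factor quoted above concrete, the sketch below computes it as processing time divided by audio duration. The function names and the 10-second segment are hypothetical. The lag estimate rests on a simplifying assumption of uninterrupted speech and purely sequential processing; in practice, pauses let a system with a factor slightly above one catch up.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by the duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def lag_after(audio_seconds, rtf):
    """Seconds by which output trails the speaker after `audio_seconds` of
    continuous speech, assuming no pauses and no other overhead."""
    return max(0.0, (rtf - 1.0) * audio_seconds)


# Hypothetical example: a 10 s segment that takes 11.8 s to process
# corresponds to the pipeline-wide factor of 1.18 reported in the text.
rtf = real_time_factor(11.8, 10.0)
lag = lag_after(10.0, rtf)  # roughly 1.8 s behind the speaker
```

A factor at or below one means the pipeline never falls behind; a factor slightly above one yields a short, slowly growing delay during continuous speech.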
The Arabic and English systems have average word error rates of 23% and 9.7%, respectively. The Arabic system consists of the following components: i) a character-based lexicon of size 900K, which maps words to sound units to learn acoustic representations; ii) 40-dimensional high-resolution features extracted for each speech frame to digitize the audio signal; iii) 100-dimensional i-vectors for each frame to facilitate speaker adaptation; iv) TDNN acoustic models; and v) a trigram language model trained on 110M words and restricted to a 900K vocabulary.

The MT system has two choices for the backend: a statistical phrase-based system and a neural MT system. Our phrase-based system is trained with Moses, a state-of-the-art statistical MT framework, and the neural system is trained with Nematus, a state-of-the-art neural MT framework. We use Modified Moore-Lewis filtering to select the best subset of the available data to train our phrase-based system more efficiently. To speed up translation even further, we prune the language models backing the phrase-based system, ignoring knowledge that is not frequently used. Our neural MT system, on the other hand, is trained on all the available data, as its training scales linearly with the amount of data, unlike that of phrase-based systems. Our neural MT system is roughly 3–5% better on the BLEU scale, a standard measure of translation quality. However, the existing neural MT decoders are slower than the phrase-based decoders, translating 9.5 tokens/second versus 24 tokens/second. The trade-off between efficiency and accuracy kept us from picking only one final system. By enabling both technologies, we expose the trade-off between quality and efficiency and leave it up to the user to decide whether they prefer a fast or an accurate system.

Our system has been successfully demonstrated locally and globally at several venues, including Al Jazeera, MIT, BBC and TII.
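The data selection step mentioned above can be illustrated with a small sketch of cross-entropy difference scoring, the idea behind Moore-Lewis filtering; the modified variant sums the score over the source and target sides of each sentence pair. Everything below is an assumption for illustration: the add-one unigram models stand in for the real n-gram language models, and the toy sentences (with uppercase tokens as target-language placeholders) are invented, not the system's data.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram language model (stand-in for a real n-gram LM)."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # +1 reserves mass for unseen tokens

    def cross_entropy(self, sentence):
        """Per-token cross-entropy of `sentence` under this model, in bits."""
        tokens = sentence.split()
        if not tokens:
            return 0.0
        log_prob = sum(
            math.log2((self.counts[t] + 1) / (self.total + self.vocab_size))
            for t in tokens
        )
        return -log_prob / len(tokens)


def cross_entropy_difference(sentence, in_domain_lm, general_lm):
    """Moore-Lewis score: lower values look more like the in-domain data."""
    return in_domain_lm.cross_entropy(sentence) - general_lm.cross_entropy(sentence)


def select_pairs(pairs, src_in, src_gen, tgt_in, tgt_gen, keep):
    """Rank (source, target) pairs by the summed source- and target-side
    scores and return the `keep` best-scoring pairs."""
    ranked = sorted(
        pairs,
        key=lambda p: cross_entropy_difference(p[0], src_in, src_gen)
        + cross_entropy_difference(p[1], tgt_in, tgt_gen),
    )
    return ranked[:keep]


# Invented toy corpora; uppercase tokens are placeholders for the target language.
src_in = UnigramLM(["talk lecture speech", "lecture meeting talk"])
src_gen = UnigramLM(["sports weather news", "news sports match", "talk lecture speech"])
tgt_in = UnigramLM(["TALK LECTURE SPEECH", "LECTURE MEETING TALK"])
tgt_gen = UnigramLM(["SPORTS WEATHER NEWS", "NEWS SPORTS MATCH", "TALK LECTURE SPEECH"])

pairs = [("sports news", "SPORTS NEWS"), ("lecture talk", "LECTURE TALK")]
selected = select_pairs(pairs, src_in, src_gen, tgt_in, tgt_gen, keep=1)
```

Training on only the best-scoring subset shrinks the phrase-based model's training set while keeping the sentences most similar to the target domain.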
The state-of-the-art technologies backing the platform for transcription and translation are also available independently and can be integrated seamlessly into any external platform. The Speech Translation system is publicly available at http://st.qcri.org/demos/livetranslation.
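For readers unfamiliar with the word error rate figures reported for the ASR systems above, the following sketch shows the standard definition: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name and example sentences are hypothetical, and this is not the evaluation code used for the reported numbers.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.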
This research is devoted to studying the performance of Iraqi EFL learners with reference to idiomatic expressions in modern Standard English. By definition, an idiom is a linguistic unit in which the meaning of a given construction cannot be understood from the words that compose it. Evidence shows that English idiomatic expressions represent a rather problematic area for EFL learners. Thus, this study aims at theoretically investigating English idiomatic expressions and practically studying Iraqi EFL learners' performance in dealing with these expressions by means of a specialized test designed for this purpose. It is hypothesized that Iraqi EFL learners face difficulties in dealing with idiomatic expressions at both the recognition and production levels. The test includes two questions, each with 25 items. Results show that, at the recognition level (question 1), most Iraqi EFL learners fail to recognize the idiomatic expressions, whereas at the production level the percentage of learners unable to employ the given idiomatic expressions increases, including avoided items, which are counted as incorrect in both the first and second questions. The findings support the above-mentioned hypothesis. The study ends with some recommendations for EFL teachers and learners regarding the way they encounter such expressions in spoken and written English.