Sentiment analysis automatically evaluates people's opinions of products or services. It is an emerging research area with promising advancements in high-resource languages such as Indo-European languages (e.g. English). However, the same cannot be said for languages with limited resources. In this study, we evaluate multilingual sentiment analysis (MSA) techniques for under-resourced languages and the use of high-resource languages to develop resources for MSA in low-resource languages, with the ultimate goal of identifying appropriate strategies for future MSA investigations. We review over 35 studies across different languages that demonstrate an interest in developing MSA models for under-resourced languages in a multilingual context. Furthermore, we illustrate the drawbacks of each strategy used for the MSA task. Our focus is on critically comparing MSA methods and the datasets they employ, and on identifying research gaps. Our comparative analysis contributes to the theoretical literature with complete coverage of MSA studies from 2008 to date. Furthermore, we demonstrate how MSA research has grown tremendously over this period. Finally, because most studies propose MSA methods based on deep learning approaches, we offer a deep learning framework for MSA that does not rely on machine translation systems. Following the meta-analysis (PRISMA) protocol of this literature review, we found that, in general, just over 60% of the studies used deep learning frameworks, which significantly improved MSA performance. Therefore, deep learning methods are recommended for the development of MSA for under-resourced languages.
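To make the translation-free idea concrete, here is a minimal, hypothetical sketch (not the authors' framework): a tiny one-hidden-layer neural network trained directly on bag-of-words features drawn from sentences in two languages, so no machine translation step is involved. The toy corpus, labels, and hyperparameters are all invented for illustration.

```python
import math
import random

# Invented toy corpus: (text, label) with 1 = positive, 0 = negative.
# English and Spanish sentences share one vocabulary and one model,
# so no machine translation is needed anywhere in the pipeline.
TRAIN = [
    ("good excellent great", 1),    # English, positive
    ("bad terrible awful", 0),      # English, negative
    ("bueno excelente genial", 1),  # Spanish, positive
    ("malo terrible horrible", 0),  # Spanish, negative
]

# Shared multilingual vocabulary over all training words.
vocab = sorted({w for text, _ in TRAIN for w in text.split()})
idx = {w: i for i, w in enumerate(vocab)}

def featurize(text):
    """Bag-of-words count vector over the shared vocabulary."""
    v = [0.0] * len(vocab)
    for w in text.split():
        if w in idx:
            v[idx[w]] += 1.0
    return v

H = 4  # hidden units (arbitrary small value)
random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(len(vocab))] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """One hidden tanh layer, sigmoid output."""
    h = [math.tanh(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(H)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
    return h, y

# Plain stochastic gradient descent on binary cross-entropy.
lr = 0.2
for _ in range(500):
    for text, label in TRAIN:
        x = featurize(text)
        h, y = forward(x)
        d_out = y - label  # gradient w.r.t. pre-sigmoid output
        for j in range(H):
            d_h = d_out * W2[j] * (1.0 - h[j] ** 2)  # tanh derivative
            W2[j] -= lr * d_out * h[j]
            for i in range(len(x)):
                if x[i]:
                    W1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out

def predict(text):
    return 1 if forward(featurize(text))[1] > 0.5 else 0

print(predict("excelente great"))  # mixed-language input, no translation step
```

Because every language shares one vocabulary and one set of weights, a code-mixed input such as `"excelente great"` is scored by the same network; the same design scales to pretrained multilingual embeddings in place of raw counts.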
Automatic language identification (LID) is a specialized area of Human Language Technology in which the language(s) used in spoken utterances are identified and correctly classified given a predetermined number of target languages. Currently, most multilingual speakers have the ability and tendency to engage in code-switching, a mixed-language phenomenon in which more than one language is used within an utterance. This paper presents a scheme for automatic language identification integrated with an automatic speech recognition system to identify the languages used in mixed-language speech. The front-end speech recognition system feeds the decoded phonemes into the LID system. We used hidden Markov models to build acoustic models over a combined phoneme set that handles multiple languages within an utterance. A spoken utterance is converted into feature vectors whose attributes represent the statistical occurrence of each acoustic unit. A supervised support vector machine (SVM) is trained on feature-vector sequences of phoneme units. The back-end SVM classifier, based on n-gram structures, is used to classify/identify the phoneme feature vectors. We conducted experiments with two commonly mixed Northern Sotho and English telephone-based speech corpora. The experimental results showed that using shared phonemic vowels in the combined phoneme set reduced the word error rate (WER) by 3.6%. Moreover, the proposed approach yields acceptable performance, with a language identification rate of 85.0% on the code-switched speech corpus.
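As a rough illustration of the back-end featurization described above, a decoded phoneme sequence can be reduced to a vector of n-gram occurrence counts before being passed to the SVM classifier. This is a hypothetical sketch: the phoneme labels and utterances below are invented, and the SVM training step itself is omitted.

```python
from collections import Counter

def phoneme_ngrams(phonemes, n):
    """All contiguous phoneme n-grams in a decoded sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def featurize(phonemes, vocab, n=2):
    """Count vector over a fixed n-gram vocabulary (the SVM's input)."""
    counts = Counter(phoneme_ngrams(phonemes, n))
    return [counts[g] for g in vocab]

# Invented decoded phoneme sequences from the front-end recognizer.
utt_a = ["d", "u", "m", "e", "l", "a"]  # Northern Sotho-like (illustrative)
utt_b = ["h", "e", "l", "l", "o"]       # English-like (illustrative)

# Shared bigram vocabulary built from the training utterances.
vocab = sorted(set(phoneme_ngrams(utt_a, 2)) | set(phoneme_ngrams(utt_b, 2)))
print(featurize(utt_a, vocab))
```

In a real system these count vectors, one per utterance, would be fed to an SVM trainer (for example a linear-kernel implementation) with language labels as targets; the vocabulary here is built from just two utterances to keep the sketch self-contained.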