The increasing popularity of crowdsourcing platforms, i.e., Amazon Mechanical Turk, changes how datasets for supervised learning are built. In these cases, instead of having datasets labeled by one source (which is supposed to be an expert who provided the absolute gold standard), databases holding multiple annotators are provided. However, most state-of-the-art methods devoted to learning from multiple experts assume that the labeler’s behavior is homogeneous across the input feature space. Besides, independence constraints are imposed on annotators’ outputs. This paper presents a regularized chained deep neural network to deal with classification tasks from multiple annotators. The introduced method, termed RCDNN, jointly predicts the ground truth label and the annotators’ performance from input space samples. In turn, RCDNN codes interdependencies among the experts by analyzing the layers’ weights and includes l1, l2, and Monte-Carlo Dropout-based regularizers to deal with the over-fitting issue in deep learning models. Obtained results (using both simulated and real-world annotators) demonstrate that RCDNN can deal with multi-labelers scenarios for classification tasks, defeating state-of-the-art techniques.
The labeling process within a supervised learning 1 task is usually carried out by an expert, which provides the 2 ground truth (gold standard) for each sample. However, in many 3 real-world applications, we typically have access to annotations 4 provided by crowds holding different and unknown expertise 5 levels. Learning from crowds intends to configure machine 6 learning paradigms in the presence of multi-labelers, residing on 7 two key assumptions: the labeler's performance does not depend 8 on the input space, and independence among the annotators 9 is imposed. Here, we propose the correlated chained Gaussian 10 processes from multiple annotators-(CCGPMA) approach, which 11 models each annotator's performance as a function of the input 12 space and exploits the correlations among experts. Experimental 13 results associated with classification and regression tasks show 14 that our CCGPMA performs better modeling of the labelers' 15 behaviour, indicating that it consistently outperforms other state-16 of-the-art learning from crowds approaches.
Sentiment analysis is a relevant area in the natural language processing context–(NLP) that allows extracting opinions about different topics such as customer service and political elections. Sentiment analysis is usually carried out through supervised learning approaches and using labeled data. However, obtaining such labels is generally expensive or even infeasible. The above problems can be faced by using models based on self-supervised learning, which aims to deal with various machine learning paradigms in the absence of labels. Accordingly, we propose a self-supervised approach for sentiment analysis in Spanish that comprises a lexicon-based method and a supervised classifier. We test our proposal over three corpora; the first two are labeled datasets, namely, CorpusCine and PaperReviews. Further, we use an unlabeled corpus conformed by news related to the Colombian conflict to understand the university journalistic narrative of the war in Colombia. Obtained results demonstrate that our proposal can deal with sentiment analysis settings in scenarios with unlabeled corpus; in fact, it acquires competitive performance compared with state-of-the-art techniques in partially-labeled datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.