Twitter is a microblogging platform in which users can post status messages, called “tweets,” to their friends. It has provided an enormous dataset of opinionated messages whose sentiments can be classified through supervised learning. To build supervised learning models, classification algorithms require a set of representative labeled data. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses unlabeled data to complement the information provided by the labeled data in the training process; therefore, it is particularly useful in applications such as tweet sentiment analysis, where a huge quantity of unlabeled data is accessible. Semi-supervised learning for tweet sentiment analysis, although appealing, is relatively new. We provide a comprehensive survey of semi-supervised approaches applied to tweet classification. Such approaches include graph-based, wrapper-based, and topic-based methods. A comparative study of algorithms based on self-training, co-training, topic modeling, and distant supervision highlights their biases and sheds light on aspects that the practitioner should consider in real-world applications.
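As an illustration of the self-training idea mentioned above, the following is a minimal sketch, not any of the surveyed algorithms. It assumes a toy nearest-centroid classifier, two hypothetical features per tweet (counts of positive and negative words), and a hypothetical confidence threshold; all function names are made up for this example.

```python
from math import dist

def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, p):
    """Return (label, confidence); confidence is the relative distance
    margin between the two nearest centroids (assumes two or more classes)."""
    ranked = sorted((dist(c, p), y) for y, c in cents.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, (d1 - d0) / (d1 + d0 + 1e-12)

def self_train(X_lab, y_lab, X_unl, threshold=0.5, max_rounds=10):
    """Repeatedly move confidently predicted unlabeled points into the
    labeled set, then refit. Returns the final centroids and the points
    that never reached the confidence threshold."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        scored = [(p, *predict(cents, p)) for p in pool]
        confident = [(p, y) for p, y, c in scored if c >= threshold]
        if not confident:
            break
        for p, y in confident:
            X_lab.append(p)
            y_lab.append(y)
        pool = [p for p, _, c in scored if c < threshold]
    return centroids(X_lab, y_lab), pool

# Toy data (hypothetical): (positive-word count, negative-word count).
labeled = [(3, 0), (0, 3)]
labels = ["pos", "neg"]
unlabeled = [(4, 1), (2, 0), (1, 4), (0, 2), (2, 2)]

model, leftover = self_train(labeled, labels, unlabeled)
```

In this toy run the ambiguous point `(2, 2)` is equidistant from both classes, so it is never pseudo-labeled and stays in `leftover`. Real self-training carries the risk that early mistakes are reinforced, which is one of the biases a practitioner would need to watch for.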
The goal of sentiment analysis is to determine opinions, emotions, and attitudes presented in source material. In tweet sentiment analysis, opinions in messages can typically be categorized as positive or negative. To classify them, researchers have been using traditional classifiers like Naive Bayes, Maximum Entropy, and Support Vector Machines (SVM). In this paper, we show that an SVM classifier combined with a cluster ensemble can offer better classification accuracies than a stand-alone SVM. In our study, we employed an algorithm, named C³E-SL, capable of combining classifier and cluster ensembles. This algorithm can refine tweet classifications using additional information provided by clusterers, assuming that similar instances from the same clusters are more likely to share the same class label. The resulting classifier has been shown to be competitive with the best results found so far in the literature, thereby suggesting that the studied approach is promising for tweet sentiment classification.
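The underlying assumption, that instances in the same cluster tend to share a class label, can be sketched as follows. This is a deliberately simplified illustration and not the C³E-SL algorithm: it assumes a single clustering rather than a cluster ensemble, and it simply blends each tweet's classifier probability with its cluster's mean probability. The function name, the blending weight `alpha`, and the example numbers are all hypothetical.

```python
def refine_with_clusters(probs, cluster_ids, alpha=0.5):
    """Blend each instance's positive-class probability with the mean
    probability of the cluster it belongs to.

    alpha = 0 keeps the classifier output unchanged;
    alpha = 1 replaces it with the cluster mean.
    """
    totals, counts = {}, {}
    for p, c in zip(probs, cluster_ids):
        totals[c] = totals.get(c, 0.0) + p
        counts[c] = counts.get(c, 0) + 1
    means = {c: totals[c] / counts[c] for c in totals}
    return [(1 - alpha) * p + alpha * means[c]
            for p, c in zip(probs, cluster_ids)]

# Hypothetical positive-class probabilities from a classifier such as an SVM,
# and hypothetical cluster assignments for the same six tweets.
svm_probs = [0.9, 0.8, 0.45, 0.1, 0.2, 0.55]
clusters = [0, 0, 0, 1, 1, 1]

refined = refine_with_clusters(svm_probs, clusters)
before = ["positive" if p >= 0.5 else "negative" for p in svm_probs]
after = ["positive" if p >= 0.5 else "negative" for p in refined]
```

Here the third and sixth tweets sit close to the decision boundary, and their labels flip to agree with the rest of their clusters. Whether such a flip is an improvement depends on how well the clusters align with the true classes.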
Much educational research is carried out in pursuit of quality in education; however, analyzing large databases in search of useful information is a challenging task. In this study, aspects related to students' academic performance were identified using the ENADE exams applied to the Computer Science program. The research followed the Repeated Cross-Sectional Longitudinal Study methodology, analyzing the data over time to verify their behavior across the years studied. The years analyzed were 2011, 2014, and 2017. Data mining techniques were applied to the microdata provided by the Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira. The contribution of this research is to show that, among the algorithms used in the experiments, the decision tree best classified the data with respect to student performance. It made it possible to identify that some socioeconomic characteristics, such as family income, the father's level of education, and the student's employment status, together with the institution's administrative category and the time of day at which the program is attended, affect academic performance.
Bee-mediated pollination greatly increases the size and weight of tomato fruits. Therefore, distinguishing which bees in the local set are efficient pollinators is essential to improve the economic returns for farmers. To achieve this, it is important to know the identity of the visiting bees. Nevertheless, the traditional taxonomic identification of bees is not an easy task, requiring the participation of experts and the use of specialized equipment. Due to these limitations, the development and implementation of new technologies for the automatic recognition of bees become relevant. Hence, we aim to verify the capacity of Machine Learning (ML) algorithms to recognize the taxonomic identity of bees visiting tomato flowers based on the characteristics of their buzzing sounds. We compared the performance of the ML algorithms combined with the Mel Frequency Cepstral Coefficients (MFCC) against classifications based solely on the fundamental frequency, leading to a direct comparison between the two approaches. Some classifiers using the MFCC, especially the SVM, achieved better performance than the randomized and sound frequency-based trials. Moreover, the buzzing sounds produced during sonication were more relevant for the taxonomic recognition of bee species than analysis based on flight sounds alone. On the other hand, the ML classifiers performed better in recognizing bee genera based on flight sounds. Despite that, the maximum accuracy obtained here (73.39% by SVM) is still low compared to ML standards. Further studies analyzing larger recording samples and applying unsupervised learning systems may yield better classification performance. Therefore, ML techniques could be used to automate the taxonomic recognition of flower-visiting bees of the cultivated tomato and other buzz-pollinated crops. This would be an interesting option for farmers and other professionals who have no experience in bee taxonomy but are interested in improving crop yields by increasing pollination.
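To make the fundamental-frequency baseline concrete, the sketch below estimates the fundamental frequency of a synthetic buzz by autocorrelation. The abstract does not state how the fundamental frequency was measured, so the autocorrelation method, the 100 to 500 Hz search range, the 8 kHz sample rate, and the synthetic signal are all assumptions made for illustration.

```python
import math

def estimate_f0(signal, sample_rate, f_min=100.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) as the lag that maximizes
    the autocorrelation within the assumed [f_min, f_max] search range."""
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic "buzz": a 250 Hz tone plus a weaker harmonic at 500 Hz.
SAMPLE_RATE = 8000
buzz = [math.sin(2 * math.pi * 250 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 500 * n / SAMPLE_RATE)
        for n in range(1024)]

f0 = estimate_f0(buzz, SAMPLE_RATE)
```

A single number like `f0` discards most of the spectral shape of the buzz, whereas MFCC summarize the whole spectral envelope. That difference is one plausible reason a feature set based on MFCC could separate taxa better than the fundamental frequency alone.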