Twitter sentiment analysis is a challenging problem in natural language processing. For this purpose, supervised learning techniques have mostly been employed, which require labeled data for training. However, labeling large datasets is very time-consuming. To address this issue, unsupervised learning techniques such as clustering can be used. In this study, we explore the possibility of using hierarchical clustering for Twitter sentiment analysis. Three hierarchical clustering techniques, namely single linkage (SL), complete linkage (CL), and average linkage (AL), are examined. A cooperative framework of SL, CL, and AL is built to select the optimal cluster for tweets, wherein the notion of optimal-cluster selection is operationalized using majority voting. The hierarchical clustering techniques are also compared with k-means and two state-of-the-art classifiers (SVM and Naïve Bayes). The performance of clustering and classification is measured in terms of accuracy and time efficiency. The experimental results indicate that cooperative clustering based on majority voting is robust in terms of cluster quality, at the cost of poor time efficiency. The results also suggest that the accuracy of the proposed clustering framework is comparable to that of the classifiers, which is encouraging.
INDEX TERMS: Cooperative clustering, majority voting, sentiment analysis, Twitter sentiment analysis.
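The cooperative idea above can be sketched as follows: run single-, complete-, and average-linkage clustering, then take a majority vote on whether each pair of points belongs to the same cluster. The synthetic data, cluster count, and the pairwise co-association formulation are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of cooperative hierarchical clustering via pairwise majority voting.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
# Two well-separated synthetic "tweet" feature clouds (20 points each)
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(3, 0.3, (20, 5))])
k = 2

# One flat clustering per linkage strategy
labelings = []
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    labelings.append(fcluster(Z, t=k, criterion="maxclust"))

# Co-association matrix: how many of the three clusterings co-cluster i and j
n = len(X)
votes = np.zeros((n, n), dtype=int)
for lab in labelings:
    votes += (lab[:, None] == lab[None, :]).astype(int)

# Majority vote (at least 2 of 3) defines the consensus same-cluster relation;
# connected components of that relation give the consensus clusters.
agree = votes >= 2
n_consensus, consensus_labels = connected_components(csr_matrix(agree))
print(n_consensus)
```

On well-separated data all three linkages agree, so the consensus recovers the two groups; when they disagree, the vote resolves the conflict.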
Twitter sentiment analysis is a challenging task that involves various preprocessing steps, including dimensionality reduction. Dimensionality reduction helps ensure low computational complexity and improves performance during the classification process. In Twitter data, each tweet has feature values that may or may not reflect a person's response. Therefore, a large number of sparse data points are generated when tweets are represented as a feature matrix, eventually increasing computational overhead and error rates in Twitter sentiment analysis. This study proposes a novel preprocessing technique called the class association and attribute relevancy based imputation algorithm (CAARIA) to reduce the Twitter data size. CAARIA achieves the dimensionality reduction goal by imputing tweets that belong to the same class and also share useful information. The performance of two classifiers (Naïve Bayes and support vector machines) is evaluated on three Twitter datasets in terms of classification accuracy, measured as area under the curve, and time efficiency. CAARIA is also compared against two widely used feature selection (dimensionality reduction) techniques: information gain (IG) and Pearson's correlation (PC). The findings reveal that CAARIA outperforms IG and PC in terms of classification accuracy and time efficiency. These results suggest that CAARIA is a robust data preprocessing technique for the classification task.
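CAARIA itself is not specified in the abstract, but the information-gain baseline it is compared against is standard and can be sketched with scikit-learn's mutual-information scorer. The toy tweets and the number of retained features are illustrative assumptions.

```python
# Sketch of IG-based feature selection (the baseline the study compares against).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

tweets = [
    "great phone love the camera", "love this amazing battery",
    "terrible screen broke fast", "awful battery hate this phone",
]
y = [1, 1, 0, 0]  # 1 = positive, 0 = negative sentiment

X = CountVectorizer().fit_transform(tweets)       # sparse term-count matrix
selector = SelectKBest(mutual_info_classif, k=5)  # keep the top-5 IG features
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)
```

The same pipeline with `f_classif` or a correlation-based scorer would give the Pearson's-correlation baseline.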
Maintenance of architectural documentation is a prime requirement for evolving software systems. New versions of a software system are released to incorporate the changes it undergoes over time. The orphan adoption problem, which concerns the accommodation of newly introduced resources (orphan resources) into appropriate subsystems in successive versions of a software system, is a significant problem. The orphan adoption algorithm has been developed to address it. For evolving software systems, it would be useful to recover the architecture of subsequent versions of a software system by using existing architectural information. In this paper, we explore supervised learning techniques (classifiers) for recovering the architecture of subsequent versions of a software system by leveraging existing architectural information. We use three classifiers, namely a Bayesian classifier, a k-Nearest Neighbor classifier, and a Neural Network, for orphan adoption. We conduct experiments to compare the performance of the classifiers using various dependencies between entities in a software system. Our experiments highlight the correspondence between the orphan adoption algorithm and the classifiers, and also reveal their strengths and weaknesses. To combine the strengths of the individual classifiers, we propose a multiclassifier approach in which the classifiers work cooperatively to improve classification accuracy. Experiments show a significant improvement in results when our proposed multiclassifier approach is used.
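Classifier-based orphan adoption can be illustrated minimally: existing modules, described by dependency features, carry subsystem labels, and a new ("orphan") module is assigned to the subsystem predicted by one of the classifiers. The feature encoding here (counts of dependencies on each subsystem) and the toy numbers are illustrative assumptions, not the paper's dependency model.

```python
# Minimal sketch of orphan adoption as supervised classification (k-NN).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Rows: known modules; columns: number of dependencies on subsystems A, B, C
X_known = np.array([
    [9, 1, 0], [8, 2, 1],   # modules that mostly depend on A
    [0, 7, 2], [1, 9, 0],   # modules that mostly depend on B
    [1, 0, 8], [0, 2, 9],   # modules that mostly depend on C
])
y_known = ["A", "A", "B", "B", "C", "C"]  # owning subsystem of each module

orphan = np.array([[0, 8, 1]])            # new module with B-heavy dependencies
clf = KNeighborsClassifier(n_neighbors=3).fit(X_known, y_known)
print(clf.predict(orphan))  # -> ['B']
```

The multiclassifier variant would combine such predictions from the Bayesian, k-NN, and Neural Network models, e.g., by voting.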
A vehicular ad hoc network (VANET) is an emerging technology with both exciting prospects and substantial challenges, especially in terms of security. Due to its distributed nature and frequently changing topology, it is highly prone to security attacks. Researchers have proposed different strategies for detecting various forms of network attacks. However, VANETs are still exposed to several attacks, particularly the Sybil attack. The Sybil attack is one of the most challenging attacks in VANETs: an attacker forges false identities in the network to undermine communication between network nodes. This attack severely impacts transportation safety services and may create traffic congestion. In this regard, a novel collaborative framework based on majority voting is proposed to detect the Sybil attack in the network. The framework works by ensembling individual classifiers (K-Nearest Neighbor, Naïve Bayes, Decision Tree, SVM, and Logistic Regression) in parallel. A majority-voting mechanism (hard and soft) is adopted for the final prediction, and the two voting schemes are compared to choose the better approach. With the proposed approach, 95% accuracy is achieved. The proposed framework is also evaluated using the receiver operating characteristic (ROC) curve.
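The described ensemble maps directly onto scikit-learn's `VotingClassifier`, with hard voting taking the majority of predicted labels and soft voting averaging predicted probabilities. The dataset and hyperparameters below are illustrative stand-ins; the paper's VANET features and tuning are not reproduced.

```python
# Sketch of the five-classifier ensemble with hard vs. soft majority voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary detection task
X = StandardScaler().fit_transform(X)       # scale features for SVM/LR
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),  # probabilities for soft voting
    ("lr", LogisticRegression(max_iter=1000)),
]

scores = {}
for voting in ("hard", "soft"):
    ens = VotingClassifier(base, voting=voting).fit(X_tr, y_tr)
    scores[voting] = ens.score(X_te, y_te)
    print(voting, round(scores[voting], 3))
```

Soft voting often edges out hard voting when the base models produce well-calibrated probabilities, which is presumably why the paper compares both.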
Glycoproteins play an important and ubiquitous role in many biological processes such as protein folding, cell-to-cell signaling, infection by invading microorganisms, tumor metastasis, and leukocyte trafficking. The key mechanisms of glycoproteins must be revealed to model and refine glycosylated protein recognition, which will eventually assist in the design and discovery of carbohydrate-derived therapeutics. Experimental procedures involving wet-lab experiments to identify glycoproteins are very time-consuming, laborious, and costly. However, these tedious experimental procedures can be assisted by ranking the most probable glycoproteins through computational methods with improved accuracy. In this study, we propose a novel machine learning-based predictive model for glycoprotein identification. Our proposed model is based on sequence-derived structural descriptors (SDSD), which bridge the gap created by the unavailability of protein 3D structures and the limited accuracy of sequence information alone. Through a series of simulation studies, we show that our proposed model delivers state-of-the-art generalization performance, verified through various machine learning-centric and biologically relevant techniques and metrics. Through data mining, we also identify the role of the descriptors in determining glycoproteins. Python-based standalone code, together with a webserver implementation of our proposed model (COYOTE: identifiCation Of glYcoprOteins Through sEquences), is available at: https://sites.google.com/view/wajidarshad/software
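As a flavor of what a sequence-derived descriptor looks like, the snippet below computes amino acid composition, i.e., the relative frequency of each residue in a protein sequence. The paper's full SDSD set is richer than this; the example sequence is a toy fragment, not a real glycoprotein.

```python
# One common sequence-derived descriptor: 20-dimensional amino acid composition.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(seq: str) -> list[float]:
    """Return the relative frequency of each standard residue in seq."""
    seq = seq.upper()
    total = len(seq)
    return [seq.count(aa) / total for aa in AMINO_ACIDS]

# Toy fragment (illustrative only)
vec = aa_composition("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(len(vec), round(sum(vec), 6))  # 20 features summing to 1.0
```

Such fixed-length vectors can then be fed to any standard classifier (e.g., an SVM) to rank candidate glycoproteins.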
Social capital is a very important facet of society and has strong relevance to the economic landscape of a country. There are different theories about the nature, accumulation, growth, and validity of social capital as an instrument of the economy. This paper explains the philosophical context of social capital and validates, through a model using the Berg, Dickhaut and McCabe trust game, the idea that we all transfer a set of values to our next generation, which ultimately manifests as social capital in the real world. The transferred values affect each agent's decision whether to trust other members of society and participate in a socio-economic exchange or not. If he trusts, he reaps substantial gains from the exchange, and social capital is ultimately consolidated. But if he does not, he faces a major loss and overall social capital is dented. A distrustful ancestor will further induce agents to withdraw from the market and not to invest. This will lead a society to an overall paradigm of mistrust and eventual downfall. On the other hand, the level of cooperation among the members of society increases as good experiences are transferred across generations. The economic impacts of social capital are elucidated using the Euler function and Newton-Leibniz integration. The ethical framework for society and the role of literature in the transfer of beliefs, values, and culture from one generation to the next are also discussed at length. The paper is a unique study of how we all will be responsible for the kind of social capital we will have in 2025.
JEL Classification: D71, D72, Z11, Z12, Z13
Keywords: Social, Capital, Model, Transfer, Generations, Values, Trust, Beliefs, Cooperation, Ethics, Literature
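The generational dynamic described above can be sketched as a toy simulation: in each generation an agent plays a Berg-Dickhaut-McCabe-style trust game, and the experienced outcome shifts the trust value passed to the next generation. All numeric parameters (endowment, multiplier, return share, learning rate) are illustrative assumptions, not the paper's model.

```python
# Toy simulation of trust transfer across generations in a trust game.
import random

random.seed(1)
MULTIPLIER = 3       # the invested amount is tripled, as in the trust game
LEARNING_RATE = 0.1  # how strongly one generation's experience updates trust

def play_generation(trust: float, trustee_reliable: bool) -> float:
    """One generation: invest with probability `trust`; update from the payoff."""
    endowment = 10.0
    if random.random() < trust:  # agent decides to trust and invests
        # A reliable trustee returns half of the tripled amount; an unreliable
        # one returns nothing, so the investor suffers a loss.
        returned = MULTIPLIER * endowment * (0.5 if trustee_reliable else 0.0)
        target = 1.0 if returned > endowment else 0.0
        return trust + LEARNING_RATE * (target - trust)  # value passed on
    return trust  # no exchange: the inherited belief is passed on unchanged

trust = 0.5
for _ in range(30):  # 30 generations with reliable trustees
    trust = play_generation(trust, trustee_reliable=True)
print(round(trust, 3))
```

With reliable trustees, good experiences compound and trust rises across generations; replacing `trustee_reliable=True` with `False` produces the downward spiral of mistrust the paper describes.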