The wealth of opinions available on social media has motivated researchers to develop automatic opinion detection tools. Many such tools are currently available online for opinion mining in short texts, known as microblogs, but their efficacy is still limited. Current tools focus on detecting the sentiment polarity expressed in a microblog regardless of the topic (target) discussed. A few improved approaches have been proposed to detect sentiment towards a specific target, a task referred to as target-dependent sentiment classification. Our literature review has shown that all these target-dependent approaches use supervised learning techniques. Such techniques need a large amount of labeled data to increase classification accuracy. However, preparing labeled data from social media requires considerable effort. In this work, we address this issue by employing semi-supervised learning techniques, which have not been used before for target-dependent sentiment classification. To the best of our knowledge, our work is the first to employ semi-supervised learning techniques in this direction. Semi-supervised learning techniques are known in the literature to improve classification accuracy in comparison with supervised learning techniques; however, they use the same number of labeled samples plus many unlabeled ones. In this work, we propose a new semi-supervised learning technique that uses fewer labeled microblogs than supervised learning techniques require. Experimental results show that the proposed technique provides competitive accuracy.
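The abstract does not say which semi-supervised technique the authors propose. As a rough illustration of the general idea only (a few labeled samples plus many unlabeled ones), the sketch below shows plain self-training with a nearest-centroid classifier. Everything in it is an assumption made for illustration: the function names, the `min_margin` confidence rule, and the use of small numeric feature vectors in place of real microblog features.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroids_of(samples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Compute one mean vector per class label."""
    groups: Dict[str, List[Vector]] = {}
    for x, y in samples:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def predict(centroids: Dict[str, List[float]], x: Vector) -> Tuple[str, float]:
    """Return the nearest class and a margin (gap to the second-nearest class)."""
    ranked = sorted((sq_dist(x, c), y) for y, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(
    labeled: List[Tuple[Vector, str]],
    unlabeled: List[Vector],
    min_margin: float = 1.0,  # hypothetical confidence threshold
    max_rounds: int = 10,
) -> Dict[str, List[float]]:
    """Grow the labeled set with confidently pseudo-labeled samples."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids_of(labeled)
        confident, rest = [], []
        for x in pool:
            y, margin = predict(cents, x)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return centroids_of(labeled)


# Toy usage: two labeled samples, four unlabeled ones.
seed_labels = [((0.0, 0.0), "neg"), ((4.0, 4.0), "pos")]
pool = [(0.5, 0.2), (3.8, 4.1), (1.0, 0.5), (3.0, 3.5)]
model = self_train(seed_labels, pool)
```

The design choice worth noting is the confidence filter: only unlabeled samples whose prediction clears the margin are added, which limits how quickly early mistakes can reinforce themselves.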
Abstract: In this paper, we describe an essential problem in data clustering and present some solutions for it. We investigated using distance measures other than the Euclidean distance to improve clustering performance. We also developed an improved point symmetry-based distance measure and proved its efficiency. We developed a k-means algorithm with a novel distance measure that improves the performance of the classical k-means algorithm. The proposed algorithm does not have the worst-case bound on running time that exists in many similar algorithms in the literature. Experimental results shown in this paper demonstrate the effectiveness of the proposed algorithm. We compared the proposed algorithm with the classical k-means algorithm. We present the proposed algorithm and its performance results in detail, along with avenues for future research.
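To make the idea of swapping the distance measure concrete, here is a minimal sketch of k-means with a pluggable distance function. It is not the authors' algorithm: the point symmetry-based measure is not defined in this abstract, so Manhattan distance stands in as an example of a non-Euclidean measure, and all names (`kmeans`, `distance`, `seed`) are assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(
    points: List[Point],
    k: int,
    distance: Distance = euclidean,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means where only the assignment step uses `distance`."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen distance.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: coordinate-wise mean. Note that the mean minimizes
        # squared Euclidean error only; with other distances this update
        # is a heuristic and is not guaranteed to reduce the objective.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels or [], centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_l2, _ = kmeans(data, k=2, distance=euclidean)
labels_l1, _ = kmeans(data, k=2, distance=manhattan)
```

On well-separated toy data like this, both measures recover the same two groups; differences between distance measures show up on data with elongated, overlapping, or symmetric cluster shapes.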