In data mining, classification is the task of splitting data into several dependent and independent regions, each of which is referred to as a class. Different kinds of classifiers are used to accomplish the classification task. Moreover, classification is limited in the case of text documents. The aim of the work presented in this article is to evaluate multiclass document classification and to determine the accuracy that can be achieved when classifying text documents. The Naive Bayes approach is used to address the problem of document classification via a deceptively simple model. It is applied in both a flat (linear) and a hierarchical manner to improve the efficiency of the classification model. The hierarchical classification technique was found to be more effective than flat classification, and it also performs better for multi-label document classification. Compared with the past, the amount of data generated each day has increased significantly. Hence, with the advent of smarter technologies, data must be classified and sorted before decisions can be drawn from it. Many techniques are available for classifying documents into various categories or labels. Data mining is the process of non-trivial extraction of novel, implicit, and actionable knowledge from large data sets.
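The difference between flat and hierarchical Naive Bayes classification described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the multinomial model with add-one smoothing, the whitespace tokenizer, the class names `NaiveBayes` and `HierarchicalNaiveBayes`, the two-level category tree, and the toy documents.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):  # sorted for deterministic ties
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class HierarchicalNaiveBayes:
    """Two-level classifier: choose a parent category, then a leaf inside it."""

    def fit(self, docs, leaf_labels, parent_of):
        parents = [parent_of[leaf] for leaf in leaf_labels]
        self.root = NaiveBayes().fit(docs, parents)
        self.children = {}
        for parent in set(parents):
            pairs = [(d, l) for d, l in zip(docs, leaf_labels) if parent_of[l] == parent]
            sub_docs, sub_labels = zip(*pairs)
            self.children[parent] = NaiveBayes().fit(sub_docs, sub_labels)
        return self

    def predict(self, doc):
        parent = self.root.predict(doc)             # first decision: coarse category
        return self.children[parent].predict(doc)   # second decision: leaf class


# Toy corpus (hypothetical), two leaf classes under each of two parents.
docs = [
    "goal striker football match", "football league goal keeper",
    "tennis racket serve match", "tennis serve court ace",
    "cpu processor chip silicon", "chip memory cpu board",
    "python code software bug", "software code compiler python",
]
labels = ["football", "football", "tennis", "tennis",
          "hardware", "hardware", "software", "software"]
parent_of = {"football": "sports", "tennis": "sports",
             "hardware": "tech", "software": "tech"}

flat = NaiveBayes().fit(docs, labels)
hier = HierarchicalNaiveBayes().fit(docs, labels, parent_of)
print(flat.predict("tennis serve"), hier.predict("cpu chip"))
```

The flat classifier chooses among all leaf classes in one step, whereas the hierarchical one only has to separate sibling classes at each level. A toy corpus of this size cannot show the accuracy difference reported in the article.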
Abu Nowshed CHY†a), Md Zia ULLAH†b), Nonmembers, and Masaki AONO†c), Member
SUMMARY Microblogs, especially Twitter, have become an integral part of our daily life for searching for the latest news and event information. Because tweets are short and frequently use unconventional abbreviations, search based on content relevance alone cannot satisfy users' information needs. Recent research has shown that considering temporal and contextual aspects improves retrieval performance significantly. In this paper, we focus on microblog retrieval, emphasizing the alleviation of vocabulary mismatch and the use of the temporal (e.g., recency and burst nature) and contextual characteristics of tweets. To address the temporal and contextual aspects of tweets, we propose new features based on query-tweet time, word embedding, and query-tweet sentiment correlation. We also introduce some popularity features to estimate the importance of a tweet. A three-stage query expansion technique is applied to improve the relevancy of tweets. Moreover, to determine the temporal and sentiment sensitivity of a query, we introduce query type determination techniques. After supervised feature selection, we apply random forest as a feature ranking method to estimate the importance of the selected features. We then use an ensemble learning to rank (L2R) framework to estimate the relevance of each query-tweet pair. We conducted experiments on the TREC Microblog 2011 and 2012 test collections over the TREC Tweets2011 corpus. Experimental results demonstrate the effectiveness of our method over the baseline and known related works in terms of precision at 30 (P@30), mean average precision (MAP), normalized discounted cumulative gain at 30 (NDCG@30), and R-precision (R-Prec).
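The four evaluation measures named in the summary can be sketched for a single query as follows. This is a minimal illustration, not the evaluation code used in the paper. The function names, the example relevance list, and two simplifications are assumptions: DCG uses the gain divided by log2(rank + 1), and the ideal ranking for NDCG is built only from the retrieved list rather than from all judged tweets. MAP is the mean of `average_precision` over all queries.

```python
import math


def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant (rels are in rank order)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels, num_relevant):
    """Average of the precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def r_precision(rels, num_relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(rels, num_relevant) if num_relevant else 0.0


def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the same gains sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


# Hypothetical ranked list for one query: 1 = relevant, 0 = not relevant.
# Four relevant tweets exist in total, one of which was not retrieved.
rels = [1, 0, 1, 1, 0]
num_relevant = 4

print("P@5    =", precision_at_k(rels, 5))
print("AP     =", round(average_precision(rels, num_relevant), 4))
print("R-Prec =", r_precision(rels, num_relevant))
print("NDCG@5 =", round(ndcg_at_k(rels, 5), 4))
```

In the paper's setting the cutoff would be 30 rather than 5; the short list here only keeps the arithmetic easy to follow by hand.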
key words: microblog search, temporal information retrieval, query expansion, feature selection, learning to rank, time-aware ranking
Introduction
Nowadays, microblog web sites are not only places for maintaining social relationships but also valuable information sources. Every day, many users turn to microblog sites to share their views, opinions, experiences, and important news, and to find out what is happening around the world. Among the several microblog sites, Twitter is now the most popular, and many users post tweets whenever a notable event occurs. That is why information retrieval in Twitter has attracted a great deal of interest. By searching tweets, users find temporally relevant information, such as breaking news and real-time events [1]. This means that the freshness (i.e., recency) of a tweet with respect to the query time is an important factor of relevance. Another important characteristic of Twitter is that people tend to post about a topic within a specific period of time (i.e., bursty nature). For example, when the news of the breakup of the famous band "White Stripes" was published on 2nd Feb, 2011, many people posted tweets about this topic on that day. That is why posts that are generate...
Stance detection in Twitter aims at mining the user stances expressed in a tweet towards a single target entity or multiple target entities. To tackle this problem, most prior studies have explored traditional deep learning models, e.g., LSTM and GRU. However, compared to these traditional approaches, the recently proposed densely connected Bi-LSTM and nested LSTM architectures effectively address the vanishing-gradient and overfitting problems and also deal with long-term dependencies. In this paper, we propose a neural ensemble model that adopts the strengths of these two LSTM variants to learn better long-term dependencies, where each module is coupled with an attention mechanism that amplifies the contribution of important elements in the final representation. We also employ a multi-kernel convolution on top of them to extract higher-level tweet representations. Results of extensive experiments on single and multi-target stance detection datasets show that our proposed method achieves substantial improvement over the current state-of-the-art deep learning based methods.
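How an attention mechanism can amplify the contribution of important elements in a final representation may be sketched as below. The abstract does not specify the form of attention used, so the dot-product scoring against a context vector, the function names, and the example values are all assumptions. In a trained model the hidden states would come from the LSTM modules and the context vector would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each hidden state by its similarity to a context vector.

    hidden_states: one vector per token (stand-ins for LSTM outputs)
    context: a vector of the same dimension (learned in a real model)
    Returns the attention weights and the weighted sum of the hidden states.
    """
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(context)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


# Hypothetical hidden states for a three-token tweet.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
context = [1.0, 0.0]

weights, pooled = attention_pool(hidden_states, context)
print([round(w, 3) for w in weights])
print([round(p, 3) for p in pooled])
```

The third token aligns most strongly with the context vector, so it receives the largest weight and dominates the pooled representation.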