A hybrid improved monarch butterfly optimization and mutual nearest neighbor method (IMBO-MNN) is proposed for outlier detection in high-dimensional data. Detecting outliers in high-dimensional data is challenging. The outlying behavior of data points can be detected only in locally relevant subsets of the data. Subsets of dimensions are called subspaces, and the number of subspaces grows exponentially with the data dimension. A data point that is an outlier in one subspace can appear ordinary in another. To characterize an outlier, it is therefore essential to assess its outlying behavior according to the number of subspaces in which it appears as an outlier. Data are sparse in high-dimensional space, and the notion of proximity loses its meaning. In fact, the sparsity of high-dimensional data means that every point appears nearly equivalent from the point of view of proximity-based definitions. As a result, finding meaningful, non-obvious outliers becomes more difficult in higher-dimensional data. An improved MBO (IMBO) algorithm with a new adaptation operator is presented to improve search accuracy and run-time efficiency. Statistical results indicate that the improved local optima avoidance and fast convergence rate of IMBO help it outperform the basic MBO in outlier detection. In comparison, IMBO produces very competitive results and tends to surpass existing algorithms. Choosing the optimal value of k remains a challenge, and it directly affects the performance of kNN. To alleviate this issue, this paper presents a new learning algorithm under the kNN framework, called mutual nearest neighbor (MNN). The main feature of the method is that the class labels of unknown instances are determined by their mutual nearest neighbors rather than by their nearest neighbors. The advantage of mutual neighbors is that pseudo nearest neighbors can be identified and disregarded during prediction. The performance of the proposed algorithm has been evaluated in a number of experiments.
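The mutual-neighbor idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`k_nearest`, `mutual_neighbors`, `mnn_predict`), the Euclidean distance, the majority vote, and the choice to return `None` when no mutual neighbor exists are all assumptions made for illustration. Two points count as mutual neighbors here when each appears in the other's k-nearest-neighbor list.

```python
import math
from collections import Counter


def k_nearest(points, i, k):
    """Return indices of the k points closest to points[i], excluding i."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()  # ties are broken by index
    return [j for _, j in dists[:k]]


def mutual_neighbors(points, i, k):
    """Return indices j where j is in kNN(i) and i is in kNN(j)."""
    return [j for j in k_nearest(points, i, k) if i in k_nearest(points, j, k)]


def mnn_predict(train_x, train_y, query, k):
    """Label `query` by majority vote among its mutual nearest neighbors.

    Returns None when the query has no mutual neighbor (an assumption of
    this sketch; such a point could be treated as an outlier candidate).
    """
    points = list(train_x) + [query]
    q = len(points) - 1  # index of the query point
    mutual = mutual_neighbors(points, q, k)
    if not mutual:
        return None
    votes = Counter(train_y[j] for j in mutual)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small, well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(mnn_predict(train_x, train_y, (0.4, 0.3), k=2))  # query inside cluster "a"
print(mnn_predict(train_x, train_y, (20, 20), k=2))    # isolated query
```

In this toy data, the isolated query has nearest neighbors in cluster "b", but none of those points counts the query among its own two nearest neighbors, so they are pseudo neighbors and are ignored.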
Social media contain abundant information about events and news occurring all over the world. The growth of social media has had a large impact on domains such as marketing, e-commerce, health care, e-governance, and politics. Twitter is now one of the most popular social media platforms, with 1 billion user profiles and millions of active users who post tweets daily. In this research, buzz detection in social media was carried out by the semantic approach using the condensed nearest neighbor (SACNN). The Twitter and Tom’s Hardware data are stored in the UC Irvine Machine Learning Repository, and this dataset is used in this research for outlier detection. The min–max normalization technique is applied to the social media dataset, and missing values are replaced by the normalized value. The condensed nearest neighbor (CNN) is used for semantic analysis of the database, and the threshold is calculated from the optimized value provided by the proposed method. The threshold value is used to classify buzz and non-buzz discussions in the social media database. The results show that SACNN achieved 99% accuracy and a lower relative error than existing methods.
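The preprocessing and thresholding steps above can be sketched as follows. This is an illustrative sketch, not the paper's procedure: the abstract does not say which normalized value replaces a missing entry, so the mean of the normalized column is used here as an assumption, and the threshold is passed in as a plain hypothetical parameter rather than derived from the optimized value. The names `min_max_normalize` and `label_buzz` are also hypothetical.

```python
def min_max_normalize(column):
    """Scale the non-missing values of a column to [0, 1].

    None marks a missing value. Assumption of this sketch: missing entries
    are filled with the mean of the normalized non-missing values.
    """
    present = [v for v in column if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    scaled = [
        None if v is None else ((v - lo) / span if span else 0.0)
        for v in column
    ]
    fill = sum(s for s in scaled if s is not None) / len(present)
    return [fill if s is None else s for s in scaled]


def label_buzz(scores, threshold):
    """Label each score as buzz when it reaches the threshold, else non-buzz."""
    return ["buzz" if s >= threshold else "non-buzz" for s in scores]


# Hypothetical feature column with one missing value.
normalized = min_max_normalize([10, None, 30, 20])
print(normalized)
print(label_buzz(normalized, threshold=0.5))
```

Min-max scaling maps each value v to (v - min) / (max - min), so the smallest observed value becomes 0 and the largest becomes 1. A constant column is mapped to 0.0 here to avoid division by zero.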