Samir Malakar scite author profile

Outlier or anomaly detection is the process through which datum/data with different properties from the rest of the data is/are identified. Their importance lies in their use in various domains such as fraud detection, network intrusion detection, and spam filtering. In this paper, we introduce a new outlier detection algorithm based on an ensemble method and distance-based data filtering with an iterative approach to detect outliers in unlabeled data. The ensemble method is used to cluster the unlabeled data and to filter out potential isolated outliers from the same by iteratively using a cluster membership threshold until the Dunn index score for clustering is maximized. The distance-based data filtering, on the other hand, removes the potential outlier clusters from the post-clustered data based on a distance threshold using the Euclidean distance measure of each data point from the majority cluster as the filtering factor. The performance of our algorithm is evaluated by applying it to 10 real-world machine learning datasets. Finally, we compare the results of our algorithm to various supervised and unsupervised outlier detection algorithms using Precision@n and F-score evaluation metrics.

show abstract

Bi-Level Prediction Model for Screening COVID-19 Patients Using Chest X-Ray Images

Das

Roy

Malakar³

et al. 2021

Big Data Research

View full text Add to dashboard Cite

Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images

Malakar¹,

Ghosh

Sarkar

et al. 2018

View full text Add to dashboard Cite

AbstractWord searching or keyword spotting is an important research problem in the domain of document image processing. The solution to the said problem for handwritten documents is more challenging than for printed ones. In this work, a two-stage word searching schema is introduced. In the first stage, all the irrelevant words with respect to a search word are filtered out from the document page image. This is carried out using a zonal feature vector, called pre-selection feature vector, along with a rule-based binary classification method. In the next step, a holistic word recognition paradigm is used to confirm a pre-selected word as search word. To accomplish this, a modified histogram of oriented gradients-based feature descriptor is combined with a topological feature vector. This method is experimented on a QUWI English database, which is freely available through the International Conference on Document Analysis and Recognition 2015 competition entitled “Writer Identification and Gender Classification.” This technique not only provides good retrieval performance in terms of recall, precision, and F-measure scores, but it also outperforms some state-of-the-art methods.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Samir Malakar

A GA based hierarchical feature selection approach for handwritten word recognition

Off-line Bangla handwritten word recognition: a holistic approach

An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering

Bi-Level Prediction Model for Screening COVID-19 Patients Using Chest X-Ray Images

Development of a Two-Stage Segmentation-Based Word Searching Method for Handwritten Document Images

Contact Info

Product

Resources

About