One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper, we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.
Abstract. The One Class Classification (OCC) problem is different from the conventional binary/multi-class classification problem in the sense that in OCC, the negative class is either not present or not properly sampled. The problem of classifying positive (or target) cases in the absence of appropriately-characterized negative cases (or outliers) has gained increasing attention in recent years. Researchers have addressed the task of OCC by using different methodologies in a variety of application domains. In this paper we formulate a taxonomy with three main categories based on the way OCC has been envisaged, implemented and applied by various researchers in different application domains. We also present a survey of current state-of-the-art OCC algorithms, their importance, applications and limitations.
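To make the setting described in the two abstracts above concrete, here is a minimal sketch of a one-class classifier that learns a boundary from positive (target) samples only. It is not taken from either paper: the centroid-plus-distance-threshold rule, the function names `fit_one_class` and `is_target`, and the `quantile` parameter are all illustrative assumptions.

```python
import math


def fit_one_class(positives, quantile=0.95):
    """Fit a centroid and a distance threshold from positive samples only.

    Assumption: the target class is roughly compact around its mean, so a
    distance threshold is a usable boundary. No negative samples are needed.
    """
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    # Use the training distance at the chosen quantile as the boundary radius.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def is_target(x, centroid, threshold):
    """Accept x as a target if it lies within the learned radius."""
    return math.dist(x, centroid) <= threshold


# Toy positive-only training set (hypothetical data).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centroid, threshold = fit_one_class(train, quantile=1.0)

print(is_target((0.4, 0.6), centroid, threshold))  # near the training data
print(is_target((5.0, 5.0), centroid, threshold))  # far from the training data
```

Lowering `quantile` tightens the boundary and rejects more of the training data as outliers, which is the usual trade-off when no negative samples are available to tune against.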
A fall is an abnormal activity that occurs rarely; however, failing to identify falls can have serious health and safety implications for an individual. Because falls are rare, there may be insufficient or no training data available for them. Therefore, standard supervised machine learning methods may not be directly applicable to this problem. In this paper, we present a taxonomy for the study of fall detection from the perspective of availability of fall data. The proposed taxonomy is independent of the type of sensors used and of specific feature extraction/selection methods. The taxonomy identifies different categories of classification methods for the study of fall detection based on the availability of fall data when training the classifiers. Then, we present a comprehensive literature review within those categories and identify the approach of treating a fall as an abnormal activity to be a plausible research direction. We conclude our paper by discussing several open research problems in the field and pointers for future research.
Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Finally, we present an in-depth analysis of the overall challenges in this field, highlight open research questions, and discuss guidelines to make progress in the field.
INDEX TERMS Categorical features, clustering, mixed datasets, numeric features.
I. INTRODUCTION
Clustering is an unsupervised machine learning technique used to group unlabeled data into clusters that contain data points that are 'similar' to each other and 'dissimilar' from those in other clusters [1], [2]. Many clustering algorithms can only handle data that contain either numeric or categorical feature values [3], [4]. Numeric features can take real values, such as height, weight, and distance. Categorical features represent data that can be divided into a fixed number of categories, such as color, race, sex, profession, and blood group. Clustering algorithms group data points into clusters using some notion of 'similarity', which can be as simple as the Euclidean distance. To compute the similarity between numeric feature values, mathematical operations (such as distances, angles, summation, or mean) are applied to them. Distance-based similarity measures are mostly used for numeric data points.
Generally, categorical feature values are not inherently ordered (for example, the categorical values, red and blue). It is not possible to directly compute the distance between two categorical feature values. Therefore, computing distance-based similarity measures for categorical data is a challenging task [5]. Nevertheless, several methods
The associate editor coordinating the review of this manuscript and approving it for publication was Haruna Chiroma.
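The difficulty described in the introduction above (distances are natural for numeric features but undefined for unordered categories) is often worked around by combining a numeric distance with a simple mismatch count, in the style of the k-prototypes cost. The sketch below is illustrative only and is not drawn from the survey text; the function name `mixed_dissimilarity` and the weight `gamma` are assumptions.

```python
def mixed_dissimilarity(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Dissimilarity between two mixed-type records.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatching categories (simple matching).
    gamma (an assumed tuning weight) balances the two parts.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * categorical


# Two hypothetical records: (numeric features) + (color, profession).
d = mixed_dissimilarity(
    [1.0, 2.0], ["red", "nurse"],
    [2.0, 4.0], ["blue", "nurse"],
    gamma=0.5,
)
print(d)  # numeric part 1 + 4 = 5, one categorical mismatch weighted by 0.5
```

The choice of `gamma` matters: with unscaled numeric features the numeric term can dominate, so numeric columns are usually normalized first.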
Abstract. Partitional clustering of categorical data is normally performed by using the K-modes clustering algorithm, which works well for large datasets. Even though the design and implementation of the K-modes algorithm is simple and efficient, it has the pitfall of randomly choosing the initial cluster centers on every new execution, which may lead to non-repeatable clustering results. This paper addresses the randomized center initialization problem of the K-modes algorithm by proposing a cluster center initialization algorithm. The proposed algorithm performs multiple clustering of the data based on attribute values in different attributes and yields deterministic modes that are to be used as initial cluster centers. In the paper, we propose a new method for selecting the most relevant attributes, namely Prominent attributes, compare it with another existing method to find Significant attributes for unsupervised learning, and perform multiple clustering of data to find initial cluster centers. The proposed algorithm ensures fixed initial cluster centers and thus repeatable clustering results. The worst-case time complexity of the proposed algorithm is log-linear in the number of data objects. We evaluate the proposed algorithm on several categorical datasets, compare it against random initialization and two other initialization methods, and show that the proposed method performs better in terms of accuracy and time complexity. The initial cluster centers computed by the proposed approach are close to the actual cluster centers of the different datasets we tested, which leads to faster convergence of the K-modes clustering algorithm in conjunction with better clustering results.
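The role of the initial centers in the abstract above can be seen in a minimal K-modes sketch. This is not the paper's initialization algorithm (the abstract does not give enough detail to reproduce it); it only shows the standard K-modes loop with simple matching dissimilarity, taking the initial centers as an explicit argument so that a fixed choice gives a repeatable result. All function names and the toy data are assumptions.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(rows):
    """Attribute-wise most frequent value; ties go to the first value seen."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_centers, max_iter=20):
    """Plain K-modes: assign to nearest mode, recompute modes, repeat."""
    centers = list(initial_centers)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)),
                key=lambda k: matching_dissimilarity(row, centers[k]))
            for row in data
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == k]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(cluster_mode(members) if members else centers[k])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
# Fixed (deterministic) initial centers instead of a random draw.
labels, centers = k_modes(data, [data[0], data[2]])
print(labels, centers)
```

Because the initial centers are passed in rather than sampled, repeated calls with the same arguments return the same partition; a random draw in their place is what makes standard K-modes runs non-repeatable.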
Agitation and aggression are among the most challenging symptoms of dementia. Agitated persons with dementia can harm themselves, their caregivers, or other patients in a care facility. Automatic detection of agitation would be useful to alert caregivers so that appropriate interventions can be performed. The building blocks in the automatic detection of agitation and aggression are appropriate sensing platforms and generalized predictive models. In this article, we perform a systematic review of studies that use different types of sensors to detect agitation and aggression in persons with dementia. We conclude that actigraphy shows some evidence of correlation with incidences of agitation and aggression; however, multimodal sensing has not been fully evaluated for this purpose. Based on this systematic review, we provide guidelines and recommendations for future research directions in this field.
Human falls rarely occur; however, detecting falls is very important from the health and safety perspective. Due to the rarity of falls, it is difficult to employ supervised classification techniques to detect them. Moreover, in these highly skewed situations it is also difficult to extract domain-specific features to identify falls. In this paper, we present a novel framework, DeepFall, which formulates the fall detection problem as an anomaly detection problem. The DeepFall framework presents the novel use of deep spatio-temporal convolutional autoencoders to learn spatial and temporal features from normal activities using non-invasive sensing modalities. We also present a new anomaly scoring method that combines the reconstruction scores of frames across a video sequence to detect unseen falls. We tested the DeepFall framework on three publicly available datasets collected through non-invasive sensing modalities (thermal and depth cameras) and show superior results in comparison to traditional autoencoder and convolutional autoencoder methods in identifying unseen falls.
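As a rough illustration of combining per-frame reconstruction scores across a video sequence, the sketch below averages each frame's reconstruction error over every sliding window that contains it and flags frames above a threshold. This is an assumption made for illustration, not the scoring method defined in the DeepFall paper; the function name `frame_scores`, the stride of 1, the threshold and the error values are all hypothetical.

```python
def frame_scores(window_errors):
    """Combine reconstruction errors into one anomaly score per frame.

    window_errors[w][j] is the reconstruction error of the j-th frame of
    sliding window w (stride 1 assumed, all windows the same length).
    Each frame's score is its mean error over all windows containing it.
    """
    window_len = len(window_errors[0])
    n_frames = len(window_errors) + window_len - 1
    per_frame = [[] for _ in range(n_frames)]
    for w, errors in enumerate(window_errors):
        for j, err in enumerate(errors):
            per_frame[w + j].append(err)
    return [sum(errs) / len(errs) for errs in per_frame]


# Hypothetical errors from an autoencoder trained on normal activity only:
# three windows of three frames covering a five-frame clip.
windows = [
    [0.1, 0.1, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.9, 0.8],
]
scores = frame_scores(windows)
flagged = [i for i, s in enumerate(scores) if s > 0.5]  # assumed threshold
print(scores, flagged)
```

Averaging over windows smooths out a single noisy reconstruction; other aggregations (for example the maximum or the spread across windows) are equally plausible choices.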