This paper proposes a new classification technique, called the support feature machine (SFM), for multidimensional time-series data. The proposed technique was applied to the classification of abnormal brain activity represented in electroencephalograms (EEGs). First, the dynamical properties of the EEGs from each electrode were extracted. These dynamical profiles were then fed into the SFM, an optimization model that maximizes classification accuracy by selecting the electrodes (features) that correctly classify unlabeled EEG samples under the nearest-neighbor classification rule. Empirical studies were performed on EEG data sets collected from 10 subjects. The performance of the SFM was assessed and compared with those of the traditional k-nearest-neighbor classifier and support vector machines (SVMs). The results show that the SFM achieved, on average, over 90% correct classification and outperformed the other classification techniques. In the validation step, the SFM correctly classified unseen preseizure and normal EEGs with over 73% accuracy.
In this paper, we propose a new optimization framework, called the support feature machine (SFM), for improving feature selection in medical data classification. The SFM selects the optimal group of features that exhibits strong separability between two classes, where separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified training samples, i.e., those whose intra-class distances are smaller than their inter-class distances. This concept can be combined with a modified nearest-neighbor rule for unbalanced data. In addition, a variation of SFM that provides feature weights (prioritization) is also presented. The proposed SFM framework and its extensions were tested on five real medical data sets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with those of support vector machine (SVM) classification and Logical Analysis of Data (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results, yet uses far fewer features to make its decisions than SVM and LAD. This result has significant implications for diagnostic practice, and the outcome of this study suggests that the SFM framework can serve as a quick decision-making tool in real clinical settings.
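The core SFM criterion described above — counting a training sample as correctly classified when its nearest same-class neighbor (intra-class distance) is closer than its nearest other-class neighbor (inter-class distance) over a selected feature subset — can be illustrated with a simple sketch. The greedy forward selection below is a heuristic stand-in for the exact SFM optimization model, not the paper's formulation; all function names are illustrative.

```python
import numpy as np

def nn_separability(X, y, features):
    """Count samples whose nearest same-class neighbor (intra-class
    distance) is closer than their nearest other-class neighbor
    (inter-class distance), using only the selected features."""
    Xs = X[:, features]
    correct = 0
    for i in range(len(Xs)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        intra = d[y == y[i]].min()         # closest same-class sample
        inter = d[y != y[i]].min()         # closest other-class sample
        if intra < inter:
            correct += 1
    return correct

def greedy_sfm(X, y):
    """Greedy forward selection maximizing the intra- vs. inter-class
    nearest-neighbor criterion (a heuristic sketch, not the exact
    SFM optimization model)."""
    selected, best = [], -1
    remaining = list(range(X.shape[1]))
    improved = True
    while improved and remaining:
        improved = False
        for f in remaining:
            score = nn_separability(X, y, selected + [f])
            if score > best:
                best, pick = score, f
                improved = True
        if improved:
            selected.append(pick)
            remaining.remove(pick)
    return selected, best
```

On a toy data set where one feature separates the classes and another is noise, the greedy search keeps only the informative feature — mirroring the abstract's observation that SFM reaches its decisions with few features.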
Discrete k-median (DKM) clustering problems arise in many real-life applications that involve time-series data sets, in which nondiscrete clustering methods may not represent the problem domain adequately. In this study, we propose mathematical programming formulations and solution methods to efficiently solve the DKM clustering problem. We develop approximation algorithms from a bilinear formulation of the discrete k-median problem using an uncoupled bilinear program algorithm. This approximation algorithm, which we refer to as DKM-L, is composed of two alternating linear programs, where one can be solved in linear time and the other is a minimum cost assignment problem. We then modify this algorithm by replacing the assignment problem with an efficient sequential algorithm for a faster approximation, which we call DKM-S. We also propose a compact exact integer formulation, DKM-I, and a more efficient network design-based exact mixed-integer formulation, DKM-M. All of our methods use arbitrary pairwise distance matrices as input. We apply our methods to simulated single-variate and multivariate random walk time-series data. We report comparative clustering performances using normalized mutual information (NMI) and solution speeds among the DKM methods we propose. We also compare our methods to other clustering algorithms that can operate with distance matrices, such as hierarchical cluster trees (HCT) and partition around medoids (PAM). We present NMI scores and classification accuracies of our DKM algorithms compared to HCT and PAM using five different distance measures on simulated data, as well as public benchmark and real-life neural time-series data sets. We show that DKM-S is much faster than HCT, PAM, and all other DKM methods and produces consistently good clustering results on all data sets.
This chapter reviews recent advances in mathematical programming methodologies for data mining, a rapidly emerging interdisciplinary research area. The main focus lies on classification (supervised learning) and clustering (unsupervised learning), which are among the most studied data mining tasks. We give a thorough discussion of the mathematical modeling aspects of classification and clustering problems.