Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiling normal points. To the best of our knowledge, the concept of isolation has not been explored in the current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, yielding an algorithm with linear time complexity, a low constant factor, and a low memory requirement. Our empirical evaluation shows that iForest compares favourably to ORCA (a near-linear time complexity distance-based method), LOF and Random Forests in terms of AUC and processing time, especially on large data sets. iForest also works well in high-dimensional problems that have a large number of irrelevant attributes, and in situations where the training set does not contain any anomalies.
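To illustrate the isolation idea described in this abstract, the following is a minimal sketch (not the paper's exact algorithm; the function name and the toy data are ours): a single random tree isolates a point through random axis-parallel splits, and anomalies tend to require far fewer splits than normal points.

```python
import random

def isolation_path_length(x, data, rng, depth=0, max_depth=50):
    """Follow random axis-parallel splits until `x` is isolated from `data`,
    returning the number of splits used. Anomalies yield short paths."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(x))                 # pick a random attribute
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:                                # cannot split further
        return depth
    split = rng.uniform(lo, hi)                 # pick a random split value
    # keep only the points on the same side of the split as x
    side = [p for p in data if (p[dim] < split) == (x[dim] < split)]
    return isolation_path_length(x, side, rng, depth + 1, max_depth)

rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
anomaly = (6.0, 6.0)
paths = [isolation_path_length(anomaly, normal + [anomaly], random.Random(i))
         for i in range(100)]
print(sum(paths) / len(paths))  # short average path => likely anomaly
```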
Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation, without employing any distance or density measure; this is fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time complexity and a small memory requirement, and (ii) to deal effectively with the effects of swamping and masking. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high-dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample.
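The subsampling described here is exposed directly in scikit-learn's implementation of the method. A brief usage sketch (the synthetic data and parameter values are illustrative assumptions, not from the paper):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train = rng.normal(size=(1000, 2))                    # mostly normal points
X_test = np.vstack([rng.normal(size=(10, 2)),           # normal test points
                    rng.uniform(4, 6, size=(10, 2))])   # injected anomalies

# max_samples is the subsample size: each tree is built on a small random
# subsample, keeping training near-linear and countering swamping/masking.
clf = IsolationForest(n_estimators=100, max_samples=256, random_state=42)
clf.fit(X_train)
scores = clf.score_samples(X_test)  # lower scores => more anomalous
print(scores)
```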
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that the best results are obtained when the higher-level model combines the confidences, and not just the predictions, of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
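A sketch of the key finding, using scikit-learn stand-ins (the paper's own level-0 and level-1 learners differ; the dataset and estimator choices below are illustrative assumptions): passing class-membership confidences rather than crisp predictions to the level-1 generalizer is a single parameter.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Three different types of level-0 learners, echoing the paper's setting.
level0 = [("tree", DecisionTreeClassifier(random_state=0)),
          ("nb", GaussianNB()),
          ("knn", KNeighborsClassifier())]

# stack_method="predict_proba" feeds confidences (not just predictions)
# to the level-1 generalizer, the choice the paper found to work best.
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(max_iter=1000),
                           stack_method="predict_proba", cv=5)
print(cross_val_score(stack, X, y, cv=5).mean())
```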
MetaCost is a recently proposed procedure that converts an error-based learning algorithm into a cost-sensitive algorithm. This paper investigates two important issues centered on the procedure that were ignored in the paper proposing MetaCost. First, no comparison was made between MetaCost's final model and the internal cost-sensitive classifier on which MetaCost depends. It is plausible that the internal cost-sensitive classifier may outperform the final model, without the additional computation required to derive the final model. Second, MetaCost assumes that its internal cost-sensitive classifier is obtained by applying a minimum expected cost criterion. It is unclear whether violation of this assumption affects MetaCost's performance. We study these issues using two boosting procedures, and compare against the original form of MetaCost, which employs bagging.
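For readers unfamiliar with the minimum expected cost criterion that MetaCost assumes, a minimal sketch (the function name, cost matrix, and probabilities are illustrative assumptions):

```python
import numpy as np

def min_expected_cost_predict(proba, cost):
    """Given class-probability estimates `proba` (n_samples x n_classes)
    and a cost matrix where cost[i, j] is the cost of predicting class i
    when the true class is j, choose the class i that minimises
    sum_j P(j|x) * cost[i, j]."""
    expected = proba @ cost.T  # (n_samples, n_predicted_classes)
    return expected.argmin(axis=1)

# Example: a false negative (predicting 0 when the truth is 1) costs 5x
# a false positive, so uncertain cases get pushed towards class 1.
cost = np.array([[0.0, 5.0],
                 [1.0, 0.0]])
proba = np.array([[0.85, 0.15],   # low risk of class 1: still predict 0
                  [0.60, 0.40]])  # 40% risk: flips to class 1 under cost
print(min_expected_cost_predict(proba, cost))  # -> [0 1]
```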
A recent proposal of a data-dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity, and propose a nearest-neighbour method instead. We formally prove the characteristic of Isolation Similarity under the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters, called mass-connected clusters, is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one that detects mass-connected clusters when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected while density-connected clusters cannot.
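A sketch of the drop-in replacement described above (the helper name and all parameter values, including psi, t, and eps, are illustrative assumptions, not the paper's tuned settings): the nearest-neighbour-induced similarity is estimated from random Voronoi partitions, and DBSCAN itself is left unchanged apart from receiving the precomputed (dis)similarity.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def isolation_dissimilarity(X, psi=16, t=100, seed=0):
    """Estimate the nearest-neighbour-induced Isolation Similarity:
    build t random Voronoi partitions, each from psi sampled points;
    two points are similar in a partition if they share a cell.
    Returns 1 - similarity as a precomputed dissimilarity matrix."""
    rng = np.random.RandomState(seed)
    n = len(X)
    same = np.zeros((n, n))
    for _ in range(t):
        centres = X[rng.choice(n, size=psi, replace=False)]
        # assign every point to the cell of its nearest sampled centre
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        cell = d.argmin(axis=1)
        same += (cell[:, None] == cell[None, :])
    return 1.0 - same / t

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),    # dense cluster
               rng.normal(3, 0.9, (100, 2))])   # sparse cluster
D = isolation_dissimilarity(X)
# DBSCAN is unchanged except that it now consumes the precomputed matrix.
labels = DBSCAN(eps=0.7, min_samples=5, metric="precomputed").fit_predict(D)
print(np.unique(labels))
```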