Philip K. Chan scite author profile

Toward accurate dynamic time warping in linear time and space

Dynamic Time Warping (DTW) has quadratic time and space complexity, which limits its use to small time series. In this paper we introduce FastDTW, an approximation of DTW that has linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution. We demonstrate the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW by comparing it to two other types of existing approximate DTW algorithms: constraints (such as Sakoe-Chiba bands) and abstraction. Our results show a large improvement in accuracy over existing methods.
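For context on the quadratic cost mentioned in the abstract, here is a minimal sketch of classic (exact) DTW, not of FastDTW itself. The function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for illustration. The full (n+1) x (m+1) table is what makes both time and space grow with the product of the two series lengths.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Illustrative sketch only: uses |x - y| as the local cost (an assumption)
    and fills the full cost table, hence O(n * m) time and space.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated 2 can be absorbed by the warping path.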

Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms

Salvador, Chan

446

373

Many clustering and segmentation algorithms suffer from the limitation that the number of clusters/segments must be specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this paper, we investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the "knee" in a '# of clusters vs. clustering evaluation metric' graph. Using the knee is a well-known, but not particularly well-understood, method for determining the number of clusters. We explore the feasibility of this method and attempt to determine in which situations it will and will not work. We also compare the L method to existing methods in terms of efficiency and the accuracy of the number of clusters determined. Our results show favorable performance on these criteria compared to the existing methods that were evaluated.
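One way to locate a knee in a '# of clusters vs. evaluation metric' graph is to try every split point, fit one straight line to the points on each side, and keep the split with the lowest combined fitting error. The sketch below follows that idea; it is not taken from the paper, and the names `fit_line_rmse` and `find_knee`, as well as the weighting of each side's error by its share of the points, are assumptions made for illustration.

```python
def fit_line_rmse(xs, ys):
    """Least-squares line fit; returns the root-mean-square error of the fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (squared / n) ** 0.5


def find_knee(xs, ys):
    """Return the x value at which a two-line fit has the lowest weighted error.

    Hypothetical helper: each side's RMSE is weighted by the fraction of
    points it covers (an assumption of this sketch).
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 points to fit two lines")
    best_x, best_err = None, float("inf")
    # c = number of points on the left; both sides keep at least 2 points.
    for c in range(2, n - 1):
        left = fit_line_rmse(xs[:c], ys[:c])
        right = fit_line_rmse(xs[c:], ys[c:])
        err = (c / n) * left + ((n - c) / n) * right
        if err < best_err:
            best_err, best_x = err, xs[c - 1]
    return best_x
```

With a metric that drops steeply for the first five cluster counts and then flattens, for instance `find_knee(list(range(1, 11)), [100, 80, 60, 40, 20, 10, 9, 8, 7, 6])`, the split lands at 5, the last point of the steep segment.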

Cost-based modeling for fraud and intrusion detection: results from the JAM project

et al.

In this paper we describe the results achieved using the JAM distributed data mining system for the real-world problem of fraud detection in financial information systems. For this domain we provide clear evidence that state-of-the-art commercial fraud detection systems can be substantially improved in stopping losses due to fraud by combining multiple models of fraudulent transactions shared among banks. We demonstrate that the traditional statistical metrics used to train and evaluate the performance of learning systems (i.e., statistical accuracy or ROC analysis) are misleading and perhaps inappropriate for this application. Cost-based metrics are more relevant in certain domains, and defining such metrics poses significant and interesting research questions both in evaluating systems and alternative models, and in formalizing the problems to which one may wish to apply data mining technologies. This paper also demonstrates how the techniques developed for fraud detection can be generalized and applied to the important area of intrusion detection in networked information systems. We report the outcome of recent evaluations of our system applied to tcpdump network intrusion data, specifically with respect to statistical accuracy. This work involved building additional components of JAM that we have come to call MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection). However, taking the next step to define cost-based models for intrusion detection poses interesting new research questions. We describe our initial ideas about how to evaluate intrusion detection systems using cost models learned during our work on fraud detection.

Max-mean and max-median filters for detection of small targets

et al. 1999

An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection

2003

The DARPA/MIT Lincoln Laboratory off-line intrusion detection evaluation data set is the most widely used public benchmark for testing intrusion detection systems. Our investigation of the 1999 background network traffic suggests the presence of simulation artifacts that would lead to overoptimistic evaluation of network anomaly detection systems. The effect can be mitigated without knowledge of specific artifacts by mixing real traffic into the simulation, although the method requires that both the system and the real traffic be analyzed and possibly modified to ensure that the system does not model the simulated traffic independently of the real traffic.

Distributed data mining in credit card fraud detection

Chan, Fan, Prodromidis et al. 1999

IEEE Intell. Syst.

398

152

Learning nonstationary models of normal network traffic for detecting novel attacks

2002

Medical comorbidity in bipolar disorder: relationship between illnesses of the endocrine/metabolic system and treatment outcome

et al. 2010

Objective: The present study examined the relationship between medical burden in bipolar disorder and several indicators of illness severity and outcome. It was hypothesized that illnesses of the endocrine/metabolic system would be associated with greater psychiatric symptom burden and would impact the response to treatment with lithium and valproate.

Method: Data were analyzed from two studies evaluating lithium and valproate for rapid-cycling presentations of bipolar I and II disorder. General medical comorbidity was assessed by the Cumulative Illness Rating Scale (CIRS). Descriptive statistics and logistic regression analyses were conducted to explore the relationships between medical burden, body mass index (BMI), substance use disorder status, and depressive symptom severity.

Results: Of 225 patients enrolled, 41.8% had a recent substance use disorder, 50.7% were male, and 69.8% had bipolar I disorder. The mean age of the sample was 36.8 (SD = 10.8) years. The mean number of comorbid medical disorders per patient was 2.5 (SD = 2.5), and the mean CIRS total score was 4.3 (SD = 3.1). A significant positive correlation was observed between baseline depression severity and the number of organ systems affected by medical illness (p = 0.04). Illnesses of the endocrine/metabolic system were inversely correlated with remission from depressive symptoms (p = 0.02), and obesity was specifically associated with poorer treatment outcome. For every 1-unit increase in BMI, the likelihood of response decreased by 7.5% [odds ratio (OR) = 0.93, 95% confidence interval (CI): 0.87-0.99; p = 0.02] and the likelihood of remission decreased by 7.3% (OR = 0.93, 95% CI: 0.87-0.99; p = 0.03). The effect of comorbid substance use on the likelihood of response differed significantly according to baseline BMI. The presence of a comorbid substance use disorder resulted in lower odds of response, but only among patients with a BMI ≥ 23 (p = 0.02).

Corresponding author: David E. Kemp, M.D., Case Western Reserve University, 10524 Euclid Avenue, 12th Floor, Cleveland, OH 44106, USA, Fax: 216-844-2875, kemp.david@gmail.com.

Disclosures: DEK has acted as a consultant to Bristol-Myers Squibb and has served on a speakers bureau for AstraZeneca and Pfizer. KG has received grant support and/or honoraria from Abbott, AstraZeneca, and GlaxoSmithKline; has served as a consultant to Schering Plough; and has served on a speakers bureau for Pfizer. SJG has received grant support from AstraZeneca and Eli Lilly & Co. RLF receives or has received research support, acted as a consultant, and/or served on a speakers bureau for Abbott, Addrenex, AstraZeneca, Bristol-Myers Squibb, Forest, GlaxoSmithKline, Johnson & Johnson, Eli Lilly & Co., Neuropharm, Novartis, Organon, Otsuka, Pfizer, Sanofi-aventis, Sepracore, Shire, Solvay, Supernus Pharmaceuticals, Validus, and Wyeth. JRC has received research support, acted as a consultant, and/or served on an advisory board for Abbott, AstraZeneca, Bristol-Myers Squibb, France Foundation, GlaxoSmithKline, Janss...

