Support Vector Machines, Data Reduction, and Approximate Kernel Matrices

Nguyen, Xuan Long; Huang, Ling; Joseph, Anthony D.

doi:10.1007/978-3-540-87481-2_10

Cited by 17 publications

(8 citation statements)

References 13 publications

(15 reference statements)


“…In fact a hyperplane only requires the determination of two parameters, that is, a weight vector (that determines its slope) and a bias parameter. Depending on the number of target classes, SVM based techniques can be classified as (Tax and Duin 1999;Aly 2005;Bishop 2006;Nguyen et al 2008;Yeung et al 2007;Steinwart and Christmann 2008;Tax and Duin 2004;Scholkopf and Smola 2001;Herbrich 2002;Weston and Watkins 1998;Smola and Schölkopf 1998;Weston and Watkins 1999;Mayoraz and Alpaydin 1999;Bredensteiner and Bennett 1999;Schwenker 2000;Schölkopf et al 2001;Hsu and Lin 2002;Franc and Hlavác 2002;Elisseeff and Weston 2002;Zhu et al 2003 Outlier and event detection in WSNs require a model of normal data. All data samples which do not fit the normal data model are declared to be outliers.…”

Section: Bayesian Based Approaches

mentioning

confidence: 98%
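The citation statement above notes that a separating hyperplane is determined by a weight vector and a bias. A minimal sketch of the resulting linear decision rule follows; the function names and the example values are illustrative assumptions, not taken from the cited papers.

```python
def decision_value(weights, bias, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Return +1 for points on or above the hyperplane, -1 otherwise."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1


# Hypothetical 2-D hyperplane: x1 - x2 - 0.5 = 0
weights, bias = [1.0, -1.0], -0.5
label_a = classify(weights, bias, [2.0, 0.0])  # score 1.5
label_b = classify(weights, bias, [0.0, 2.0])  # score -2.5
```

Training an SVM amounts to choosing `weights` and `bias` so that this rule separates the classes with a large margin; the sketch only covers the prediction step.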

“…Labeled input-output pairs may be fed to the algorithms during training phase (Stankovic et al 2012;Ganguly 2008;Bishop 2006). The performance of a classifier depends on its ability to classify the unseen data based on the learned model and is more generally known as its generalization ability (Nguyen et al 2008;Yeung et al 2007;Steinwart and Christmann 2008;Tax and Duin 2004). Depending on the type of model learned during the training phase, these techniques can be divided into two types:…”

Section: Classification Based Outlier and Event Detection For WSNs De

mentioning

confidence: 99%


One-class support vector machines: analysis of outlier detection for wireless sensor networks in harsh environments

Shahid

Naqvi

Qaisar

2013

Artif Intell Rev


Machine learning, like its various applications, has received a great interest in outlier detection in Wireless Sensor Networks. Support Vector Machines (SVM) are a special type of Machine learning techniques which are computationally inexpensive and provide a sparse solution. This work presents a detailed analysis of various formulations of one-class SVMs, like, hyper-plane, hyper-sphere, quarter-sphere and hyper-ellipsoidal. These formulations are used to separate the normal data from anomalous data. Various techniques based on these formulations have been analyzed in terms of a number of characteristics for harsh environments. These characteristics include input data type, spatio-temporal and attribute correlations, user specified thresholds, outlier types, outlier identification(event/error), outlier degree, susceptibility to dynamic topology, non-stationarity and inhomogeneity. A tabular description of improvement and feasibility of various techniques for deployment in the harsh environments has also been presented.


“…[Figure labels: Reduction; Sampling [9][10][11][12][13][14][15][16][17][18]; Kernel Matrix [24][25][26][27][28][29]; Optimization [5][6][7][8]; Fast Inference [2][30][31][32][33]; post-processing; pre-processing] efficient if the number of support vectors is low. Also note that under mild assumptions, SVDD is equivalent to ν-SVM [20].…”

Section: Fast Training

mentioning

confidence: 99%

“…This is the category of methods mentioned in our introduction [9][10][11][12][13][14][15][16][17][18]. A second type reduces the size of the Kernel matrix, e.g., by approximation [24][25][26][27]. Examples are the Nyström method [28] and choosing random Fourier features [29].…”

Section: A Categorization

mentioning

confidence: 99%
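The statement above names random Fourier features as one way to approximate a kernel matrix. The sketch below illustrates the general idea for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) using only the Python standard library. The function names, the `gamma` parameterization, and the feature count are assumptions made for illustration; this is not the procedure of any specific cited paper.

```python
import math
import random


def make_rff_map(dim, n_features, gamma, seed=0):
    """Build z(.) such that dot(z(x), z(y)) approximates exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # For this RBF kernel, frequencies are drawn from N(0, 2 * gamma * I).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only as a reference."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


gamma = 0.5
z = make_rff_map(dim=3, n_features=4000, gamma=gamma)
x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.2]
exact = rbf_kernel(x, y, gamma)
approx = dot(z(x), z(y))
```

With explicit features `z(x)`, a linear model can stand in for a kernel machine without forming the full n-by-n kernel matrix; the approximation error typically shrinks as `n_features` grows.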

Efficient SVDD Sampling with Approximation Guarantees for the Decision Boundary

Englhardt,

Trittenbach,

Kottke

et al. 2020

Preprint


Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary hopefully equivalent to the one obtained on the full data set. According to the literature, a good sample should therefore contain so-called boundary observations that SVDD would select as support vectors on the full data set. However, non-boundary observations also are essential to not fragment contiguous inlier regions and avoid poor classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well. But existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample considering these points. Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that sampling indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this optimization problem. RAPID does not require any tuning of parameters, is easy to implement and scales well to large data sets. We evaluate our approach on real-world and synthetic data. Our evaluation is the most comprehensive one for SVDD sampling so far. Our results show that RAPID outperforms its competitors in classification accuracy, in sample size, and in runtime.


“…In such cases, they can log only aggregated data or an approximation of aggregated data and still get a good estimate of the required statistics. Approximation provides statistically sound estimates of metrics that are useful to machine-learning analyses such as PCA (principal component analysis) and SVM (support vector machine [14]). These techniques are critical in networked or large-scale distributed systems, where collecting even a single number from each component carries a heavy performance cost.…”

mentioning

confidence: 99%

Advances and Challenges in Log Analysis

Oliner

Ganapathi

Wei

2011

Queue


Computer-system logs provide a glimpse into the states of a running system. Instrumentation occasionally generates short messages that are collected in a system-specific log. The content and format of logs can vary widely from one system to another and even among components within a system. A printer driver might generate messages indicating that it had trouble communicating with the printer, while a Web server might record which pages were requested and when. As the content of the logs is varied, so are their uses. The printer log might be used for troubleshooting, while the Web-server log is used to study traffic patterns to maximize advertising revenue. Indeed, a single log may be used for multiple purposes: information about the traffic along different network paths, called flows, might help a user optimize network performance or detect a malicious intrusion; or call-detail records can monitor who called whom and when, and upon further analysis can reveal call volume and drop rates within entire cities. This article provides an overview of some of the most common applications of log analysis, describes some of the logs that might be analyzed and the methods of analyzing them, and elucidates some of the lingering challenges. Log analysis is a rich field of research; while it is not our goal to provide a literature survey, we do intend to provide a clear understanding of why log analysis is both vital and difficult.

DEBUGGING

Many logs are intended to facilitate debugging. As Brian Kernighan wrote in Unix for Beginners in 1979, "The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." Although today's programs are orders of magnitude more complex than those of 30 years ago, many people still log using printf to console or local disk and use some combination of manual inspection and regular expressions to locate specific messages or patterns. The simplest and most common use for a debug log is to grep for a specific message. If a server operator believes that a program crashed because of a network failure, then he or she might try to find a "connection dropped" message in the server logs. In many cases, it is difficult to figure out what to search for, as there is no well-defined mapping between log messages and observed symptoms. When a Web service suddenly becomes slow, the operator is unlikely to see an obvious error message saying, "ERROR: The service latency increased by 10% because bug X, on line Y, was triggered." Instead, users often perform a search for severity keywords such as "error" or "failure." Such severity levels are often used inaccurately, however, because a developer rarely has complete knowledge of how the code will ultimately be used. Furthermore, red-herring messages (e.g., "no error detected") may pollute the result set with
