1996
DOI: 10.1109/78.553484

A global optimization technique for statistical classifier design

Abstract: A global optimization method is introduced for the design of statistical classifiers that minimize the rate of misclassification. We first derive the theoretical basis for the method, based on which we develop a novel design algorithm and demonstrate its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use. The method, grounded in ideas from statistical physics and information theory, extends the deterministic annealing approach for op…

Cited by 74 publications (60 citation statements)
References 51 publications
“…Though previous research proposing solutions for this problem exists (e.g., [31], [32]), it deals with the case of a small number of classes and is not directly applicable to our case. The key idea of learning algorithms minimizing classification error is to replace the discrete misclassification cost function with some smooth approximation in order to be able to take a derivative of the cost function and perform gradient descent optimization.…”
Section: A Description of Used Algorithms (mentioning)
confidence: 99%
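The smoothing idea in the excerpt above can be sketched numerically. This is a minimal illustration under assumptions of my own (hypothetical toy data, a simple linear discriminant, and a sigmoid surrogate), not the method of the cited papers: the discrete 0-1 misclassification count is replaced by a steep sigmoid of the classification margin, which makes the cost differentiable so plain gradient descent applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-class toy data: the label is the sign of the first
# feature plus a little noise.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)

def smooth_error(w, X, y, beta=5.0):
    """Sigmoid-smoothed misclassification rate.

    The discrete loss mean(y * (X @ w) < 0) has zero gradient almost
    everywhere; replacing the step with a steep sigmoid of the margin
    makes it differentiable.
    """
    margins = y * (X @ w)
    return np.mean(1.0 / (1.0 + np.exp(beta * margins)))

def grad_smooth_error(w, X, y, beta=5.0):
    """Gradient of smooth_error with respect to w."""
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(beta * margins))
    # d/dw of mean sigmoid(-beta * margin), with margin = y * (x . w)
    return (-beta * s * (1.0 - s) * y) @ X / len(y)

w = np.zeros(2)
for _ in range(500):
    w -= 0.5 * grad_smooth_error(w, X, y)

error_rate = np.mean(y * (X @ w) < 0)  # discrete error after training
```

The smoothed cost is a surrogate: descending it drives the discrete error rate down without ever differentiating the step function itself.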
“…In general, higher correlations assist the bit-mapper as it uses all the received bits to correct errors, unlike the grouping approach, which is forced to use only the bits within each group. From 4(b), it also follows that the overhead required to store the Bayesian network grows at higher N and the performance degrades further, making the Bayesian network approach impractical for very large networks. Also, for these datasets, observe that the performance of the greedy iterative-descent method is considerably poorer than that using DA.…”
Section: Complexity-Distortion Trade-off (mentioning)
confidence: 99%
“…Further, we impose the 'nearest prototype' structural constraint on the bit-mapper partitions by appropriately choosing a parametrization of the association probabilities. Similar methods have been used before in the context of tree-structured quantizer design [13], generalized VQ design [11] and optimal classifier design [7]. It can be shown using the principle of entropy maximization (refer to [13]) that, to impose a 'nearest prototype' structure, at each temperature the association probabilities must be governed by the Gibbs distribution:…”
Section: Deterministic Annealing Based Design (mentioning)
confidence: 99%
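The Gibbs-distribution association rule described above can be sketched in a few lines. The prototype locations and temperatures below are made-up illustrations: with p(j|x) ∝ exp(-‖x − m_j‖² / T), the associations are nearly uniform at high temperature and concentrate on the nearest prototype as T → 0, which is how the annealing schedule gradually imposes the 'nearest prototype' structure.

```python
import numpy as np

def gibbs_associations(x, prototypes, T):
    """Association probabilities p(j|x) proportional to exp(-||x - m_j||^2 / T).

    High T: nearly uniform over prototypes.  T -> 0: probability mass
    collapses onto the nearest prototype (hard assignment).
    """
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    logits = -(d2 - d2.min()) / T   # subtract the min for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical prototypes and query point, for illustration only.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([1.0, 0.2])

hot = gibbs_associations(x, prototypes, T=100.0)   # nearly uniform
cold = gibbs_associations(x, prototypes, T=0.01)   # ~one-hot on nearest
```

Subtracting the minimum distance before exponentiating leaves the normalized probabilities unchanged but avoids underflow at low temperature.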
“…The deterministic annealing (DA) technique has demonstrated substantial performance improvements on clustering, classification and constrained optimization problems [1,2,3,4,5]. Since DA is strongly motivated by analogies to statistical physics [6], it regards the optimization problem in question as a thermal system.…”
Section: Introduction (mentioning)
confidence: 99%
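The thermal-system view can be made concrete with a small clustering sketch. The data, annealing schedule, and constants below are illustrative assumptions of mine, not taken from the cited works: soft assignments are Gibbs probabilities at temperature T, centroids are updated against them, and the system is gradually cooled so the near-coincident centroids split as T drops below the critical temperature.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated hypothetical clusters.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])

def da_cluster(X, k=2, T0=10.0, Tmin=0.01, cooling=0.8, iters=20):
    """Deterministic-annealing clustering sketch.

    At each temperature, alternate Gibbs (soft) assignments and
    weighted-centroid updates; then cool.  Centroids start almost
    coincident at the data mean and separate as T decreases.
    """
    m = X.mean(0) + 1e-3 * rng.normal(size=(k, X.shape[1]))
    T = T0
    while T > Tmin:
        for _ in range(iters):
            d2 = ((X[:, None, :] - m[None]) ** 2).sum(-1)       # (n, k)
            p = np.exp(-(d2 - d2.min(1, keepdims=True)) / T)    # Gibbs weights
            p /= p.sum(1, keepdims=True)                        # soft assignments
            m = (p.T @ X) / p.sum(0)[:, None]                   # weighted centroids
        T *= cooling
    return m

centers = da_cluster(X)
```

At high T every point is shared almost equally among centroids, so the "energy" landscape is smooth; cooling sharpens the assignments toward hard nearest-centroid clustering, which is the sense in which DA treats optimization as a thermal system.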