2010 Ninth International Conference on Machine Learning and Applications
DOI: 10.1109/icmla.2010.20
Multi-Class Classification Using a New Sigmoid Loss Function for Minimum Classification Error (MCE)

Cited by 6 publications (4 citation statements)
References 5 publications
“…Many studies have tried to solve the SVM with the 0-1 loss function by replacing it with other smooth functions, such as the sigmoid function [22,23], the logistic function [24], the polynomial function [23], and the hyperbolic tangent function [25]. These functions, while yielding accurate results, still suffer from being computationally expensive when solved as an optimization problem due to their non-convex nature.…”
Section: Multiclass Classification With Ramp Loss
confidence: 99%
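To make the trade-off in this excerpt concrete, here is a minimal sketch (not taken from the cited papers; the steepness parameter `beta` is an assumption for illustration) of how a sigmoid surrogate smooths the discontinuous 0-1 loss over the classification margin y·f(x):

```python
import numpy as np

def zero_one_loss(margin):
    # 0-1 loss: 1 if the margin y*f(x) is non-positive (misclassified), else 0.
    # Discontinuous at 0, so it cannot be minimized by gradient methods.
    return (margin <= 0).astype(float)

def sigmoid_loss(margin, beta=5.0):
    # Smooth sigmoid surrogate: approaches the 0-1 loss as beta grows,
    # but is differentiable everywhere (beta is an illustrative choice).
    return 1.0 / (1.0 + np.exp(beta * margin))

margins = np.linspace(-2, 2, 9)
print(np.round(zero_one_loss(margins), 3))
print(np.round(sigmoid_loss(margins), 3))
```

The surrogate is differentiable but, as the excerpt notes, non-convex in the model parameters, which is what makes the resulting optimization problem expensive.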
“…In our work, the higher-order potentials are directly modeled by the cluster-based features with a sigmoid function. The sigmoid function is commonly used as the activation function in many classification methods [44][45][46], as can be seen in Figure 6. Before computing the higher-order energy of the CRF defined in (9), the cluster-based features are normalized to [0,1] to balance the contributions of features and classes.…”
Section: Higher-Order Potentials
confidence: 99%
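As a rough illustration of the pattern this excerpt describes (a hypothetical sketch, not the citing authors' code; the min-max normalization and the weights `w` and bias `b` are assumptions), a cluster-based feature vector can be normalized to [0,1] and mapped through a sigmoid to yield a bounded potential:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def higher_order_potential(cluster_features, w, b=0.0):
    # Hypothetical sketch: min-max normalize the cluster-based features
    # to [0, 1], then squash a linear score through a sigmoid so the
    # resulting potential is bounded in (0, 1).
    f = np.asarray(cluster_features, dtype=float)
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)  # normalize to [0, 1]
    return sigmoid(f @ w + b)

features = np.array([3.0, 7.5, 1.2])
weights = np.array([0.4, -0.2, 0.9])
print(higher_order_potential(features, weights))
```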
“…This discriminant function is a smoothed approximation of the score difference between the reference and decoding hypotheses and is usually used as a criterion for an objective function [17,42]. The relation between the true classification risk and the smoothed MCE loss function has recently been studied in [43,44]. These studies revealed that direct minimization of the classification error can be achieved if the MCE loss function can be minimized.…”
Section: MCE-Based Discriminative Training
confidence: 99%
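Since this excerpt bears directly on the paper's topic, a brief sketch may help: the standard MCE recipe defines a misclassification measure d(x) as the smoothed score difference between the reference class and its competitors, then passes d through a sigmoid to obtain a differentiable approximation of the 0-1 loss. The sketch below is illustrative only; the smoothing parameters `gamma` and `eta` are assumptions, not values from the paper.

```python
import numpy as np

def mce_loss(scores, true_class, gamma=1.0, eta=10.0):
    # Illustrative MCE sketch (parameter names assumed):
    # d(x) = -g_true(x) + a log-sum-exp smoothing of the competing
    # class scores; the smoothed 0-1 loss is sigmoid(gamma * d).
    g = np.asarray(scores, dtype=float)
    g_true = g[true_class]
    competitors = np.delete(g, true_class)
    # log-sum-exp approximation of the best competing score
    g_comp = np.log(np.mean(np.exp(eta * competitors))) / eta
    d = -g_true + g_comp  # misclassification measure: d > 0 means error
    return 1.0 / (1.0 + np.exp(-gamma * d))

print(mce_loss(np.array([2.0, 0.5, -1.0]), true_class=0))  # small loss
print(mce_loss(np.array([0.2, 1.5, -1.0]), true_class=0))  # larger loss
```

Because the sigmoid is differentiable in d, minimizing this loss by gradient descent directly pushes down the smoothed classification error, which is the connection the cited studies [43,44] make precise.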