2020
DOI: 10.1109/tit.2019.2958705

Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

Abstract: A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if the minimizer of the expected loss is the true underlying probability. In this work we show that for binary classification, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant. It implies that by minimizing the l…
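As a rough illustration of the properness notion in the abstract (not code from the paper), the sketch below checks numerically that the expected logarithmic and squared losses under a Bernoulli(p) label are minimized at the prediction q = p, whereas the absolute loss, which is not proper, is minimized at an endpoint. The value p = 0.3 and the prediction grid are arbitrary choices for the example.

```python
import numpy as np

p = 0.3                                # assumed true probability that the binary label is 1
q = np.linspace(0.001, 0.999, 999)     # grid of candidate predicted probabilities

# Expected loss E_{Y ~ Bernoulli(p)}[loss(Y, q)] for three losses.
log_loss = -(p * np.log(q) + (1 - p) * np.log(1 - q))   # proper: minimized at q = p
sq_loss  = p * (1 - q) ** 2 + (1 - p) * q ** 2          # proper: minimized at q = p
abs_loss = p * (1 - q) + (1 - p) * q                    # not proper: minimized at an endpoint

for name, loss in [("logarithmic", log_loss), ("squared", sq_loss), ("absolute", abs_loss)]:
    print(f"{name:11s} loss minimized at q = {q[np.argmin(loss)]:.3f}")
# Expected (approximately): logarithmic -> 0.300, squared -> 0.300, absolute -> 0.001
```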

Cited by 19 publications (11 citation statements); references 64 publications.

Citation statements:
“…Finally, CRCCA may be generalized to a broader framework, in which we replace the correlation objective with mutual information maximization of the mapped signals. This problem strives to capture more fundamental dependencies between X and Y, as the mutual information is a statistic of the entire joint probability distribution, which holds many desirable characteristics (as shown, for example, in [55, 56]). This generalized framework may also be viewed as a two-way information bottleneck problem, as previously shown in [57].…”
Section: Discussion and Conclusion
confidence: 99%
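As a side note to the quoted statement (and not part of the cited CRCCA work), the following sketch shows in what sense mutual information is a statistic of the entire joint distribution: it is computed from the full joint probability table of X and Y, not from second-order correlations alone. The 2x3 joint table is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical 2x3 joint probability table for discrete X and Y (illustrative values only).
p_xy = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.10, 0.40]])

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal distribution of X
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal distribution of Y

# I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.
mask = p_xy > 0
mi = np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))
print(f"I(X;Y) = {mi:.4f} nats")
```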
“…As we can see from the example, d_Ψ(z)(0.5, p) of the SCE loss has a larger distance than that of the NS loss. In fact, Painsky and Wornell (2020) proved that the upper bound of the Bregman divergence for binary labels when…”
Section: Divergences
confidence: 99%
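The quoted statement compares Bregman divergences d_Ψ(z)(0.5, p) of two particular losses (SCE and NS) from the citing paper, which are not reproduced here. As a hedged stand-in, the sketch below compares the Bregman divergences generated by the logarithmic loss (which yields the binary KL divergence) and by a quadratic, Brier-type loss, and checks numerically that the quadratic divergence stays below the KL divergence, consistent with the kind of KL upper bound attributed to Painsky and Wornell (2020).

```python
import numpy as np

# Bregman divergence generated by a convex function psi:
#   d_psi(q, p) = psi(q) - psi(p) - psi'(p) * (q - p)
def bregman(psi, dpsi, q, p):
    return psi(q) - psi(p) - dpsi(p) * (q - p)

# Logarithmic loss: generator is the negative binary entropy;
# its Bregman divergence is the binary KL divergence KL(q || p) in nats.
psi_log  = lambda p: p * np.log(p) + (1 - p) * np.log(1 - p)
dpsi_log = lambda p: np.log(p / (1 - p))

# Quadratic (Brier-type) loss: generator psi(p) = p^2 + (1 - p)^2;
# its Bregman divergence is 2 * (q - p)^2.
psi_sq  = lambda p: p ** 2 + (1 - p) ** 2
dpsi_sq = lambda p: 4 * p - 2

q = 0.5                                  # reference probability, as in d_psi(0.5, p)
p = np.linspace(0.01, 0.99, 99)          # estimated probabilities

d_kl = bregman(psi_log, dpsi_log, q, p)  # KL(0.5 || p)
d_sq = bregman(psi_sq,  dpsi_sq,  q, p)  # 2 * (0.5 - p)^2

# The quadratic divergence never exceeds the binary KL divergence
# (a Pinsker-type instance of the "KL up to a normalization constant" bound).
print(bool(np.all(d_sq <= d_kl + 1e-12)))   # True
```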
“…The KL divergence is a widely used measure for the discrepancy between two probability distributions, with many desirable properties [20]. In addition, the KL divergence serves as an upper bound for a collection of popular discrepancy measures (for example, the Pinsker inequality [21] and the universality results in [22, 23]). In this sense, by minimizing the KL divergence, we implicitly bound from above a large set of common performance merits.…”
Section: The Suggested Inference Scheme
confidence: 99%
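A small numerical check (not from the cited works) of the Pinsker inequality referenced in the quote: for discrete distributions, the total variation distance is bounded by sqrt(KL/2), so driving the KL divergence down also controls the total variation. The alphabet size of 4, the Dirichlet sampling, and the seed are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence D(p || q) in nats, assuming full support."""
    return float(np.sum(p * np.log(p / q)))

# Pinsker's inequality: total variation distance <= sqrt(D(p || q) / 2).
for _ in range(5):
    p = rng.dirichlet(np.ones(4))        # random 4-symbol distribution
    q = rng.dirichlet(np.ones(4))
    tv = 0.5 * np.sum(np.abs(p - q))
    print(f"TV = {tv:.3f}  <=  sqrt(KL/2) = {np.sqrt(kl(p, q) / 2):.3f}")
```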