2020
DOI: 10.1109/tit.2019.2958705

Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

Abstract: A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if the minimizer of the expected loss is the true underlying probability. In this work we show that for binary classification, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant. It implies that by minimizing the l…
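As a rough illustration of the properness notion in the abstract (not code from the paper), the sketch below checks numerically that the expected logarithmic and squared losses under a Bernoulli(p) label are minimized at the prediction q = p, whereas the absolute loss, which is not proper, is minimized at an endpoint. The value p = 0.3 and the prediction grid are arbitrary choices for the example.

```python
import numpy as np

p = 0.3                                # assumed true probability that the binary label is 1
q = np.linspace(0.001, 0.999, 999)     # grid of candidate predicted probabilities

# Expected loss E_{Y ~ Bernoulli(p)}[loss(Y, q)] for three losses.
log_loss = -(p * np.log(q) + (1 - p) * np.log(1 - q))   # proper: minimized at q = p
sq_loss  = p * (1 - q) ** 2 + (1 - p) * q ** 2          # proper: minimized at q = p
abs_loss = p * (1 - q) + (1 - p) * q                    # not proper: minimized at an endpoint

for name, loss in [("logarithmic", log_loss), ("squared", sq_loss), ("absolute", abs_loss)]:
    print(f"{name:11s} loss minimized at q = {q[np.argmin(loss)]:.3f}")
# Expected (approximately): logarithmic -> 0.300, squared -> 0.300, absolute -> 0.001
```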

Cited by 19 publications (11 citation statements); references 64 publications.

Citation statements:
“…Finally, CRCCA may be generalized to a broader framework, in which we replace the correlation objective with mutual information maximization of the mapped signals. This problem strives to capture more fundamental dependencies between X and Y, as the mutual information is a statistic of the entire joint probability distribution, which holds many desirable characteristics (as shown, for example, in [55, 56]). This generalized framework may also be viewed as a two-way information bottleneck problem, as previously shown in [57].…”
Section: Discussion and Conclusion
confidence: 99%
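As a side note to the quoted statement (and not part of the cited CRCCA work), the following sketch shows in what sense mutual information is a statistic of the entire joint distribution: it is computed from the full joint probability table of X and Y, not from second-order correlations alone. The 2x3 joint table is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical 2x3 joint probability table for discrete X and Y (illustrative values only).
p_xy = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.10, 0.40]])

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal distribution of X
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal distribution of Y

# I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.
mask = p_xy > 0
mi = np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))
print(f"I(X;Y) = {mi:.4f} nats")
```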
“…As we can see from the example, d_Ψ(z)(0.5, p) of the SCE loss has a larger distance than that of the NS loss. In fact, Painsky and Wornell (2020) proved that the upper bound of the Bregman divergence for binary labels when…”
Section: Divergences
confidence: 99%
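The quoted statement compares Bregman divergences d_Ψ(z)(0.5, p) of two particular losses (SCE and NS) from the citing paper, which are not reproduced here. As a hedged stand-in, the sketch below compares the Bregman divergences generated by the logarithmic loss (which yields the binary KL divergence) and by a quadratic, Brier-type loss, and checks numerically that the quadratic divergence stays below the KL divergence, consistent with the kind of KL upper bound attributed to Painsky and Wornell (2020).

```python
import numpy as np

# Bregman divergence generated by a convex function psi:
#   d_psi(q, p) = psi(q) - psi(p) - psi'(p) * (q - p)
def bregman(psi, dpsi, q, p):
    return psi(q) - psi(p) - dpsi(p) * (q - p)

# Logarithmic loss: generator is the negative binary entropy;
# its Bregman divergence is the binary KL divergence KL(q || p) in nats.
psi_log  = lambda p: p * np.log(p) + (1 - p) * np.log(1 - p)
dpsi_log = lambda p: np.log(p / (1 - p))

# Quadratic (Brier-type) loss: generator psi(p) = p^2 + (1 - p)^2;
# its Bregman divergence is 2 * (q - p)^2.
psi_sq  = lambda p: p ** 2 + (1 - p) ** 2
dpsi_sq = lambda p: 4 * p - 2

q = 0.5                                  # reference probability, as in d_psi(0.5, p)
p = np.linspace(0.01, 0.99, 99)          # estimated probabilities

d_kl = bregman(psi_log, dpsi_log, q, p)  # KL(0.5 || p)
d_sq = bregman(psi_sq,  dpsi_sq,  q, p)  # 2 * (0.5 - p)^2

# The quadratic divergence never exceeds the binary KL divergence
# (a Pinsker-type instance of the "KL up to a normalization constant" bound).
print(bool(np.all(d_sq <= d_kl + 1e-12)))   # True
```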
“…The KL divergence is a widely used measure for the discrepancy between two probability distributions, with many desirable properties [20]. In addition, the KL divergence serves as an upper bound for a collection of popular discrepancy measures (for example, the Pinsker inequality [21] and the universality results in [22, 23]). In this sense, by minimizing the KL divergence, we implicitly bound from above a large set of common performance merits.…”
Section: The Suggested Inference Scheme
confidence: 99%
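A small numerical check (not from the cited works) of the Pinsker inequality referenced in the quote: for discrete distributions, the total variation distance is bounded by sqrt(KL/2), so driving the KL divergence down also controls the total variation. The alphabet size of 4, the Dirichlet sampling, and the seed are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """KL divergence D(p || q) in nats, assuming full support."""
    return float(np.sum(p * np.log(p / q)))

# Pinsker's inequality: total variation distance <= sqrt(D(p || q) / 2).
for _ in range(5):
    p = rng.dirichlet(np.ones(4))        # random 4-symbol distribution
    q = rng.dirichlet(np.ones(4))
    tv = 0.5 * np.sum(np.abs(p - q))
    print(f"TV = {tv:.3f}  <=  sqrt(KL/2) = {np.sqrt(kl(p, q) / 2):.3f}")
```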