2020
DOI: 10.48550/arxiv.2002.06504
Preprint
Differentiable Top-k Operator with Optimal Transport

Abstract: The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using bubble algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot b…

Cited by 7 publications (10 citation statements)
References 25 publications
“…Proof. This result is straightforward, combining Sinkhorn's scaling theorem and Theorem 3 in Xie et al (2020). Specifically, notice the similarity between the lower-level optimization and (12),…”
Section: Differentiability
mentioning, confidence: 72%
“…The same non-differentiability issue of the strict top-k operator also appears in standard classification problems using top-k accuracy. Xie et al (2020) resolves this issue by reducing the top-k selection to an optimal transport problem with regularization to define a soft-top-k operator, which is expressed as a convex program and made differentiable. In this work, we apply the technique of soft-top-k to define a new soft Whittle index policy: Definition 4.3 (Soft Whittle index policy).…”
Section: Differentiability Of Whittle Index Policy
mentioning, confidence: 99%
“…Soft-top-k operators: Xie et al (2020) reduces the top-k selection problem to an optimal transport problem that transports a uniform distribution across all N input elements to a distribution in which the elements with the k highest values are assigned probability 1 and all the others are assigned 0.…”
Section: Soft-top-k Operators
mentioning, confidence: 99%
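The construction quoted above can be sketched in a few lines of NumPy: transport a uniform distribution over the N scores to a two-point target with masses (N-k)/N and k/N, solve the entropy-regularized problem with plain Sinkhorn iterations, and read the soft top-k membership off the transport plan. This is a minimal illustration, not the authors' reference implementation; the function name, the choice of min/max anchors as targets, and the default parameters are assumptions.

```python
import numpy as np

def soft_top_k(x, k, eps=0.1, n_iter=200):
    """Soft top-k indicator via entropy-regularized optimal transport.

    Transports a uniform distribution over the n scores to a two-point
    target distribution with masses (n-k)/n and k/n; the plan's column
    for the "top" anchor, rescaled by n, is a differentiable relaxation
    of the 0/1 top-k membership vector.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Two anchor targets: the smallest and largest score.
    y = np.array([x.min(), x.max()])
    # Squared-distance cost between each score and each anchor.
    C = (x[:, None] - y[None, :]) ** 2
    mu = np.full(n, 1.0 / n)             # uniform source weights
    nu = np.array([(n - k) / n, k / n])  # target masses: "bottom", "top"
    # Sinkhorn iterations on the Gibbs kernel.
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(n_iter):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]      # entropic transport plan
    return n * P[:, 1]                   # soft membership in the top-k
```

For well-separated scores and small `eps` the output approaches the hard 0/1 indicator, while remaining differentiable in `x` because every step is smooth.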
“…KKT conditions) the implicit function theorem can be used to compute gradients. This was done for quadratic programs in [2], embedding MaxSAT in neural networks [62], a large class of convex optimization problems [1], smoothed top-k selection via optimal transport [67] and deep equilibrium models [4].…”
Section: End-to-end Training
mentioning, confidence: 99%
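The implicit-function-theorem trick mentioned in this last statement is easy to see on a toy problem. For an unconstrained quadratic program min_x 0.5 xᵀQx - bᵀx, the optimality condition Qx* - b = 0 implicitly defines x*(b), and differentiating it gives dx*/db = Q⁻¹ without unrolling any solver. The example below is a hypothetical illustration of that general idea, not the setup of any specific cited paper.

```python
import numpy as np

# Differentiating through an unconstrained quadratic program
#   min_x 0.5 x^T Q x - b^T x
# via the implicit function theorem: the stationarity condition
# Q x* - b = 0 implies the Jacobian dx*/db = Q^{-1}.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

x_star = np.linalg.solve(Q, b)   # the "solver" output
J_implicit = np.linalg.inv(Q)    # gradient from the implicit fn theorem

# Sanity check against finite differences of the solver itself.
h = 1e-6
J_fd = np.column_stack([
    (np.linalg.solve(Q, b + h * e) - x_star) / h
    for e in np.eye(2)
])
assert np.allclose(J_implicit, J_fd, atol=1e-4)
```

The same pattern scales to constrained problems by differentiating the full KKT system instead of the bare stationarity condition.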