2020
DOI: 10.48550/arxiv.2002.06504
Preprint
Differentiable Top-k Operator with Optimal Transport

Abstract: The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using bubble algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot b…

Cited by 7 publications (10 citation statements)
References 25 publications
“…Proof. This result is straightforward, combining Sinkhorn's scaling theorem and Theorem 3 in Xie et al (2020). Specifically, notice the similarity between the lower-level optimization and (12),…”
Section: Differentiability
mentioning, confidence: 72%
“…The same non-differentiability issue of the strict top-k operator also appears in standard classification problems using top-k accuracy. Xie et al (2020) resolves this issue by reducing the top-k selection to an optimal transport problem with regularization to define a soft-top-k operator, which is expressed as a convex program and made differentiable. In this work, we apply the technique of soft-top-k to define a new soft Whittle index policy: Definition 4.3 (Soft Whittle index policy).…”
Section: Differentiability Of Whittle Index Policy
mentioning, confidence: 99%
“…Soft-top-k operators: Xie et al (2020) reduces the top-k selection problem to an optimal transport problem that transports a uniform distribution across all N input elements to a distribution in which the elements with the k highest values are assigned probability 1 and all the others are assigned 0.…”
Section: Soft-top-k Operators
mentioning, confidence: 99%
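The construction quoted above can be sketched in a few lines of NumPy: transport a uniform distribution over the N scores to a two-point target with masses (N-k)/N and k/N, solve the entropy-regularized problem with plain Sinkhorn iterations, and read the soft top-k membership off the transport plan. This is a minimal illustration, not the authors' reference implementation; the function name, the choice of min/max anchors as targets, and the default parameters are assumptions.

```python
import numpy as np

def soft_top_k(x, k, eps=0.1, n_iter=200):
    """Soft top-k indicator via entropy-regularized optimal transport.

    Transports a uniform distribution over the n scores to a two-point
    target distribution with masses (n-k)/n and k/n; the plan's column
    for the "top" anchor, rescaled by n, is a differentiable relaxation
    of the 0/1 top-k membership vector.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Two anchor targets: the smallest and largest score.
    y = np.array([x.min(), x.max()])
    # Squared-distance cost between each score and each anchor.
    C = (x[:, None] - y[None, :]) ** 2
    mu = np.full(n, 1.0 / n)             # uniform source weights
    nu = np.array([(n - k) / n, k / n])  # target masses: "bottom", "top"
    # Sinkhorn iterations on the Gibbs kernel.
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(n_iter):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]      # entropic transport plan
    return n * P[:, 1]                   # soft membership in the top-k
```

For well-separated scores and small `eps` the output approaches the hard 0/1 indicator, while remaining differentiable in `x` because every step is smooth.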
“…KKT conditions) the implicit function theorem can be used to compute gradients. This was done for quadratic programs in [2], embedding MaxSAT in neural networks [62], a large class of convex optimization problems [1], smoothed top-k selection via optimal transport [67] and deep equilibrium models [4].…”
Section: End-to-end Training
mentioning, confidence: 99%
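The implicit-function-theorem trick mentioned in this last statement is easy to see on a toy problem. For an unconstrained quadratic program min_x 0.5 xᵀQx - bᵀx, the optimality condition Qx* - b = 0 implicitly defines x*(b), and differentiating it gives dx*/db = Q⁻¹ without unrolling any solver. The example below is a hypothetical illustration of that general idea, not the setup of any specific cited paper.

```python
import numpy as np

# Differentiating through an unconstrained quadratic program
#   min_x 0.5 x^T Q x - b^T x
# via the implicit function theorem: the stationarity condition
# Q x* - b = 0 implies the Jacobian dx*/db = Q^{-1}.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

x_star = np.linalg.solve(Q, b)   # the "solver" output
J_implicit = np.linalg.inv(Q)    # gradient from the implicit fn theorem

# Sanity check against finite differences of the solver itself.
h = 1e-6
J_fd = np.column_stack([
    (np.linalg.solve(Q, b + h * e) - x_star) / h
    for e in np.eye(2)
])
assert np.allclose(J_implicit, J_fd, atol=1e-4)
```

The same pattern scales to constrained problems by differentiating the full KKT system instead of the bare stationarity condition.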