2017
DOI: 10.1109/tsp.2017.2750109

An Online Convex Optimization Approach to Proactive Network Resource Allocation

Abstract: Existing approaches to online convex optimization (OCO) make sequential one-slot-ahead decisions, which lead to (possibly adversarial) losses that drive subsequent decision iterates. Their performance is evaluated by the so-called regret, which measures the difference of losses between the online solution and the best fixed solution in hindsight. The present paper deals with online convex optimization involving adversarial loss functions and adversarial constraints, where the constraints are…
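As background for the regret metric described in the abstract, the following is a minimal sketch of projected online gradient descent on a stream of loss functions, together with the static regret against the best fixed decision in hindsight. It is illustrative only, not the paper's algorithm: the quadratic per-slot losses, the box feasible set, and all parameter values are assumptions made here for the example.

import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 200, 0.05                     # dimension, horizon, step size (assumed)
targets = rng.uniform(-1, 1, size=(T, d))    # hypothetical per-slot data revealed online

def loss(x, a):
    # illustrative per-slot loss f_t(x) = 0.5 * ||x - a_t||^2
    return 0.5 * np.sum((x - a) ** 2)

def grad(x, a):
    return x - a

# Projected online gradient descent over the box X = [-1, 1]^d:
# commit to x_t, observe the loss, then take a projected gradient step.
x = np.zeros(d)
decisions = []
for t in range(T):
    decisions.append(x.copy())
    x = np.clip(x - eta * grad(x, targets[t]), -1.0, 1.0)

online_loss = sum(loss(xt, a) for xt, a in zip(decisions, targets))

# The best fixed decision in hindsight for these quadratic losses is the
# projected mean of the targets; static regret compares against it.
x_star = np.clip(targets.mean(axis=0), -1.0, 1.0)
hindsight_loss = sum(loss(x_star, a) for a in targets)
print(f"static regret over T={T} slots: {online_loss - hindsight_loss:.3f}")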

Cited by 202 publications (221 citation statements)
References 40 publications (93 reference statements)
“…which is a system of nonlinear equations in Q^* ∈ R^{|S|×|X|}. Switching the goal from (16) to the fixed point of the Bellman optimality equation (19), a classical yet popular approach is the so-termed Q-learning algorithm [98]: S1) At slot t, select the decision x_t by…”
Section: Reinforcement Learning for Interactive IoT Environments (mentioning)
confidence: 99%
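The excerpt above points to the tabular Q-learning update that seeks the fixed point of the Bellman optimality equation. Below is a minimal, generic sketch of that update with ε-greedy selection of the per-slot decision; the environment interface (reset() and step()), the state/action counts, and the hyperparameters are assumptions for illustration rather than the cited paper's setup.

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    # Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    # env is assumed to expose reset() -> s and step(a) -> (s_next, r, done).
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy choice of the decision x_t at the current slot
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # temporal-difference step toward the Bellman optimality target
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q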
“…In addition to value iteration-based methods such as Q-learning, approaches based on direct policy search, such as policy gradients and actor-critic methods, are also prevalent nowadays, e.g., [83], [91], [108]. The key idea behind policy gradient is to update the θ-parametrized policy π_θ using the gradient of the discounted objective (16) with respect to the policy parameters [91]. Convergence of the policy gradient with deep neural networks or kernel-based function approximators is now better understood than that of Q-learning, along with the limitations of policy gradient-based methods that arise from their high variance.…”
Section: Reinforcement Learning for Interactive IoT Environments (mentioning)
confidence: 99%
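As a companion to the policy-gradient discussion in this excerpt, here is a minimal REINFORCE-style sketch for a tabular softmax policy π_θ over a discrete action space. Everything in it, the environment interface, the parameterization, and the learning rates, is an assumption made for illustration; no variance-reduction baseline or critic is included, which is exactly the high-variance regime the excerpt mentions.

import numpy as np

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def reinforce(env, n_states, n_actions, episodes=500, lr=0.01, gamma=0.99, seed=0):
    # Monte-Carlo policy gradient: theta <- theta + lr * G_t * grad log pi_theta(a_t | s_t).
    # env is assumed to expose reset() -> s and step(a) -> (s_next, r, done).
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, n_actions))      # tabular softmax policy parameters
    for _ in range(episodes):
        s, done, traj = env.reset(), False, []
        while not done:
            probs = softmax(theta[s])
            a = int(rng.choice(n_actions, p=probs))
            s_next, r, done = env.step(a)
            traj.append((s, a, r))
            s = s_next
        G = 0.0                                   # discounted return-to-go
        for s, a, r in reversed(traj):
            G = r + gamma * G
            grad_log = -softmax(theta[s])         # grad of log softmax w.r.t. theta[s]
            grad_log[a] += 1.0
            theta[s] += lr * G * grad_log
    return theta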
“…Another related line of work concerns online convex optimization with constraints (Mahdavi et al., 2012, 2013; Chen et al., 2017; Neely and Yu, 2017; Chen and Giannakis, 2018). Their setting differs from ours in several important respects.…”
mentioning
confidence: 94%
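The constrained-OCO line of work cited in this excerpt is commonly instantiated as an online primal-dual (saddle-point) update, in which a dual variable tracks accumulated constraint violation and penalizes it in the next primal step. The sketch below shows that generic pattern only; the linear losses, the single time-varying constraint a_t·x ≤ b_t, the feasible box, and the step sizes are assumptions, not the algorithm of any specific cited paper.

import numpy as np

rng = np.random.default_rng(1)
d, T = 3, 300
eta, mu = 0.05, 0.05                     # primal / dual step sizes (assumed)

# Hypothetical time-varying loss gradients c_t and constraints g_t(x) = a_t.x - b_t.
C = rng.uniform(0, 1, size=(T, d))
A = rng.uniform(0, 1, size=(T, d))
b = rng.uniform(0.5, 1.5, size=T)

x = np.zeros(d)
lam = 0.0                                # dual variable (virtual queue) for the constraint
violation = 0.0
for t in range(T):
    # primal step: descend the online Lagrangian L_t(x) = c_t.x + lam * (a_t.x - b_t)
    x = np.clip(x - eta * (C[t] + lam * A[t]), 0.0, 1.0)
    g = A[t] @ x - b[t]
    violation += g
    # dual step: accumulate constraint violation, projected onto lam >= 0
    lam = max(0.0, lam + mu * g)

print(f"cumulative constraint violation after T={T} slots: {violation:.3f}")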
“…These measures can be related to the rate of change of the function values or minimizers over time [26]. Dynamic regret methods for constrained online optimization problems are studied in [27]. All these methods focus on centralized optimization problems.…”
Section: Introduction (mentioning)
confidence: 99%
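For reference, the performance measures contrasted in this excerpt can be written as static regret against one fixed comparator versus dynamic regret against the per-slot minimizers, with the latter typically bounded in terms of the path length of those minimizers; the notation below is generic rather than copied from any single cited paper.

\mathrm{Reg}^{\mathrm{st}}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x),
\qquad
\mathrm{Reg}^{\mathrm{dyn}}_T = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^\ast),
\quad x_t^\ast \in \arg\min_{x \in \mathcal{X}} f_t(x),

where the path length V_T = \sum_{t=2}^{T} \lVert x_t^\ast - x_{t-1}^\ast \rVert quantifies the rate of change of the minimizers over time.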