2019
DOI: 10.1109/jiot.2018.2839563

Bandit Convex Optimization for Scalable and Dynamic IoT Management

Abstract: The present paper deals with online convex optimization involving both time-varying loss functions and time-varying constraints. The loss functions are not fully accessible to the learner; instead, only the function values (a.k.a. bandit feedback) are revealed at queried points. The constraints are revealed after making decisions and can be instantaneously violated, yet they must be satisfied in the long term. This setting fits nicely the emerging online network tasks such as fog computing in the Internet-…
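The bandit feedback described in the abstract means the learner never observes a gradient, only loss values at queried points. A standard way to build a gradient surrogate from such queries is the two-point estimator (the "Two-Point Sampling" variant mentioned in the citation excerpts below is of this kind). The following is a generic sketch of that estimator on a toy quadratic loss, not the paper's exact algorithm:

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Two-point bandit gradient estimate: query the loss only at
    x + delta*u and x - delta*u for a random unit direction u.
    Generic textbook estimator, not the paper's exact update."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Toy quadratic loss; its true gradient at x is 2*x.
f = lambda x: float(np.dot(x, x))
rng = np.random.default_rng(0)
x = np.array([1.0, -0.5])

# Averaging many estimates recovers the gradient (for a quadratic loss
# the estimator is unbiased, so the average concentrates around 2*x).
g_hat = np.mean([two_point_gradient_estimate(f, x, 1e-2, rng)
                 for _ in range(20_000)], axis=0)
```

In an online algorithm, a single such estimate per round would replace the true gradient in the descent step; the one-point variant uses a single query per round at the cost of much higher variance.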

Cited by 117 publications (118 citation statements) · References 28 publications
“…This is in agreement with (14b), (20b), and the theoretical results shown in [39], [40]. From the zoomed figures, we can see that the centralized algorithms in [39], [40] achieve smaller expected dynamic regret and constraint violation than our distributed algorithms, which is reasonable. We can also see that Algorithm 2 achieves smaller expected dynamic regret and constraint [Figures: expected dynamic regret and expected constraint violation for Algorithm 1, Algorithm 2, [39] (one-point sampling), [39] (two-point sampling), and [40]]…”
Section: Numerical Simulations (supporting)
confidence: 90%
“…Algorithm 1 can also achieve sublinear expected dynamic regret if V(x*_T) grows sublinearly. In this case, there exists a constant ν ∈ [0, 1), [24], [26]–[29], [39], [41]. Note that these papers did not consider bandit feedback for time-varying inequality constraints, or did not consider time-varying inequality constraints at all.…”
Section: B. Expected Regret and Constraint Violation Bounds (mentioning)
confidence: 99%
“…• Gradient errors: the gradient of the cost κ(i, s) log(1 + z(i, s)) for each exogenous traffic flow is estimated using multi-point bandit feedback [15], [26]; the estimation error depends on the number of function evaluations used in constructing the proxy of the gradient in (24). • Solution dynamics: at each time step, the channel gains of the links are generated using a complex Gaussian random variable with mean 1 + 1 and a given variance v_c for both real and imaginary parts; the transmit power of each node is a Gaussian random variable with mean 1 and variance v_p; the exogenous traffic flows are random with mean [0.2, 0.3, 0.3, 0.4, 0.5, 0.2, 0.1, 0.4] and a given variance; and the cost is perturbed by modifying a_t.…”
Section: Illustrative Numerical Results (mentioning)
confidence: 99%
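The excerpt above estimates the gradient of a cost of the form κ log(1 + z) from function values alone via multi-point bandit feedback. A simple instance of that idea is a coordinate-wise central-difference estimator, which spends two queries per coordinate. The sketch below uses made-up values for κ and the traffic vector purely for illustration; it is not the cited paper's exact estimator:

```python
import numpy as np

def multipoint_gradient(f, z, delta):
    """Coordinate-wise multi-point bandit gradient estimate: one pair
    of function evaluations per coordinate (2n queries in total).
    Illustrative sketch of the multi-point feedback idea."""
    n = z.size
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = delta
        g[i] = (f(z + e) - f(z - e)) / (2.0 * delta)
    return g

kappa = 2.0                             # hypothetical cost weight
cost = lambda z: float(np.sum(kappa * np.log1p(z)))

z = np.array([0.2, 0.3, 0.3, 0.4])      # hypothetical traffic rates
g_hat = multipoint_gradient(cost, z, 1e-4)
# The true gradient of kappa*log(1+z) is kappa / (1 + z), so g_hat
# should match it up to O(delta^2) finite-difference error.
```

With more query points per coordinate the finite-difference error shrinks, which is the trade-off the excerpt refers to: estimation error decreases as the number of function evaluations per gradient proxy grows.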