2011
DOI: 10.1137/080718395
A Generalization of the Averaging Procedure: The Use of Two-Time-Scale Algorithms

Cited by 22 publications (27 citation statements). References 22 publications.
“…At epoch k, we compute a new iterate µ_{k+1} by subtracting from the current iterate µ_k the product of the Hessian inverse and the gradient of the function R̂_{k+1}(µ_k). For the empirical dual loss function R̂_k defined in (22), we define the gradient ∇R̂_k(µ) and Hessian ∇²R̂_k(µ). The new approximate solution µ_{k+1} is then found from the current approximate solution µ_k using the Newton update…”
Section: Learning via Newton's Method (mentioning)
confidence: 99%
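The Newton update described in the excerpt above can be sketched as follows. This is a minimal illustration only: the loss used here is a hypothetical smooth convex stand-in for the empirical dual loss R̂_k of the cited work (whose definition in their equation (22) is not reproduced on this page), but the update rule µ_{k+1} = µ_k − [∇²R̂(µ_k)]⁻¹ ∇R̂(µ_k) is the one the excerpt names.

```python
import numpy as np

# Hypothetical stand-in loss: f(mu) = 0.5 * ||mu - 1||^2 + sum(exp(-mu)).
# It is smooth and strictly convex, so a pure Newton iteration converges;
# it is NOT the empirical dual loss of the cited paper.
def grad(mu):
    # Gradient of f: (mu - 1) - exp(-mu), elementwise.
    return (mu - 1.0) - np.exp(-mu)

def hessian(mu):
    # Hessian of f: I + diag(exp(-mu)), positive definite everywhere.
    return np.eye(mu.size) + np.diag(np.exp(-mu))

def newton_step(mu):
    """One epoch: mu_{k+1} = mu_k - [Hessian(mu_k)]^{-1} grad(mu_k).

    Solving the linear system is preferred over forming the inverse
    explicitly; both realize the same update.
    """
    return mu - np.linalg.solve(hessian(mu), grad(mu))

mu = np.zeros(3)
for _ in range(20):
    mu = newton_step(mu)

# At the optimum the gradient vanishes.
print(np.linalg.norm(grad(mu)))
```

Twenty iterations are far more than needed here; Newton's method converges quadratically near the optimum of a strongly convex smooth function, which is the point of using the Hessian rather than a plain gradient step.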
“…Remark 3: Observe in the text of Proposition 1 that we define R̃*_k to be the optimal point of the loss function L̂_k(µ_k) regularized with a standard log barrier −log(µ_k), rather than the thresholded barrier −log_ε(µ_k) used in the definition in (22). Indeed, using the thresholded barrier does not explicitly enforce nonnegativity for values smaller than ε.…”
Section: ERM over Non-stationary Channel (mentioning)
confidence: 99%
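The distinction the excerpt draws can be made concrete with a small sketch. The exact thresholded barrier of the cited work's equation (22) is not reproduced on this page; the construction below is an assumption, using a common variant that keeps −log(x) above a threshold ε and extends it with its first-order model below ε, so the penalty stays finite at nonpositive points and hence no longer enforces nonnegativity there.

```python
import math

EPS = 0.1  # hypothetical threshold; the cited work fixes its own epsilon

def log_barrier(x):
    """Standard log barrier: tends to +infinity as x -> 0+,
    so it strictly enforces x > 0 (undefined for x <= 0)."""
    return -math.log(x)

def thresholded_barrier(x):
    # Assumed construction, not the paper's exact definition:
    # keep -log(x) for x >= EPS, and below EPS continue with the
    # tangent line of -log at EPS. The result is finite everywhere,
    # which is exactly why nonnegativity is not enforced for x < EPS.
    if x >= EPS:
        return -math.log(x)
    return -math.log(EPS) - (x - EPS) / EPS

# The standard barrier blows up near zero; the thresholded one does not,
# and it even takes a finite value at a negative point:
print(thresholded_barrier(-0.05))
```

This is the trade-off the remark points at: the thresholded barrier is better behaved numerically (bounded gradient and value near zero), but a separate argument is needed to guarantee the iterates stay nonnegative, which motivates analyzing the standard log barrier instead.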