2022
DOI: 10.1103/physreva.106.033124

Optimizing measurement-based cooling by reinforcement learning

Cited by 3 publications (2 citation statements)
References 51 publications
“…The algorithm of DPPO is a distributed variant of proximal policy optimization (PPO) [42], in which an updatable policy as an actor is trained to choose the comparatively optimized or correct actions toward the final goal and a critic is trained to evaluate quantitatively if the actions chosen by the policy should be encouraged. In a conventional PPO that was employed in optimizing conditional-measurement-based cooling by reinforcement learning [26], there are two policies and one critic. All of them are constructed by neural networks with individual sets of parameters.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…To improve the success probability, a straightforward idea is to reduce the number of projections. Approaches include cooling by one-shot measurement [25], cooling by hybrid measurements with optimized measurement time spacings [26], and cooling with random time spacings [16,19]. An alternative yet surprisingly unexplored idea might be purifying the target system before performing the projective measurements.…”
Section: Introduction
Citation type: mentioning
confidence: 99%