2018 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2018.8460781
Real-Time Learning of Efficient Lift Generation on a Dynamically Scaled Flapping Wing Using Policy Search

Cited by 9 publications (12 citation statements) · References 28 publications
“…The model was trained using data acquired from a dynamically scaled robotic wing [21,22,53] (figure 2 b ). In this work, we aimed to develop a general predictive model valid for a broad range of different wing motion kinematics while providing a novel and fundamental perspective towards flapping flight—characterized by fast, reciprocal, and 3DoF motion of the aerodynamic surface.…”
Section: PRSSM as a Predictive Model of Flapping Wing Aerodynamics
confidence: 99%
“…the kinematic parameters in the fin motion trajectories that need to be optimised. Policy gradient methods commonly maximise the expected return J_π by gradient ascent, with the policy update θ ← θ + γ∇_π J_π, where γ is the learning rate, ∇_π J_π is the policy gradient, and θ is the policy parameter vector [32]. In episode-based algorithms, the policy gradient is estimated using the total cumulative reward of several rollouts (trials) that share the same policy.…”
Section: Methods
confidence: 99%
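To make the update rule in the quoted passage concrete, here is a minimal sketch of one episode-based policy gradient ascent step in Python. The function name `policy_gradient_step`, the baseline subtraction, and the score-function inputs are illustrative assumptions rather than the exact estimator used in the cited work.

```python
import numpy as np

def policy_gradient_step(theta, rollout_returns, rollout_grads, gamma=0.01):
    """One episode-based policy gradient ascent step: theta <- theta + gamma * grad J_pi.

    rollout_returns: total cumulative reward of each rollout under the current policy.
    rollout_grads:   score-function term sum_t grad_theta log pi(a_t | s_t) for each rollout.
    """
    baseline = np.mean(rollout_returns)  # simple baseline to reduce variance (assumed, not from the source)
    grad_J = np.mean(
        [(R - baseline) * g for R, g in zip(rollout_returns, rollout_grads)],
        axis=0,
    )  # Monte Carlo estimate of the policy gradient over the shared-policy rollouts
    return theta + gamma * grad_J  # gradient ascent on the expected return J_pi
```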
“…The PEPG algorithm, on the other hand, learns distributions over the policy parameters rather than the parameters themselves, shifting exploration from action space to parameter space and reducing the variance. In turn, the reliability of the algorithm increases significantly [32], while the quality and speed of convergence are improved. In particular, PEPG introduces μ and σ (a Gaussian distribution is assumed), which represent the mean and standard deviation of the policy parameter vector θ, so that θ ~ N(μ, Iσ²), and these distribution parameters are updated using the policy gradients.…”
Section: Methods
confidence: 99%
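The following is a minimal sketch of one PEPG iteration under the assumptions stated in the quote (θ ~ N(μ, Iσ²), episodic returns as the learning signal). The names `pepg_update`, `evaluate`, `alpha_mu`, and `alpha_sigma` are hypothetical, and the baseline and gradient form follow the standard PEPG formulation rather than the specific variant used in the cited paper.

```python
import numpy as np

def pepg_update(mu, sigma, evaluate, n_rollouts=10, alpha_mu=0.1, alpha_sigma=0.05):
    """One PEPG iteration: sample policy parameters theta_i ~ N(mu, I sigma^2),
    run one rollout per sample, and move the distribution parameters (mu, sigma)
    along the estimated policy gradient."""
    thetas = mu + sigma * np.random.randn(n_rollouts, mu.size)      # exploration in parameter space
    returns = np.array([evaluate(theta) for theta in thetas])       # total reward of each rollout
    advantages = returns - returns.mean()                           # baseline-subtracted returns
    diff = thetas - mu
    grad_mu = (advantages[:, None] * diff).mean(axis=0)             # gradient estimate for mu
    grad_sigma = (advantages[:, None] * (diff**2 - sigma**2) / sigma).mean(axis=0)  # for sigma
    mu_new = mu + alpha_mu * grad_mu
    sigma_new = np.maximum(sigma + alpha_sigma * grad_sigma, 1e-3)  # keep exploration noise positive
    return mu_new, sigma_new
```

Because the exploration noise is injected into the parameters rather than the actions, each rollout is deterministic given its sampled θ, which is the variance-reduction property the quoted passage refers to.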