Abstract. Policy Gradient methods are model-free reinforcement learning algorithms which in recent years have been successfully applied to many real-world problems. Typically, Likelihood Ratio (LR) methods are used to estimate the gradient, but they suffer from high variance due to random exploration at every time step of each training episode. Our solution to this problem is to introduce a state-dependent exploration function (SDE) which during an episode returns the same action for any given state. This results in less variance per episode and faster convergence. SDE also finds solutions overlooked by other methods, and even improves upon state-of-the-art gradient estimators such as Natural Actor-Critic. We systematically derive SDE and apply it to several illustrative toy problems and a challenging robotics simulation task, where SDE greatly outperforms random exploration.
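The central idea, that exploration noise is a deterministic function of the state within an episode and is redrawn only between episodes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the form of the exploration function, so the sketch assumes a linear function of the state whose parameters are sampled from a zero-mean Gaussian. The names `StateDependentExploration`, `policy_mean`, `act`, and `sigma` are hypothetical.

```python
import numpy as np


class StateDependentExploration:
    """Exploration offset that is a fixed function of the state during one episode.

    Assumption: the offset is linear in the state, with parameters drawn from
    N(0, sigma^2). The abstract does not state the functional form.
    """

    def __init__(self, state_dim, action_dim, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.sigma = sigma
        self.shape = (action_dim, state_dim)
        self.resample()

    def resample(self):
        """Draw new exploration parameters; call once at the start of each episode."""
        self.theta_hat = self.rng.normal(0.0, self.sigma, size=self.shape)

    def __call__(self, state):
        # No randomness here: the same state always yields the same offset
        # until resample() is called again.
        return self.theta_hat @ np.asarray(state, dtype=float)


def policy_mean(state, theta):
    """Hypothetical deterministic linear controller."""
    return theta @ np.asarray(state, dtype=float)


def act(state, theta, explore):
    """Action = deterministic policy output + state-dependent exploration offset."""
    return policy_mean(state, theta) + explore(state)
```

By contrast, random exploration would draw a fresh noise sample inside `act` at every time step, which is the source of the per-episode variance described above.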
A compliant 2×2 tactile sensor array was developed and investigated for roughness encoding. State-of-the-art cross-shaped 3D MEMS sensors were integrated with polymeric packaging, providing a total of 16 elements sensitive to external mechanical stimuli in an area of about 20 mm², similar to the SA1 innervation density in humans. Experimental analysis of the bio-inspired tactile sensor array was performed using ridged surfaces with spatial periods from 2.6 mm to 4.1 mm, which were indented with a regulated 1 N normal force and stroked at constant sliding velocities from 15 mm/s to 48 mm/s. A repeatable and expected frequency shift of the sensor outputs, depending on the applied stimulus and on its scanning velocity, was observed between 3.66 Hz and 18.46 Hz, with an overall maximum error of 1.7%. The tactile sensor could also perform contact imaging during static stimulus indentation. The experiments demonstrated the suitability of this approach for the design of a roughness-encoding tactile sensor for an artificial fingerpad.
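The reported frequency range is consistent with the usual relation for a periodic surface stroked at constant velocity, f = v / λ, where v is the sliding velocity and λ the spatial period of the ridges. The abstract does not state this formula explicitly, so the short check below is a sketch under that assumption; the function name `expected_frequency_hz` is hypothetical.

```python
def expected_frequency_hz(velocity_mm_s, spatial_period_mm):
    """Expected principal frequency of the sensor output, assuming f = v / lambda."""
    if spatial_period_mm <= 0:
        raise ValueError("spatial period must be positive")
    return velocity_mm_s / spatial_period_mm


# Slowest stroke over the coarsest grating gives the lowest frequency;
# fastest stroke over the finest grating gives the highest.
low = expected_frequency_hz(15.0, 4.1)
high = expected_frequency_hz(48.0, 2.6)
print(f"{low:.2f} Hz to {high:.2f} Hz")  # prints "3.66 Hz to 18.46 Hz"
```

The two extremes reproduce the 3.66 Hz and 18.46 Hz bounds quoted in the abstract.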
An evaluation is made of ozone profiles retrieved from measurements of the nadir-viewing Global Ozone Monitoring Experiment (GOME) instrument. Currently, four different approaches are used to retrieve ozone profile information from GOME measurements, which differ in their use of external information and a priori constraints. In total, nine different algorithms are evaluated, exploiting the optimal estimation (Royal Netherlands Meteorological Institute, Rutherford Appleton Laboratory, University of Bremen, National Oceanic and Atmospheric Administration, Smithsonian Astrophysical Observatory), Phillips-Tikhonov regularization (Space Research Organization Netherlands), neural network (Center for Solar Energy and Hydrogen Research, Tor Vergata University), and data assimilation (German Aerospace Center) approaches. Analysis tools are used to interpret data sets that provide averaging kernels. In the interpretation of these data, the focus is on the vertical resolution, the indicative altitude of the retrieved value, and the fraction of a priori information. The evaluation is completed with a comparison of the results to lidar data from the Network for Detection of Stratospheric Change stations in Andoya (Norway), Observatoire Haute Provence (France), Mauna Loa (Hawaii), Lauder (New Zealand), and Dumont d'Urville (Antarctic) for the years 1997-1999. In total, the comparison involves nearly 1000 ozone profiles and allows the analysis of GOME data measured in different global regions and hence under different observational circumstances. The main conclusion of this paper is that unambiguous information on the ozone profile can at best be retrieved in the altitude range 15-48 km with a vertical resolution of 10 to 15 km, a precision of 5-10%, and a bias of up to 5% or 20%, depending on the success of recalibration of the input spectra. The sensitivity of retrievals to ozone at lower altitudes varies from scheme to scheme and includes significant influence from a priori assumptions.