This paper designs novel nonparametric Bellman mappings in reproducing
kernel Hilbert spaces (RKHSs) for reinforcement learning (RL). The
proposed mappings benefit from the rich approximating properties of
RKHSs, make no assumptions on the statistics of the data owing to their
nonparametric nature, require no knowledge of the transition
probabilities of Markov decision processes, and may operate without any
training data.
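For reference, a textbook form of the Bellman mapping that such designs approximate is
\[
(T^{\pi} Q)(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\big[ Q\big(s', \pi(s')\big) \big],
\]
where $Q$ denotes the state-action value function, $r$ the one-step reward, $\gamma \in [0, 1)$ the discount factor, $P$ the (unknown) transition kernel of the Markov decision process, and $\pi$ the policy; the proposed nonparametric designs sidestep the explicit expectation over $P$.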
Moreover, they allow for on-the-fly sampling via the design of
trajectory samples, re-use past test data via experience replay, effect
dimensionality reduction via random Fourier features, and enable
computationally lightweight operations suitable for efficient online or
time-adaptive learning.
The paper also offers a variational framework to
design the free parameters of the proposed Bellman mappings, and shows
that appropriate choices of those parameters yield several popular
Bellman-mapping designs. As an application, the proposed mappings are
employed to offer a novel solution to the problem of countering outliers
in adaptive filtering. More specifically, with no prior information on
the statistics of the outliers and no training data, a policy-iteration
algorithm is introduced to select online, per time instance, the
“optimal” coefficient p in the least-mean-p-power-error method.
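To fix ideas, the following sketch shows the standard least-mean-p-power (LMP) stochastic-gradient recursion that the selected coefficient p enters; the paper's policy-iteration rule for choosing p per time instance is not reproduced here, so a fixed p serves as a stand-in.

```python
import numpy as np

def lmp_step(w, x, d, mu, p):
    """One LMP update minimizing E|e|^p by stochastic gradient;
    p = 2 recovers the plain LMS filter."""
    e = d - w @ x                                    # a-priori error
    w_new = w + mu * p * np.abs(e) ** (p - 1) * np.sign(e) * x
    return w_new, e

# Toy run: p would be chosen online by the policy-iteration scheme;
# a fixed p = 1.5 is used here purely for illustration.
rng = np.random.default_rng(1)
w_true = rng.standard_normal(4)
w = np.zeros(4)
for n in range(1000):
    x = rng.standard_normal(4)
    d = w_true @ x + 0.01 * rng.standard_normal()    # noisy desired signal
    w, _ = lmp_step(w, x, d, mu=0.01, p=1.5)
```

Smaller values of p de-emphasize large errors, which is why adapting p online is useful when outliers of unknown statistics contaminate the data.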
Numerical tests on synthetic data showcase, in most cases, the superior
performance of the proposed solution over several RL and non-RL schemes.