Reward-free exploration is a reinforcement learning setting recently studied by Jin et al. [17], who address it by running several algorithms with regret guarantees in parallel. In our work, we instead propose a more adaptive approach to reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be seen as a variant of an algorithm proposed by Fiechter in 1994 [11] for a different objective that we call best-policy identification. We prove that RF-UCRL needs O((SAH^4/ε^2) log(1/δ)) episodes to output, with probability 1 − δ, an ε-approximation of the optimal policy for any reward function. We empirically compare it to oracle strategies that use a generative model.
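As a rough illustration of the exploration scheme described in this abstract, the sketch below shows a reward-free, UCRL-style loop: the agent maintains an upper bound on the MDP estimation error, acts greedily with respect to that bound, and stops once the bound is small. The bonus shape, the stopping threshold, and the `env` interface are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rf_ucrl_sketch(env, S, A, H, epsilon, delta, max_episodes=100_000):
    counts = np.ones((S, A))        # visit counts n(s, a); start at 1 to avoid division by zero
    p_hat = np.ones((S, A, S)) / S  # empirical transition model, initialized uniform
    for episode in range(max_episodes):
        # Backward induction on an upper bound E[h, s, a] of the estimation error.
        E = np.zeros((H + 1, S, A))
        for h in range(H - 1, -1, -1):
            bonus = H * np.sqrt(np.log(S * A * H / delta) / counts)  # illustrative bonus shape
            next_val = E[h + 1].max(axis=1)                          # greedy over next actions
            E[h] = np.minimum(H, bonus + p_hat @ next_val)
        if E[0].max() <= epsilon / 2:  # stop once the error bound is small everywhere
            return episode, p_hat
        # Collect one episode, acting greedily with respect to the error bound.
        s = env.reset()                # hypothetical tabular env interface
        for h in range(H):
            a = int(np.argmax(E[h, s]))
            s_next = env.step(a)       # rewards are ignored: this is reward-free exploration
            counts[s, a] += 1
            p_hat[s, a] += (np.eye(S)[s_next] - p_hat[s, a]) / counts[s, a]
            s = s_next
    return max_episodes, p_hat
```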
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for reinforcement learning in tabular, possibly stage-dependent, episodic Markov decision processes. UCBMQ is based on Q-learning, to which we add a momentum term, and relies on the principle of optimism in the face of uncertainty to deal with exploration. The new technical ingredient of UCBMQ is the use of momentum to correct the bias that Q-learning suffers from while, at the same time, limiting its impact on the second-order term of the regret. For UCBMQ, we are able to guarantee a regret of at most O(√(H^3 SAT) + H^4 SA), where H is the length of an episode, S the number of states, A the number of actions, and T the number of episodes, ignoring terms polylogarithmic in SAHT. Notably, UCBMQ is the first algorithm that simultaneously matches the lower bound of Ω(√(H^3 SAT)) for large enough T and has a second-order term (with respect to T) that scales only linearly with the number of states S. (For the same reason, the first-order term of the bound for the UCRL algorithm of Jaksch et al. (2010) carries an extra factor of S; the improved analysis of Azar et al. (2017) "pushes" this factor to the second-order term.)
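A minimal sketch of the kind of update this abstract describes follows: an optimistic Q-learning step augmented with a momentum term that accumulates past temporal-difference errors. The step size, bonus constant, and the exact way the momentum enters the update are assumptions for illustration, not UCBMQ's precise rule.

```python
import numpy as np

def ucbmq_style_update(Q, V, M, N, h, s, a, r, s_next, H, c=1.0, delta=0.05):
    """One optimistic Q-learning update with a momentum correction (sketch).

    Q: (H+1, S, A) optimistic Q-values; V: (H+1, S) optimistic values;
    M: (H, S, A) momentum (running average of TD errors); N: (H, S, A) visit counts.
    """
    N[h, s, a] += 1
    n = N[h, s, a]
    lr = (H + 1) / (H + n)                     # standard step size for optimistic Q-learning
    target = r + V[h + 1, s_next]              # one-step bootstrapped target
    td_error = target - Q[h, s, a]
    M[h, s, a] = (1 - lr) * M[h, s, a] + lr * td_error  # momentum: smoothed TD errors
    bonus = c * H * np.sqrt(np.log(1 / delta) / n)      # illustrative UCB exploration bonus
    Q[h, s, a] = min(H, (1 - lr) * Q[h, s, a] + lr * (target + M[h, s, a] + bonus))
    V[h, s] = min(H, Q[h, s].max())            # optimistic value used by the greedy policy
```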
In this work, we propose KeRNS (Kernel-based Reinforcement Learning in Non-Stationary environments): an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and with the total variation of the MDP over time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting for handling changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
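The sketch below illustrates the time-dependent kernel weighting this abstract refers to: each past transition is weighted by its distance to the query state-action pair and by its age, so that both sliding windows and exponential discounting arise as special cases of the temporal factor. The kernel shapes, bandwidth, and function names are assumptions for illustration.

```python
import numpy as np

def temporal_weight(age, mode="exponential", window=500, discount=0.99):
    if mode == "sliding_window":     # hard cutoff: the classic sliding-window scheme
        return 1.0 if age < window else 0.0
    return discount ** age           # soft cutoff: exponential discounting

def kernel_weights(query_sa, history, t_now, bandwidth=0.5):
    # history: list of (sa_vector, t_i) pairs; the metric on state-action
    # pairs is assumed Euclidean here for simplicity.
    w = np.empty(len(history))
    for i, (sa_i, t_i) in enumerate(history):
        space = np.exp(-np.linalg.norm(query_sa - sa_i) ** 2 / bandwidth ** 2)
        w[i] = space * temporal_weight(t_now - t_i)
    return w / max(w.sum(), 1e-12)   # normalized weights for the non-parametric MDP model
```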
Multifractal analysis allows us to study scale invariance and fluctuations of the pointwise regularity of time series. A theoretically well-grounded multifractal formalism, based on wavelet leaders, was applied to electroencephalography (EEG) time series measured in healthy volunteers and epilepsy patients, provided by the University of Bonn. We show that the multifractal spectrum during a seizure indicates a lower global regularity compared to non-seizure data, and that multifractal features, combined with a few baseline features, can be used to train a supervised learning algorithm to discriminate, well above chance, ictal (i.e., seizure) epochs from healthy and interictal epochs (97%) and healthy controls from patients (92%).
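To make the feature-extraction step concrete, here is a simplified sketch of wavelet-based multifractal features for an EEG epoch. It uses coefficient-based structure functions rather than the full wavelet-leader formalism of the paper (leaders are also needed to handle negative moments q); the wavelet choice and q-grid are assumptions. The resulting scaling exponents would then be fed, together with baseline features, to a standard supervised classifier.

```python
import numpy as np
import pywt

def multifractal_features(signal, wavelet="db3", levels=8, qs=(1, 2, 3, 4, 5)):
    # Detail coefficients per scale, reordered from finest (j = 1) to coarsest (j = levels).
    details = pywt.wavedec(signal, wavelet, level=levels)[1:][::-1]
    zetas = []
    for q in qs:
        # Structure function S(j, q) = mean |d_{j,k}|^q at each scale j.
        log_S = [np.log2(np.mean(np.abs(d) ** q) + 1e-12) for d in details]
        scales = np.arange(1, len(log_S) + 1)
        zeta_q = np.polyfit(scales, log_S, 1)[0]   # slope across scales = scaling exponent
        zetas.append(zeta_q)
    # Departure of zeta(q) from linearity in q is the signature of multifractality.
    return np.array(zetas)
```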