Vicenç Gómez scite author profile

Optimal control as a graphical model inference problem

We reformulate a class of non-linear stochastic optimal control problems introduced by Todorov (in Advances in Neural Information Processing Systems, vol. 19, pp. 1369-1376, 2007) as a Kullback-Leibler (KL) minimization problem. As a result, the optimal control computation reduces to an inference computation and approximate inference methods can be applied to efficiently compute approximate optimal controls. We show how this KL control theory contains the path integral control method as a special case. We provide an example of a block stacking task and a multi-agent cooperative game where we demonstrate how approximate inference can be successfully applied to instances that are too complex for exact computation. We discuss the relation of the KL control approach to other inference approaches to control.

Statistical analysis of the social network and discussion threads in Slashdot

Gómez

2008

We analyze the social network emerging from the user comment activity on the website Slashdot. The network presents common features of traditional social networks such as a giant component, small average path length and high clustering, but differs from them in showing moderate reciprocity and neutral assortativity by degree. Using Kolmogorov-Smirnov statistical tests, we show that the degree distributions are better explained by log-normal than by power-law distributions. We also study the structure of discussion threads using an intuitive radial tree representation. Threads show strong heterogeneity and self-similarity throughout the different nesting levels of a conversation. We use these results to propose a simple measure to evaluate the degree of controversy provoked by a post.

A unified view of entropy-regularized Markov decision processes

Neu

Jönsson

Gómez

2017

Preprint

We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally, we illustrate empirically the effects of using various regularization techniques on learning performance in a simple reinforcement learning setup.

Modeling the structure and evolution of discussion cascades

Gómez

Kappen

Kaltenbrunner

2011

We analyze the structure and evolution of discussion cascades in four popular websites: Slashdot, Barrapunto, Meneame and Wikipedia. Despite the big heterogeneities between these sites, a preferential attachment (PA) model with bias to the root can capture the temporal evolution of the observed trees and many of their statistical properties, namely, probability distributions of the branching factors (degrees), subtree sizes and certain correlations. The parameters of the model are learned efficiently using a novel maximum likelihood estimation scheme for PA and provide a figurative interpretation about the communication habits and the resulting discussion cascades on the four different websites.

A likelihood-based framework for the analysis of discussion threads

et al. 2012

Online discussion threads are conversational cascades in the form of posted messages that can generally be found in social systems that involve many-to-many interaction, such as blogs, news aggregators or bulletin board systems. We propose a framework based on generative models of growing trees to analyse the structure and evolution of discussion threads. We consider the growth of a discussion to be determined by an interplay between popularity, novelty and a trend (or bias) to reply to the thread originator. The relevance of these features is estimated using a full likelihood approach, which allows one to characterise the habits and communication patterns of a given platform and/or community. We apply the proposed framework to four popular websites: Slashdot, Barrapunto (a Spanish version of Slashdot), Meneame (a Spanish Digg-clone) and the article discussion pages of the English Wikipedia. Our results provide significant insight into understanding how discussion cascades grow and have potential applications in broader contexts such as community management or design of communication platforms.

Policy Search for Path Integral Control

Gómez

Kappen

Peters

et al. 2014

Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluation of a path integral over state trajectories. However, this potential is mostly unused in real-world problems because of two main limitations: first, current approaches can typically only be applied to learn open-loop controllers and, second, current sampling procedures are inefficient and not scalable to high-dimensional systems. We introduce the efficient Path Integral Relative-Entropy Policy Search (PI-REPS) algorithm for learning feedback policies with PI control. Our algorithm is inspired by information-theoretic policy updates that are often used in policy search. We use these updates to approximate the state trajectory distribution that is known to be optimal from the PI control theory. Our approach allows for a principled treatment of different sampling distributions and can be used to estimate many types of parametric or non-parametric feedback controllers. We show that PI-REPS significantly outperforms current methods and is able to solve tasks that are out of reach for current methods.

On the use of interaction error potentials for adaptive brain computer interfaces

et al. 2011

Description and Prediction of Slashdot Activity

Kaltenbrunner

Gómez

López

2007
