Distributed On-Line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation

Taylor, Matthew E.; Jain, Manish; Tandon, Prateek; Yokoo, Makoto; Tambe, Milind

doi:10.1142/s0219525911003104

Cited by 14 publications (25 citation statements). References 27 publications.


“…Furthermore, it is clear that incorporating the current state of the traffic in the decision making process allows for much finer control and response to fluctuations in the general traffic pattern. Also, applying algorithms from the DCEE framework, our previous results concerning the team uncertainty penalty are confirmed in this setting (Taylor et al, 2011), showing again that more coordination among agents is not necessarily beneficial.…”

Section: Discussion (supporting; confidence: 54%)

“…Larger k values allow for more joint moves. This often, but not always, increases the total team performance (Taylor et al, 2011). This paper focuses on the class of static estimation (SE) DCEE algorithms.…”

Section: Distributed Coordination Of Exploration and Exploitation (mentioning; confidence: 99%)
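The passage above mentions k-coordination and static-estimation (SE) DCEE algorithms but does not describe how they work. The Python sketch below shows one plausible k = 1 round under stated assumptions. Each unexplored reward combination is estimated at a fixed prior mean, which is in the spirit of an SE-Mean estimate. Agents coordinate in the style of MGM: they exchange gains with their neighbours and move only when their own gain is strictly the largest. The class `SEMeanDCEE`, the hidden `true_reward` callback, and the tie-breaking rule are hypothetical. This is an illustration, not the algorithm as specified by Taylor et al. (2011).

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical hidden environment: (i, j, value_i, value_j) with i < j -> reward.
RewardFn = Callable[[int, int, int, int], float]


class SEMeanDCEE:
    """Hypothetical k=1 static-estimation DCEE sketch (not the published algorithm)."""

    def __init__(self, neighbors: Dict[int, List[int]], domain: List[int],
                 true_reward: RewardFn, prior_mean: float, seed: int = 0):
        self.neighbors = neighbors
        self.domain = domain
        self.true_reward = true_reward
        self.prior_mean = prior_mean  # assumed mean of the reward distribution
        rng = random.Random(seed)
        self.values = {a: rng.choice(domain) for a in neighbors}
        self.known: Dict[Tuple[int, int, int, int], float] = {}
        self._observe_all()

    @staticmethod
    def _key(i: int, j: int, vi: int, vj: int) -> Tuple[int, int, int, int]:
        # Store each link once, with the lower agent id first.
        return (i, j, vi, vj) if i < j else (j, i, vj, vi)

    def _observe_all(self) -> None:
        # Rewards are only revealed for the joint values agents actually play.
        for i, nbrs in self.neighbors.items():
            for j in nbrs:
                k = self._key(i, j, self.values[i], self.values[j])
                if k not in self.known:
                    self.known[k] = self.true_reward(*k)

    def estimate(self, i: int, j: int, vi: int, vj: int) -> float:
        # Static estimate: unexplored combinations are assumed to pay the prior mean.
        return self.known.get(self._key(i, j, vi, vj), self.prior_mean)

    def local(self, i: int, vi: int) -> float:
        # Estimated reward on all of agent i's links if it plays vi (neighbours fixed).
        return sum(self.estimate(i, j, vi, self.values[j]) for j in self.neighbors[i])

    def best_move(self, i: int) -> Tuple[int, float]:
        current = self.local(i, self.values[i])
        best_v, best_gain = self.values[i], 0.0
        for v in self.domain:
            gain = self.local(i, v) - current
            if gain > best_gain:
                best_v, best_gain = v, gain
        return best_v, best_gain

    def step(self) -> None:
        # k=1 (MGM-style): all agents compute gains against the same snapshot...
        moves = {i: self.best_move(i) for i in self.neighbors}
        for i, (v, gain) in moves.items():
            # ...and move only if they beat every neighbour (ties go to the lower id).
            if gain > 0 and all((gain, -i) > (moves[j][1], -j) for j in self.neighbors[i]):
                self.values[i] = v
        self._observe_all()

    def team_reward(self) -> float:
        # Sum of the observed rewards on every link, counting each link once.
        return sum(self.known[self._key(i, j, self.values[i], self.values[j])]
                   for i, nbrs in self.neighbors.items() for j in nbrs if i < j)


if __name__ == "__main__":
    # Three agents on a chain 0-1-2 with two values each; rewards are seeded draws.
    env_rng = random.Random(42)
    table: Dict[Tuple[int, int, int, int], float] = {}

    def hidden(i: int, j: int, vi: int, vj: int) -> float:
        return table.setdefault((i, j, vi, vj), env_rng.uniform(0, 10))

    solver = SEMeanDCEE({0: [1], 1: [0, 2], 2: [1]}, [0, 1], hidden, prior_mean=5.0)
    for _ in range(10):
        solver.step()
    print(solver.values, round(solver.team_reward(), 2))
```

With this mean estimate, an agent tries an unexplored combination whenever the prior mean exceeds the reward it currently observes on its links. That is one simple way to trade exploration against exploitation. A k = 2 variant would additionally let pairs of neighbours evaluate joint moves, which matches the "more joint moves" described in the quote.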

“…K2 performs worse than K1, even though it involves more coordination between the agents. This apparent discrepancy is due to a phenomenon coined the team uncertainty penalty (Taylor et al, 2011), in which agents with few neighbours actually achieve higher performance with lower levels of coordination (e.g. K1), while agents with many neighbours can achieve higher performance with higher levels of coordination (e.g.…”

Section: Light Versus Heavy Traffic (mentioning; confidence: 99%)

“…However, the DCOP framework requires that the reward of every action combination be known a priori, making it difficult to handle non-stationary traffic distributions. In contrast, our previous work has extended the DCOP framework to the distributed coordination of exploration and exploitation (DCEE) (Taylor, Jain, Tandon, Yokoo, & Tambe, 2011) framework. In order to address the importance of dynamic and unknown rewards, DCEE algorithms take a multi-agent approach towards balancing exploiting known good configurations with exploration of novel action combinations to attempt to find better rewards.…”

Section: Introduction (mentioning; confidence: 99%)


Distributed learning and multi-objectivity in traffic light control

2014

Self Cite


Traffic jams and suboptimal traffic flows are ubiquitous in modern societies, and they create enormous economic losses each year. Delays at traffic lights alone account for roughly 10% of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning (RL) approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Distributed constraint optimisation approaches (DCOP) have also been shown to be successful, but are limited to cases where the traffic flows are known. The distributed coordination of exploration and exploitation (DCEE) framework was recently proposed to introduce learning in the DCOP framework. In this paper, we present a study of DCEE and RL techniques in a complex simulator, illustrating the particular advantages of each, comparing them against standard isolated traffic actuated signals. We analyse how learning and coordination behave under different traffic conditions, and discuss the multi-objective nature of the problem. Finally, we evaluate several alternative reward signals in the best performing approach, some of these taking advantage of the correlation between the problem-inherent objectives to improve performance.


“…Hence the uncertainty must be somehow captured into the DCOP model and dealt with in the solution technique. There are a number of approaches that tackle these issues, in particular, we refer the interested reader to [66] for approaches that model uncertainty of the reward and attempts to find optimal solution considering such uncertainty, and to [67,68] for approaches that aim at learning unknown rewards of agents' joint actions.…”

Section: 4 (mentioning; confidence: 99%)