George Stamatelis scite author profile

Deep reinforcement learning for active hypothesis testing with heterogeneous agents and cost constraints

Stamatelis, Kalouptsidis

2023

Preprint



We consider active hypothesis testing with multiple heterogeneous agents. Each agent has access to its own set of experiments, has its own action costs, and forms its own beliefs. Additionally, each experiment carries a global cost, and the agents must try to keep the expected cumulative cost below a certain threshold. We study a centralized and a decentralized scenario. Under the centralized scenario, a central controller instructs the agents how to act. Under the decentralized scenario, they communicate and exchange information over a directed graph. For each scenario we propose separate deep reinforcement learning algorithms based on proximal policy optimization. Solutions to the decentralized problem start from fully decentralized training and progressively introduce two levels of centralization during training. We assess the proposed algorithms in an example of anomaly detection over sensor networks, considering three different decentralized communication settings. We find that all algorithms achieve the required accuracy level considerably faster than a single deep reinforcement learning agent, while satisfying the expected cost constraints when required. When the communication graph is fully connected, the decentralized agents perform just as well as, and sometimes better than, the centralized controller. Of the decentralized algorithms, the one that uses a global critic performs best by far and can compete with the central controller even when the communication graph is not complete.


Active hypothesis testing in unknown environments using recurrent neural networks and model free reinforcement learning

Stamatelis, Kalouptsidis

2023

Preprint



A combination of deep reinforcement learning and supervised learning is proposed for the problem of active sequential hypothesis testing in completely unknown environments. We make no assumptions about the prior probability, the action and observation sets, or the observation-generating process. Our method can be used in any environment, even one with continuous observations or actions, and performs competitively with, and sometimes better than, the Chernoff test in both finite- and infinite-horizon problems, despite not having access to the environment dynamics.

