Action Branching Architectures for Deep Reinforcement Learning

Tavakoli, Arash; Pardo, Fabio; Kormushev, Petar

doi:10.1609/aaai.v32i1.11798

Cited by 116 publications

(46 citation statements)

References 20 publications

(21 reference statements)


“…To tackle this problem, the BDQ algorithm in (Tavakoli et al, 2018) provides a special network structure, namely, branching dueling deep Q-network (or branching network for short), that allows the number of outputs of deep Q-networks to increase linearly with the number of components, as illustrated in figure 2.…”

Section: Branching Dueling Q-learning (BDQ) (mentioning)

confidence: 99%
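
The linear-versus-exponential contrast in this statement can be illustrated with a small counting sketch. It is not taken from the cited paper: the function names are hypothetical, and the system size (13 components with 3 options each) is an assumed example. A single joint Q-network needs one output per combination of sub-actions, while a branching network needs one output per sub-action in each branch.

```python
def joint_output_count(sub_actions_per_dim):
    """Outputs needed when every joint action gets its own Q-value."""
    total = 1
    for n in sub_actions_per_dim:
        total *= n  # combinations multiply across action dimensions
    return total


def branching_output_count(sub_actions_per_dim):
    """Outputs needed when each action dimension has its own branch."""
    return sum(sub_actions_per_dim)  # branch sizes simply add up


# Assumed example: 13 action dimensions, 3 sub-actions in each.
dims = [3] * 13
joint = joint_output_count(dims)
branching = branching_output_count(dims)
print(f"joint outputs: {joint}, branching outputs: {branching}")
```

Adding one more 3-way dimension triples the joint count but adds only 3 outputs to the branching count, which is the scaling property the statement refers to.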

“…To address these issues, in this paper, we customize WQMIX to effectively optimize maintenance decisions of large-scale multi-component systems for the fully observable setting. In particular, separate agent networks are replaced by a single branching dueling network (branching network) (Tavakoli, Pardo, & Kormushev, 2018) to take advantage of the fully observable setting. The branching structure allows achieving a linear increase in the size of deep Q-networks' output layer to avoid the curse of dimensionality.…”

Section: Introduction (mentioning)

confidence: 99%

“…Firstly, we customize WQMIX algorithm specifically for the fully observable setting. Secondly, we conduct a comparison study to benchmark the performance of the customized algorithm, the branching dueling deep Q-learning (Tavakoli et al, 2018) and a threshold-based policy when they are used to optimize maintenance actions of large-scale systems.…”

Section: Introduction (mentioning)

confidence: 99%


Weighted-QMIX-based Optimization for Maintenance Decision-making of Multi-component Systems

et al. 2022


It is well known that maintenance decision optimization for multi-component systems faces the curse of dimensionality. Specifically, the number of decision variables to be optimized grows exponentially in the number of components, which makes optimization computationally expensive. To address this issue, we customize a multi-agent deep reinforcement learning algorithm, namely Weighted QMIX, for the case where system states can be fully observed, to obtain cost-effective policies. A case study is conducted on a 13-component system to examine the effectiveness of the customized algorithm. The obtained results confirmed its performance.


“…In order to take the fourth expected contribution into account, we customize a MADRL algorithm, namely WQMIX (Rashid, Farquhar, Peng, & Whiteson, 2020), in the case where system states can be fully observed to obtain cost-effective policies. Indeed, the algorithm takes advantage of the branching dueling network architecture (Tavakoli, Pardo, & Kormushev, 2018) to allow achieving linear increase in the size of the output layer of deep Q-networks when the number of system components grows and the monotonic decomposition scheme for joint action-value functions (Rashid et al, 2020) to enable maintenance decision-making consistency at component and system level.…”

Section: Current Work (mentioning)

confidence: 99%

Artificial-Intelligence-Based Maintenance Scheduling for Complex Systems with Multiple Dependencies

2022


Maintenance planning for complex systems remains a challenging problem. First, integrating multiple dependency types into maintenance models makes them more realistic but also more complicated to solve and analyze. Second, the number of maintenance decision variables to be optimized increases rapidly with the number of components, which makes optimization computationally expensive. To address these issues, this thesis aims to incorporate multiple kinds of dependencies into maintenance models and to take advantage of recent advances in artificial intelligence to effectively optimize maintenance policies for large-scale multi-component systems.


“…The low-level problem is considered as a discrete control problem with two action dimensions: price and quantity. We utilize the Branching Dueling Q-Network (Tavakoli, Pardo, and Kormushev 2018). Formally, we have two action dimensions with $|p_l| = n_p$ discrete relative price levels and $|q_l| = n_q$ discrete quantity proportions.…”

Section: Low-level RL With Action Branching (mentioning)

confidence: 99%
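
A two-branch setup like the one in this statement can be sketched as follows. This is only an illustration under stated assumptions, not the citing paper's implementation: the function names and all numbers are hypothetical, the learned network is replaced by hand-written advantage lists, and the aggregation (shared state value plus mean-centred branch advantage) is assumed to match the dueling form used with branching Q-networks. Each branch picks its own greedy sub-action, so the network needs n_p + n_q outputs rather than n_p * n_q.

```python
def branch_q_values(state_value, advantages):
    """Combine a shared state value with one branch's advantages.

    Assumed aggregation: Q_d(s, a) = V(s) + A_d(s, a) - mean_a' A_d(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def greedy_joint_action(state_value, advantages_per_branch):
    """Pick the arg-max sub-action independently in every branch."""
    action = []
    for advantages in advantages_per_branch:
        q = branch_q_values(state_value, advantages)
        action.append(max(range(len(q)), key=q.__getitem__))
    return tuple(action)


# Hypothetical branch outputs: n_p = 3 price levels, n_q = 4 quantity proportions.
price_adv = [0.1, 0.4, -0.2]
quantity_adv = [0.0, -0.3, 0.5, 0.2]

action = greedy_joint_action(1.0, [price_adv, quantity_adv])
print("chosen (price index, quantity index):", action)
```

Because mean-centring shifts every value in a branch by the same amount, the greedy choice inside a branch depends only on that branch's advantages.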

Commission Fee is not Enough: A Hierarchical Reinforced Framework for Portfolio Management

Wang, Wei, et al. 2021. AAAI


Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error. Existing methods are impractical since they usually assume each reallocation can be finished immediately and thus ignore price slippage as part of the trading cost. To address these issues, we propose a hierarchical reinforced stock trading system for portfolio management (HRPM). Concretely, we decompose the trading process into a hierarchy of portfolio management over trade execution and train the corresponding policies. The high-level policy gives portfolio weights at a lower frequency to maximize the long-term profit and invokes the low-level policy to sell or buy the corresponding shares within a short time window at a higher frequency to minimize the trading cost. We train two levels of policies via a pre-training scheme and an iterative training scheme for data efficiency. Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvement against many state-of-the-art approaches.

