Counterfactual Critic Multi-Agent Training for Scene Graph Generation

Chen, Long; Zhang, Hanwang; Xiao, Jun; He, Xiangnan; Pu, Shiliang; Chang, Shih‐Fu

doi:10.1109/iccv.2019.00471

Cited by 150 publications

(111 citation statements)

References 60 publications

Supporting

Mentioning

105

Contrasting

“…This task is to produce graph representations of images in terms of objects and their relationships. Scene graphs have been shown effective in boosting several vision-language tasks [14,25,28,5]. To the best of our knowledge, we are the first to design neural module networks that can reason over scene graphs.…”

Section: Related Work (mentioning)

confidence: 99%

Explainable and Explicit Visual Reasoning Over Scene Graphs

Shi

Zhang

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

203

119

We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using scene graphs (objects as nodes and the pairwise relationships as edges) for explainable and explicit reasoning with structured knowledge. XNMs allow us to pay more attention to teach machines how to "think", regardless of what they "look". As we will show in the paper, by using scene graphs as an inductive bias, 1) we can design XNMs in a concise and flexible fashion, i.e., XNMs merely consist of 4 meta-types, which significantly reduce the number of parameters by 10 to 100 times, and 2) we can explicitly trace the reasoning-flow in terms of graph attentions. XNMs are so generic that they support a wide range of scene graph implementations with various qualities. For example, when the graphs are detected perfectly, XNMs achieve 100% accuracy on both CLEVR and CLEVR CoGenT, establishing an empirical performance upper-bound for visual reasoning; when the graphs are noisily detected from real-world images, XNMs are still robust to achieve a competitive 67.5% accuracy on VQAv2.0, surpassing the popular bag-of-objects attention models without graph structures.

* The work was done when Jiaxin Shi was an intern at Nanyang Technological University.

“…To the best of our knowledge, there are only two exceptions among all NLVL models: RWM and SM-RL (Wang et al, 2019), which are not under either top-down or bottom-up frameworks. They both formulate NLVL as a sequential decision making problem, solved by reinforcement learning, e.g., actor critic (Chen et al, 2019b). The action space for each step is a set of handcraft-designed temporal box transformations.…”

Section: Natural Language Video Localization (mentioning)

confidence: 99%

DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization

Lu

Chen

Tan

et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Self Cite

118

103

In this paper, we focus on natural language video localization: localizing (i.e., grounding) a natural language description in a long and untrimmed video sequence. All currently published models for addressing this problem can be categorized into two types: (i) top-down approach: it does classification and regression for a set of pre-cut video segment candidates; (ii) bottom-up approach: it directly predicts probabilities for each video frame as the temporal boundaries (i.e., start and end time point). However, both approaches suffer several limitations: the former is computation-intensive for densely placed candidates, while the latter has trailed the performance of the top-down counterpart thus far. To this end, we propose a novel dense bottom-up framework: DEnse Bottom-Up Grounding (DEBUG). DEBUG regards all frames falling in the ground truth segment as foreground, and each foreground frame regresses the unique distances from its location to bi-directional ground truth boundaries. Extensive experiments on three challenging benchmarks (TACoS, Charades-STA, and ActivityNet Captions) show that DEBUG is able to match the speed of bottom-up models while surpassing the performance of the state-of-the-art top-down models.

* Chujie Lu and Long Chen are co-first authors with equal contributions. † Jun Xiao is the corresponding author.

[Figure example] Language Query: People are scrubbing the ice in front of a ball. Timestamps shown: 89.7s, 94.2s.

“…However, DL models which have no interaction with the market has a natural disadvantage in decision making problem like PM. Reinforcement learning algorithms have been proved effective in decision making problems in recent years and deep reinforcement learning (DRL) (Chen et al 2019), the integration of DL and RL, is widely used in the financial field. For instance, (Almahdi and Yang 2017) proposed a recurrent reinforcement learning (RRL) method, with a coherent risk-adjusted performance objective function named the Calmar ratio, to obtain both buy and sell signals and asset allocation weights.…”

Section: Related Work (mentioning)

confidence: 99%

Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States

Pei

Wang

et al. 2020

AAAI

Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity – the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty – the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) Bitcoin market and (ii) HighTech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM method and other baselines.
