Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing 2017
DOI: 10.18653/v1/w17-4304

Structured Prediction via Learning to Search under Bandit Feedback

Abstract: We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction…
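To make the setting concrete, here is a minimal sketch of the learning loop the abstract describes: a policy rolls out a sequence of actions, observes only a loss for the produced output (bandit feedback), and uses a running-average baseline as a simple variance-reduction device. The linear policy, feature map, and toy loss below are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: structured prediction under bandit feedback with a baseline.
# All names (features, bandit_loss, the linear policy) are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, SEQ_LEN, DIM = 5, 8, 16
theta = np.zeros((N_ACTIONS, DIM))           # linear policy parameters

def features(t, prev_action):
    """Hypothetical feature map for the state at step t."""
    x = np.zeros(DIM)
    x[t % DIM] = 1.0
    x[(prev_action + 1) % DIM] += 1.0
    return x

def sample_sequence(temperature=1.0):
    """Roll out one structured output; temperature controls exploration."""
    actions, states = [], []
    prev = -1
    for t in range(SEQ_LEN):
        x = features(t, prev)
        logits = theta @ x / temperature
        probs = np.exp(logits - logits.max()); probs /= probs.sum()
        a = rng.choice(N_ACTIONS, p=probs)
        actions.append(a); states.append((x, probs))
        prev = a
    return actions, states

def bandit_loss(actions):
    """Only the loss of the produced output is observed (pure bandit feedback)."""
    target = [t % N_ACTIONS for t in range(SEQ_LEN)]   # hidden reference output
    return sum(a != y for a, y in zip(actions, target)) / SEQ_LEN

baseline, lr, beta = 0.0, 0.1, 0.9
for episode in range(200):
    actions, states = sample_sequence()
    loss = bandit_loss(actions)
    advantage = loss - baseline                  # baseline = variance reduction
    for a, (x, probs) in zip(actions, states):
        grad_logp = -probs[:, None] * x[None, :]
        grad_logp[a] += x                        # grad of log pi(a | x)
        theta -= lr * advantage * grad_logp      # REINFORCE-style update
    baseline = beta * baseline + (1 - beta) * loss
```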

Cited by 9 publications (9 citation statements)
References 16 publications (14 reference statements)
“…However, the analysis by Choshen et al. (2020) missed a few crucial aspects of RL that have led to empirical success in previous works: First, variance reduction techniques such as the average reward baseline were already proposed with the original Policy Gradient by Williams (1992), and proved effective for NMT (Nguyen et al., 2017). Second, the exploration-exploitation trade-off can be controlled by modifying the sampling function (Sharaf and Daumé III, 2017), which in turn influences the peakiness.…”
Section: Introduction
confidence: 99%
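The second point in the statement above, controlling exploration by modifying the sampling function, can be illustrated in a few lines: a temperature (or an epsilon mixture with the uniform distribution) flattens or sharpens the policy's softmax, which is exactly the "peakiness" being traded off against exploitation. The snippet is a generic illustration with hypothetical values, not code from Sharaf and Daumé III (2017) or the other cited works.

```python
# Illustrative sketch: how a modified sampling function controls exploration.
import numpy as np

def softmax(logits, temperature=1.0):
    """Standard softmax; temperature > 1 flattens, temperature < 1 sharpens."""
    z = logits / temperature
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

def epsilon_greedy(probs, epsilon=0.1):
    """Alternative sampling function: mix the policy with a uniform distribution."""
    return (1 - epsilon) * probs + epsilon / len(probs)

logits = np.array([3.0, 1.0, 0.5, 0.1])
print(softmax(logits))                  # peaked: mostly exploits the argmax action
print(softmax(logits, temperature=5))   # flatter: more exploration
print(epsilon_greedy(softmax(logits)))  # uniform mixing keeps every action possible
```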
“…Hence, they naturally generate an ordered sequence of frames, while the attention mechanism fuses the multi-modal information to select the next best frame satisfying diversity, query-relevance and visual coherence (Figure 3). We train the Pointer Network in our model using reinforcement learning, as it is useful for tasks with limited labeled data [4,5,19,30,36,40,56], as in the case of QAMVS.…”
Section: Pointer Network
confidence: 99%
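As a rough illustration of the selection mechanism described in the statement above, the sketch below runs one pointer-network-style decoding loop: additive attention over candidate frame encodings scores each frame, already-selected frames are masked, and the highest-scoring frame is appended to the ordered summary. The encoder states, weight shapes, and greedy decoding are assumptions for illustration, not the cited model's code.

```python
# Hypothetical pointer-network decoding sketch over pre-computed frame encodings.
import numpy as np

rng = np.random.default_rng(0)
N_FRAMES, HIDDEN = 10, 32
frame_enc = rng.normal(size=(N_FRAMES, HIDDEN))   # e.g., fused visual + query features
W_ref = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1
W_dec = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1
v = rng.normal(size=HIDDEN) * 0.1

def pointer_step(dec_state, selected):
    """Additive (Bahdanau-style) attention used as a pointer over input frames."""
    scores = np.tanh(frame_enc @ W_ref + dec_state @ W_dec) @ v
    for i in selected:
        scores[i] = -np.inf                # a frame cannot be selected twice
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

selected, dec_state = [], np.zeros(HIDDEN)
for _ in range(3):                         # build an ordered 3-frame summary
    probs = pointer_step(dec_state, selected)
    nxt = int(np.argmax(probs))            # greedy here; RL training would sample
    selected.append(nxt)
    dec_state = frame_enc[nxt]             # feed the chosen frame back to the decoder
print(selected)
```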
“…Imitation learning algorithms are a great fit for training agents in simulated environments: access to ground-truth information about the environments allows optimal actions to be computed in many situations. The "teacher" in standard imitation learning algorithms (Daumé III et al., 2009; Ross et al., 2011; Ross and Bagnell, 2014; Chang et al., 2015; Sun et al., 2017; Sharaf and Daumé III, 2017) et al., 2019) models an advisor who is always present to help but speaks simple, templated language. CVDN (Thomason et al., 2019b) contains natural conversations in which a human assistant aids another human in navigation tasks but offers limited language interaction simulation, as language assistance is not available when the agent deviates from the collected trajectories and tasks.…”
Section: Related Work
confidence: 99%
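The "teacher with ground-truth access" idea in this passage can be made concrete with a small DAgger-style loop: the learner executes its own policy, an oracle that knows the goal labels every visited state with the optimal action, and the aggregated state-action pairs retrain the learner. The 1-D navigation task and counting "classifier" below are toy assumptions, not any of the cited algorithms verbatim.

```python
# Toy DAgger-style sketch: an oracle teacher labels states visited by the learner.
import numpy as np

rng = np.random.default_rng(0)
GOAL, ACTIONS = 7, (-1, +1)            # the oracle knows the goal; the learner does not

def oracle(pos):
    """Teacher with ground-truth access: the optimal action at any state."""
    return 0 if pos > GOAL else 1      # index into ACTIONS (move toward the goal)

counts = np.ones((20, 2))              # learner: per-state action counts

def learner(pos):
    return int(np.argmax(counts[pos]))

dataset = []
for it in range(10):                   # DAgger iterations
    pos = int(rng.integers(0, 20))
    for _ in range(15):                # roll out the *learner's* own policy
        dataset.append((pos, oracle(pos)))                 # teacher labels the visited state
        pos = int(np.clip(pos + ACTIONS[learner(pos)], 0, 19))
    counts[:] = 1                      # retrain on the aggregated dataset
    for s, a in dataset:
        counts[s, a] += 1

print([learner(s) for s in range(20)]) # visited states now point toward GOAL from both sides
```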