Proceedings of the 13th International Conference on Natural Language Generation 2020
DOI: 10.18653/v1/2020.inlg-1.38
From “Before” to “After”: Generating Natural Language Instructions from Image Pairs in a Simple Visual Domain

Robin Rojowiec,
Jana Götze,
Philipp Sadler
et al.

Abstract: While certain types of instructions can be compactly expressed via images, there are situations where one might want to verbalise them, for example when directing someone. We investigate the task of Instruction Generation from Before/After Image Pairs, which is to derive from images an instruction for effecting the implied change. For this, we make use of prior work on instruction following in a visual environment. We take an existing dataset, the BLOCKS data collected by Bisk et al. (2016), and investigate whe…

Cited by 1 publication (2 citation statements). References 22 publications.
“…We present a transformer-based generation model with a simple but novel difference attention head designed to visually ground complex locative expressions and target-landmark references in image pairs. We show that our model clearly exceeds the performance of Rojowiec et al. (2020)'s existing baseline models on this task, in greatly improving the accuracy of generated target and landmark references. In contrast to other recent instruction generation models (Fried et al., 2017; Köhn et al., 2020; Schumann and Riezler, 2021), our approach does not use any symbolic representations of scene states and trajectories.…”
Section: Introduction
confidence: 77%
“…For landmarks, there might be several blocks mentioned by different crowd-workers. Since the blocks are generally referred to by their logos, the targets in BLOCKS can be detected in human and generated captions with a simple, rule-based instruction parser (Rojowiec et al., 2020). In Spot-the-diff, there might be several target objects referred to by a more complex vocabulary, e.g.…”
Section: Training and Hyperparameters
confidence: 99%