We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ∼0.25M images, ∼0.76M questions, and ∼10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
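The open-ended answers described above are typically short, which is what makes automatic scoring against multiple human answers feasible. Below is a minimal Python sketch of a consensus-style accuracy in that spirit; the min(matches/3, 1) rule, the normalization, and the ten collected human answers are illustrative assumptions rather than quotes from this abstract.

```python
def vqa_consensus_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Score a short predicted answer against several human answers.

    Assumed consensus rule: full credit if at least 3 humans gave the
    same answer, partial credit otherwise (an illustrative choice).
    """
    norm = lambda a: a.strip().lower()
    matches = sum(norm(a) == norm(predicted) for a in human_answers)
    return min(matches / 3.0, 1.0)

# Example: ten human answers collected for one question about an image.
humans = ["red", "red", "red", "dark red", "red",
          "maroon", "red", "red", "crimson", "red"]
print(vqa_consensus_accuracy("red", humans))     # 1.0
print(vqa_consensus_accuracy("maroon", humans))  # ~0.33
```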
Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably from the language model alone, e.g., "sign" after "behind a red stop" or "phone" following "talking on a cell". In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel, extracting the information needed to generate the next word. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets a new state of the art by a significant margin. The source code can be downloaded from https
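Deciding when to look is a gating problem: for each generated word, the context vector blends a weighted sum over image regions with the visual sentinel. A minimal numpy sketch of one such blend is below; the variable names, dimensions, and the extended-softmax gate are illustrative assumptions rather than the authors' exact equations.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_context(region_feats, sentinel, region_scores, sentinel_score):
    """Blend spatial image features with a visual sentinel.

    region_feats:   (k, d) features for k image regions
    sentinel:       (d,)   visual sentinel derived from the decoder state
    region_scores:  (k,)   attention logits over regions for this word
    sentinel_score: float  logit for attending to the sentinel instead
    """
    alpha = softmax(np.append(region_scores, sentinel_score))  # (k+1,)
    beta = alpha[-1]                   # gate: weight placed on the sentinel
    c_t = alpha[:-1] @ region_feats    # visual context (weights sum to 1 - beta)
    return beta * sentinel + c_t       # adaptive context for the next word

k, d = 4, 8
rng = np.random.default_rng(0)
ctx = adaptive_context(rng.normal(size=(k, d)), rng.normal(size=d),
                       rng.normal(size=k), 0.5)
print(ctx.shape)  # (8,)
```

When beta is close to 1, the decoder relies on the sentinel (i.e., its language model) rather than the image, which matches the behavior described above for words like "the" or "of".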
We propose a novel scene graph generation model called Graph R-CNN, which is both effective and efficient at detecting objects and their relations in images. Our model contains a Relation Proposal Network (RePN) that efficiently deals with the quadratic number of potential relations between objects in an image. We also propose an attentional Graph Convolutional Network (aGCN) that effectively captures contextual information between objects and relations. Finally, we introduce a new evaluation metric that is more holistic and realistic than existing metrics. We report state-of-the-art performance on scene graph generation as evaluated using both existing and our proposed metrics.
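Because n detected objects induce n*(n-1) ordered pairs, a relation proposal stage has to prune aggressively before any expensive relation classification. The sketch below shows one plausible pruning scheme in that spirit; the bilinear relatedness score and the top-K cut-off are assumptions for illustration, not the paper's exact RePN design.

```python
import numpy as np

def propose_relations(obj_feats, W, top_k=8):
    """Prune the quadratic set of object pairs to the top-K most related.

    obj_feats: (n, d) features of n detected objects
    W:         (d, d) learned weight for a bilinear relatedness score
    Returns a list of (subject_idx, object_idx, score) tuples.
    """
    n = obj_feats.shape[0]
    scores = obj_feats @ W @ obj_feats.T      # (n, n) pairwise relatedness
    np.fill_diagonal(scores, -np.inf)         # disallow self-relations
    flat = np.argsort(scores, axis=None)[::-1][:top_k]
    return [(i // n, i % n, scores[i // n, i % n]) for i in flat]

rng = np.random.default_rng(1)
n, d = 5, 16
feats, W = rng.normal(size=(n, d)), rng.normal(size=(d, d))
for s, o, sc in propose_relations(feats, W, top_k=3):
    print(f"candidate relation: obj {s} -> obj {o} (score {sc:.2f})")
```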
The gnathostome (jawed vertebrate) crown group comprises two extant clades with contrasting character complements. Notably, Chondrichthyes (cartilaginous fish) lack the large dermal bones that characterize Osteichthyes (bony fish and tetrapods). The polarities of these differences, and the morphology of the last common ancestor of crown gnathostomes, are the subject of continuing debate. Here we describe a three-dimensionally preserved 419-million-year-old placoderm fish from the Silurian of China that represents the first stem gnathostome with dermal marginal jaw bones (premaxilla, maxilla and dentary), features previously restricted to Osteichthyes. A phylogenetic analysis places the new form near the top of the gnathostome stem group but does not fully resolve its relationships to other placoderms. The analysis also assigns all acanthodians to the chondrichthyan stem group. These results suggest that the last common ancestor of Chondrichthyes and Osteichthyes had a macromeric dermal skeleton, and provide a new framework for studying crown gnathostome divergence.
We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image. Our approach reconciles classical slot-filling approaches (which are generally better grounded in images) with modern neural captioning approaches (which are generally more natural sounding and accurate). Our approach first generates a sentence 'template' with slot locations explicitly tied to specific image regions. These slots are then filled in by visual concepts identified in the regions by object detectors. The entire architecture (sentence template generation and slot filling with object detectors) is end-to-end differentiable. We verify the effectiveness of our proposed model on different image captioning tasks. On standard image captioning and novel object captioning, our model reaches state-of-the-art performance on both the COCO and Flickr30k datasets. We also demonstrate that our model has unique advantages when the train and test distributions of scene compositions, and hence the language priors of associated captions, are different. Code has been made available at: https://github.com/jiasenlu/NeuralBabyTalk.
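The two-stage idea (generate a template whose slots are tied to image regions, then fill the slots from detections) can be illustrated with a tiny sketch. The slot token format, the detections dict, and the fallback word below are hypothetical; in the actual model both stages are learned jointly and end-to-end differentiable rather than hand-coded.

```python
import re

def fill_template(template: str, detections: dict[int, str]) -> str:
    """Fill region-tied slots in a caption template with detector labels.

    template:   caption with slots such as "<region-3>" (hypothetical format)
    detections: maps a region index to the category name a detector assigned
    """
    def replace(match):
        region_idx = int(match.group(1))
        return detections.get(region_idx, "something")  # fallback if no label
    return re.sub(r"<region-(\d+)>", replace, template)

# Hypothetical template from the decoder, with slots tied to image regions,
# and category labels produced by an object detector for those regions.
template = "a <region-0> is sitting on a <region-2> next to a <region-5>"
detections = {0: "cat", 2: "couch", 5: "laptop"}
print(fill_template(template, detections))
# -> "a cat is sitting on a couch next to a laptop"
```

Swapping in a different detector (e.g., one covering novel object categories) changes what fills the slots without retraining the template generator, which is the intuition behind the novel object captioning setting mentioned above.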
Bacteria are thought to contribute to the pathogenesis of necrotizing enterocolitis (NEC), but it is unknown whether their interaction with the epithelium can participate in the initiation of mucosal injury or whether they act only following translocation across a damaged intestinal barrier. Our aims were to determine whether bacteria and intestinal epithelial TLR4 play roles in a well-established neonatal rat model and a novel neonatal murine model of NEC. Neonatal rats and C57BL/6J, C3HeB/FeJ (TLR4 wild-type), and C3H/HeJ (TLR4 mutant) mice were either delivered by Cesarean section and subjected to formula feeding and cold asphyxia stress, or delivered naturally and mother-fed. NEC incidence was evaluated by histological scoring, and gene expression was quantified using quantitative real-time PCR from cDNA generated from intestinal total RNA or from RNA obtained by laser capture microdissection. Spontaneous colonization of the feeding catheter, or supplementation of formula with cultured bacterial isolates, increased the incidence of experimental NEC. During the first 72 h of life, i.e., the time frame of NEC development in this model, intestinal TLR4 mRNA gradually decreased in mother-fed animals but increased in formula-fed, cold-asphyxia-stressed animals, correlating with induction of inducible NO synthase. TLR4, inducible NO synthase, and inflammatory cytokine induction occurred in the intestinal epithelium but not in the submucosa. NEC incidence was diminished in C3H/HeJ mice compared with C3HeB/FeJ mice. In summary, bacteria and TLR4 play significant roles in experimental NEC, likely via an interaction between intraluminal bacteria and aberrantly overexpressed TLR4 in enterocytes.
Azotobacter vinelandii is a soil bacterium related to the Pseudomonas genus that fixes nitrogen under aerobic conditions while simultaneously protecting nitrogenase from oxygen damage. In response to carbon availability, this organism undergoes a simple differentiation process to form cysts that are resistant to drought and other physical and chemical agents. Here we report the complete genome sequence of A. vinelandii DJ, which has a single circular genome of 5,365,318 bp. In order to reconcile an obligate aerobic lifestyle with exquisitely oxygen-sensitive processes, A. vinelandii is specialized in terms of its complement of respiratory proteins. It is able to produce alginate, a polymer that further protects the organism from excess exogenous oxygen, and it has multiple duplications of alginate modification genes, which may alter alginate composition in response to oxygen availability. The genome analysis identified the chromosomal locations of the genes coding for the three known oxygen-sensitive nitrogenases, as well as genes coding for other oxygen-sensitive enzymes, such as carbon monoxide dehydrogenase and formate dehydrogenase. These findings offer new prospects for the wider application of A. vinelandii as a host for the production and characterization of oxygen-sensitive proteins.