We present a novel method for addressing the semantic description of images. Our method offers two main contributions. First we introduce a recurrent unit that we call a simplified long short-term memory (LSTM) unit which, in contrast to traditional LSTM units, couples the functions of the input and forget gates into a single gate; we observed that this simpler unit improves accuracy. We also propose a novel algorithm for exploring the search space of sentences inferred through a joined Convolutional Neural Network (CNN) and our simplified LSTM unit. We explore the search space by reducing it to a search over sequential trees for the combination of sequences that best represent the image to be described. Our results show improvement over the state of the art methods on the COCO [1] and Flickr8K [2] datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.