With significant advances in vision and natural language processing, generating image captions has become a practical need. Mathews, Xie, and He introduced a model that generates styled captions by separating semantics from style. Continuing that line of work, a new captioning model is developed here, comprising an image encoder that extracts visual features, a mixture of recurrent networks that embeds the extracted features into a group of words, and a sentence generator that combines those words into a stylized sentence. This Mixture of Recurrent Experts (MoRE) uses a new training algorithm that applies singular value decomposition to the weight matrices of its Recurrent Neural Networks (RNNs) to increase caption diversity; each decomposition step depends on a distinctive factor determined by the number of RNNs in MoRE. The sentence generator learns from a stylized language corpus without paired images, so styled and diverse captions are produced without training on a densely labeled or styled dataset. On the COCO dataset, MoRE generated diverse and stylized image captions without the need for extra labeling and improved descriptions in terms of content accuracy.
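The abstract does not give the exact decomposition equations, so the following is only a minimal sketch of the general idea of SVD-based expert diversification: a shared RNN weight matrix is factored once, and each expert receives a copy whose singular-value spectrum is rescaled by a factor that depends on the expert index and the total number of experts. The function name `diversify_experts` and the exponential rescaling are illustrative assumptions, not the paper's method.

```python
import numpy as np

def diversify_experts(W, n_experts):
    """Illustrative sketch (not the paper's algorithm): derive n_experts
    weight matrices from one shared RNN weight matrix W by rescaling its
    singular values with an expert-specific factor."""
    # Factor W once: W = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    experts = []
    for k in range(n_experts):
        # Hypothetical distinctive factor: each expert damps the spectrum
        # differently, with the decay tied to the number of experts.
        factor = (k + 1) / n_experts
        s_k = s * np.exp(-factor * np.arange(len(s)) / len(s))
        experts.append(U @ np.diag(s_k) @ Vt)
    return experts

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # stand-in for an RNN weight matrix
experts = diversify_experts(W, n_experts=3)
```

Under this sketch, the three expert matrices share the same singular vectors but differ in spectrum, which is one simple way a mixture could be pushed toward diverse outputs.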