One way blind people understand their surroundings is by taking pictures and relying on descriptions generated by image captioning systems. Current work on captioning images for the visually impaired does not use the text present in the image when generating captions. This gap is critical, as many visual scenes contain text. Moreover, up to 21% of the questions asked by blind people about the images they take pertain to the text present in them (Bigham et al., 2010). In this work, we propose altering AoANet, a state-of-the-art image captioning model, to leverage the text detected in the image as an input feature. In addition, we use a pointer-generator mechanism to copy the detected text to the caption when tokens need to be reproduced accurately. Our model outperforms AoANet on the benchmark dataset VizWiz, giving 35% and 16.2% improvements in CIDEr and SPICE scores, respectively.
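The copy step mentioned above can be illustrated with a minimal sketch of the standard pointer-generator mixture: the final word distribution is a weighted sum of the decoder's vocabulary distribution and an attention distribution over the detected text tokens. The abstract does not say how this is wired into AoANet, so the function name, the argument names, and the toy numbers below are all assumptions made for illustration only.

```python
def pointer_generator_mix(p_gen, vocab_probs, copy_attention, detected_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen            -- probability of generating from the fixed vocabulary (0..1)
    vocab_probs      -- dict word -> probability from the decoder (sums to 1)
    copy_attention   -- attention weights over detected_tokens (sums to 1)
    detected_tokens  -- text tokens detected in the image (hypothetical input)
    """
    # Generation part: scale every vocabulary probability by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_probs.items()}
    # Copy part: add (1 - p_gen) * attention to each detected token,
    # which lets out-of-vocabulary tokens receive probability mass.
    for token, weight in zip(detected_tokens, copy_attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy example (made-up numbers): the detected word "pepsi" is not in the vocabulary.
vocab_probs = {"a": 0.5, "bottle": 0.3, "of": 0.2}
detected_tokens = ["pepsi", "cola"]
copy_attention = [0.9, 0.1]

mixed = pointer_generator_mix(0.4, vocab_probs, copy_attention, detected_tokens)
best_word = max(mixed, key=mixed.get)
```

With a low `p_gen`, most of the probability mass shifts to the detected tokens, so a brand name that the vocabulary lacks can still be emitted verbatim.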
The proliferation of social media platforms, recommender systems, and their joint societal impacts have prompted significant interest in opinion formation and evolution within social networks. In this work, we study how local dynamics in a network can drive opinion polarization. In particular, we study time-evolving networks under the classic Friedkin-Johnsen opinion model. Edges are iteratively added or deleted according to simple local rules, modeling decisions based on individual preferences and network recommendations. We give theoretical bounds showing how individual edge updates affect polarization and a related measure of disagreement across edges. Via simulations on synthetic and real-world graphs, we find that the presence of two simple dynamics gives rise to high polarization: 1) confirmation bias, i.e., the preference for nodes to connect to other nodes with similar expressed opinions, and 2) friend-of-friend link recommendations, which encourage new connections between closely connected nodes. We also investigate the role of fixed connections which are not subject to these dynamics. We find that even a small number of fixed edges can significantly limit polarization, but still lead to multimodal opinion distributions, which may be considered polarized in a different sense.
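As a rough illustration of the Friedkin-Johnsen model referred to above, the sketch below iterates the standard update in which each node's expressed opinion becomes the average of its own innate opinion and its neighbors' expressed opinions. Polarization is taken here as the variance of expressed opinions and disagreement as the weighted sum of squared differences across edges; these are common definitions, but the abstract does not state the paper's exact normalization, so treat them, along with all function names and the two-node example, as assumptions.

```python
def fj_equilibrium(adj, innate, max_iters=10000, tol=1e-12):
    """Iterate the Friedkin-Johnsen update until opinions stop changing.

    adj    -- symmetric weight matrix as a list of lists (adj[i][j] >= 0)
    innate -- list of innate opinions, one per node
    """
    n = len(innate)
    z = list(innate)
    for _ in range(max_iters):
        new_z = []
        for i in range(n):
            degree = sum(adj[i])
            neighbor_sum = sum(adj[i][j] * z[j] for j in range(n))
            # Average of own innate opinion and neighbors' expressed opinions.
            new_z.append((innate[i] + neighbor_sum) / (1.0 + degree))
        change = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if change < tol:
            break
    return z


def polarization(z):
    """Variance of expressed opinions (assumed definition)."""
    mean = sum(z) / len(z)
    return sum((x - mean) ** 2 for x in z) / len(z)


def disagreement(adj, z):
    """Weighted squared opinion difference summed over undirected edges."""
    n = len(z)
    return sum(adj[i][j] * (z[i] - z[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


# Two nodes with opposite innate opinions, before and after adding an edge.
innate = [0.0, 1.0]
z_no_edge = fj_equilibrium([[0, 0], [0, 0]], innate)
z_edge = fj_equilibrium([[0, 1], [1, 0]], innate)
```

In this toy case, adding the single edge pulls the two expressed opinions toward each other and lowers polarization, which is the kind of per-edge effect the theoretical bounds describe.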