This paper proposes a network architecture to perform variable-length semantic video generation from captions. We adopt a new perspective towards video generation in which captions are combined with the long-term and short-term dependencies between video frames, allowing a video to be generated incrementally. Our experiments demonstrate the network's ability to distinguish between objects, actions and interactions in a video and to combine them to generate videos for unseen captions. The network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We also show that the network's ability to learn a latent representation allows it to generate videos in an unsupervised manner and to perform other tasks such as action recognition.
Social media is currently one of the most important means of news communication. Since people consume a large fraction of their daily news through social media, all the traditional news channels use social media to catch the attention of users. Each news channel has its own strategy to attract more users. In this paper, we analyze how news channels use sentiment to garner users' attention on social media. We compare the sentiment of news posts generated by television, radio and print media to show the differences in the news covered by these channels. We also analyze users' reactions and the sentiment of users' opinions on news posts with different sentiments. We perform our analysis on a dataset extracted from the Facebook Pages of five popular news channels. Our dataset contains 0.15 million news posts and 1.13 billion user reactions. Our results show that the sentiment of user opinions strongly correlates with the sentiment of news posts and the type of information source. Our study also illustrates the differences between the social media news channels of different types of news sources.
This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW can also perform text-to-video generation which, to the best of our knowledge, makes it the first approach of its kind. It combines a Variational Autoencoder (VAE) with a Recurrent Attention Mechanism in a novel manner to create a temporally dependent sequence of frames that are gradually formed over time. The recurrent attention mechanism in Sync-DRAW attends to each individual frame of the video in synchronization, while the VAE learns a latent distribution for the entire video at the global level. Our experiments with Bouncing MNIST, KTH and UCF-101 suggest that Sync-DRAW is efficient in learning the spatial and temporal information of the videos and generates frames with high structural integrity, and can generate videos from simple captions on these datasets.
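The generative loop sketched in the abstract (a single video-level latent, plus a write applied to every frame in synchronization at each canvas step) can be illustrated in toy form. Everything below is an assumption for illustration: the tiny dimensions, the random linear "decoder" standing in for the paper's LSTM, and the plain additive write standing in for learned Gaussian attention filters.

```python
import numpy as np

# Toy sketch of a Sync-DRAW-style canvas loop (illustrative only; the real
# model uses an LSTM decoder and learned read/write attention filters).
rng = np.random.default_rng(0)

num_frames, H, W = 4, 8, 8   # a short video of 8x8 frames
latent_dim, steps = 10, 5    # global latent size; canvas refined over 5 steps

# One latent vector is sampled for the entire video (the global VAE latent).
z = rng.standard_normal(latent_dim)

W_h = rng.standard_normal((latent_dim, latent_dim)) * 0.1  # toy "recurrence"
W_write = rng.standard_normal((latent_dim, H * W)) * 0.1   # toy "write" head

# Each frame's canvas starts blank and is refined additively, so frames are
# gradually formed over time, as the abstract describes.
canvas = np.zeros((num_frames, H, W))
h = np.zeros(latent_dim)  # decoder state shared by all frames at each step

for t in range(steps):
    h = np.tanh(h @ W_h + z)                   # decoder state evolves per step
    patch = np.tanh(h @ W_write).reshape(H, W)
    for f in range(num_frames):
        canvas[f] += patch                     # synchronized write to every frame

video = 1.0 / (1.0 + np.exp(-canvas))          # sigmoid maps canvases to pixels
print(video.shape)  # (4, 8, 8)
```

In this simplified sketch the same patch is written to every frame; in the actual model each frame receives its own attention parameters, but the writes across frames remain synchronized within each step, which is what produces temporally coherent videos.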