Image paragraph generation aims to describe an image with a paragraph of natural language. Compared with single-sentence image captioning, paragraph generation provides a more expressive and fine-grained description for storytelling. Existing approaches mainly optimize the paragraph generator by minimizing a word-wise cross-entropy loss, which neglects the linguistic hierarchy of a paragraph and results in "sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervisions consisting of hierarchical rewards and values at both the sentence and word levels. The joint exploration of hierarchical rewards and values provides dense supervision cues for learning an effective paragraph generator. We also propose a new hierarchical policy-value architecture that exploits compositionality at the token-to-token and sentence-to-sentence levels simultaneously and preserves the integrity of semantic and syntactic constituents. Extensive experiments on the Stanford image-paragraph benchmark demonstrate the effectiveness of the proposed DHPV approach, with performance improvements over multiple state-of-the-art methods.
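To make the idea of dense supervision at two levels concrete, the following is a minimal sketch, not the DHPV formulation itself, which the abstract does not specify. It assumes that each word receives its own reward, that each sentence receives one reward, and that a token's training signal is the discounted word-level return within its sentence plus the discounted sentence-level return from its sentence onward. The function name `dense_returns`, the discount `gamma`, and the additive combination are all hypothetical choices made for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Backward pass computing G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def dense_returns(word_rewards: List[List[float]],
                  sentence_rewards: List[float],
                  gamma: float = 0.9) -> List[List[float]]:
    """Hypothetical two-level signal: every token gets a value.

    word_rewards[i][t] is the reward of token t in sentence i;
    sentence_rewards[i] is the reward of sentence i as a whole.
    The additive combination below is an assumption, not the paper's rule.
    """
    sentence_returns = discounted_returns(sentence_rewards, gamma)
    combined = []
    for i, rewards in enumerate(word_rewards):
        word_returns = discounted_returns(rewards, gamma)
        combined.append([w + sentence_returns[i] for w in word_returns])
    return combined


if __name__ == "__main__":
    signal = dense_returns([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], gamma=0.5)
    print(signal)
```

In this sketch every token carries a nonzero learning signal even when its own word-level reward is zero, which is one way to read the contrast with the "sparse" supervision of a purely word-wise loss.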
Abstract: Scalable video streaming over femtocell networks relying on two-tier spectrum sharing is designed to cope with time-varying channel conditions, stringent video QoS requirements, and strong cross-tier interference between the over-sailing macrocells and the femtocells. Dynamic video layer selection and resource allocation are invoked to adapt the scalable video streaming service to fluctuations in both the channel quality and the interference price. We formulate the design as a constrained stochastic optimization problem, which strikes a compromise between the perceivable quality of experience and the monetary cost of the interference. Since resource allocation operates on a shorter time scale than video layer selection, we decompose the original long-term utility optimization problem into a pair of readily tractable subproblems operating on two different time scales by invoking the technique of Lyapunov drift and optimization. By exploiting the specific structure of these subproblems, low-complexity algorithms are derived for dynamic video layer selection and resource allocation, which rely on near-instantaneously available information rather than on any prior statistical knowledge. Finally, we derive analytical bounds on the theoretically achievable performance. Experimental results are presented to characterize the performance attained.
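The general flavor of Lyapunov drift-based optimization can be illustrated with a minimal single-time-scale sketch; it is not the two-time-scale algorithm derived in the paper. It assumes a finite set of actions, each a hypothetical (utility, cost) pair, a long-term average cost budget tracked by a virtual queue, and a trade-off weight `v`. All names and numbers here are illustrative assumptions. In each slot the controller uses only the current queue value, with no prior statistical knowledge.

```python
from typing import List, Tuple


def update_virtual_queue(queue: float, cost: float, budget: float) -> float:
    """Virtual queue grows when the slot's cost exceeds the per-slot budget."""
    return max(queue + cost - budget, 0.0)


def choose_action(options: List[Tuple[float, float]],
                  queue: float, v: float) -> int:
    """Pick the option index maximizing v * utility - queue * cost."""
    scores = [v * utility - queue * cost for utility, cost in options]
    return max(range(len(options)), key=scores.__getitem__)


def simulate(options: List[Tuple[float, float]], budget: float,
             v: float, slots: int) -> Tuple[List[int], float]:
    """Run the per-slot rule and return the chosen indices and final queue."""
    queue = 0.0
    choices = []
    for _ in range(slots):
        k = choose_action(options, queue, v)
        choices.append(k)
        queue = update_virtual_queue(queue, options[k][1], budget)
    return choices, queue


if __name__ == "__main__":
    # Hypothetical options: (utility, cost), e.g. base layer only
    # versus base plus one enhancement layer.
    options = [(1.0, 0.0), (2.0, 2.0)]
    print(simulate(options, budget=1.0, v=1.0, slots=4))
```

With these illustrative numbers the controller alternates between the high-utility and the zero-cost option, so the time-averaged cost matches the budget. A larger `v` weights utility more heavily at the price of a larger queue backlog.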