Image paragraph generation aims to describe an image with a paragraph in natural language. Compared with single-sentence image captioning, paragraph generation provides a more expressive and fine-grained description suited to storytelling. Existing approaches mainly optimize the paragraph generator by minimizing a word-wise cross-entropy loss, which neglects the linguistic hierarchy of a paragraph and results in "sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervision consisting of hierarchical rewards and values at both the sentence and word levels. Jointly exploiting these hierarchical rewards and values provides dense supervision cues for learning an effective paragraph generator. We also propose a new hierarchical policy-value architecture that exploits compositionality at the token-to-token and sentence-to-sentence levels simultaneously and can preserve the integrity of semantic and syntactic constituents. Extensive experiments on the Stanford image-paragraph benchmark demonstrate the effectiveness of the proposed DHPV approach, with performance improvements over multiple state-of-the-art methods.
Scalable video streaming over femtocell networks relying on two-tier spectrum sharing is designed to cope with time-varying channel conditions, stringent video QoS requirements, and strong cross-tier interference between the over-sailing macrocell and the femtocells. Dynamic video layer selection and resource allocation are invoked to adapt the scalable video streaming service to the dynamics of both the channel quality and the interference price fluctuations. We formulate the design as a constrained stochastic optimization problem that strikes a compelling compromise between the perceived quality of experience and the monetary cost of the interference. Since resource allocation operates on a shorter time scale than video layer selection, we decompose the original long-term utility optimization problem into a pair of readily tractable subproblems on two different time scales by invoking the powerful technique of Lyapunov drift and optimization. By exploiting the specific structure of these subproblems, we derive low-complexity algorithms for dynamic video layer selection and resource allocation that rely on near-instantaneously available information rather than on any prior statistical knowledge. Finally, we derive analytical bounds on the theoretically achievable performance. Experimental results are presented for characterizing the performance attained.
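To illustrate the Lyapunov drift-plus-penalty idea mentioned above, the sketch below handles a single hypothetical time-average power constraint on one time scale; it is not the authors' two-time-scale algorithm. A virtual queue Q accumulates any excess of the chosen power over the budget `p_avg`, and each slot greedily maximizes `V * utility - Q * power` using only the current channel gain, with no channel statistics. The log utility, the discrete power levels, and all names are assumptions made for illustration.

```python
import math


def drift_plus_penalty(channel_gains, power_levels, p_avg, V):
    """Per-slot drift-plus-penalty decisions under an average-power budget.

    Hypothetical setup: utility log(1 + h * p), finite power levels,
    trade-off weight V > 0. Returns the chosen powers and the virtual
    queue backlog after each slot.
    """
    Q = 0.0  # virtual queue: accumulated excess of power over the budget p_avg
    decisions, queues = [], []
    for h in channel_gains:
        # Minimize drift + penalty, i.e. maximize V*utility - Q*power,
        # using only the current gain h and the current backlog Q.
        p = max(power_levels, key=lambda power: V * math.log1p(h * power) - Q * power)
        # Virtual queue update; keeping Q bounded enforces the time-average constraint.
        Q = max(Q + p - p_avg, 0.0)
        decisions.append(p)
        queues.append(Q)
    return decisions, queues


decisions, queues = drift_plus_penalty([1.0] * 200, [0.0, 0.5, 1.0, 2.0], 1.0, 10.0)
print(decisions[:6], queues[-1])
```

With a constant gain, the controller first spends above the budget while Q is small and then settles on a power whose long-run average respects `p_avg`. Larger V favors utility at the cost of a larger backlog, which is the usual drift-plus-penalty trade-off.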
Three-dimensional information perception from point clouds is of vital importance for improving the ability of machines to understand the world, especially for autonomous driving and unmanned aerial vehicles. Data annotation for point clouds is one of the most challenging and costly tasks. In this paper, we propose a closed-loop, virtual–real interactive point cloud generation and model-upgrading framework called Parallel Point Clouds (PPCs). To the best of our knowledge, this is the first time that model training has been changed from an open-loop to a closed-loop mechanism. Feedback from the evaluation results is used to update the training dataset, benefiting from the flexibility of artificial scenes. Under this framework, a point-based LiDAR simulation model is proposed, which greatly simplifies the scanning operation. In addition, a group-based placement method is proposed to integrate hybrid point clouds by locating candidate positions for virtual objects in real scenes. Taking advantage of CAD models and mobile LiDAR devices, two hybrid point cloud datasets, ShapeKITTI and MobilePointClouds, are built for 3D detection tasks. With almost zero labor cost for annotating newly added objects, PointPillars models trained on ShapeKITTI and MobilePointClouds achieve 78.6% and 60.0%, respectively, of the 3D detection average precision of the model trained on real data.
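The abstract does not specify how its point-based LiDAR simulation works. One generic way to approximate a scan directly from points, shown here only as an assumed illustration, is an occlusion test: bin every point by azimuth and elevation as seen from a sensor at the origin, and keep only the nearest point in each bin. The function name and the angular resolutions are hypothetical.

```python
import math


def simulate_scan(points, az_res_deg=0.2, el_res_deg=0.4):
    """Keep, for each (azimuth, elevation) bin, only the point nearest the
    sensor at the origin. This is a simple z-buffer-style occlusion model;
    the resolutions are hypothetical, not taken from the PPC paper."""
    nearest = {}  # bin key -> (range, point)
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no defined direction
        az = math.degrees(math.atan2(y, x))
        el = math.degrees(math.asin(z / r))
        key = (math.floor(az / az_res_deg), math.floor(el / el_res_deg))
        # A closer point in the same beam direction occludes the farther one.
        if key not in nearest or r < nearest[key][0]:
            nearest[key] = (r, (x, y, z))
    return [p for _, p in nearest.values()]


print(simulate_scan([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 10.0, 0.0)]))
```

In this sketch a virtual object inserted into a real scene hides the real points behind it, which is the effect a scan of a hybrid scene needs to reproduce.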
Minimum changes bandwidth allocation (MCBA) is an optimal video transmission scheme that yields the minimum number of rate changes with the minimum peak rate, which can reduce the renegotiation frequency in statistical multiplexing services. However, the existing MCBA algorithm based on searching frontiers has a high computational complexity, which limits the applicability of MCBA. In this paper, we design a fast MCBA algorithm that searches for the bounded point using the convex envelopes of the overflow and underflow curves. As a result, our algorithm can compute an MCBA schedule in a shorter time, because the bounded point can be found with linear complexity and each transmission rate can be calculated within a few iterations. Simulations using real video traces confirm the rationale and efficiency of our algorithm.
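The sketch below shows the feasibility constraint that the underflow and overflow curves impose on a constant-rate segment. It is a plain linear scan over slopes, not the convex-envelope search proposed above. It assumes that the segment starts at the origin, that the underflow curve L(k) is the cumulative frame size up to slot k, and that the overflow curve is L(k) + B for a client buffer of size B. These conventions and all names are assumptions. A single rate r is feasible for slots 1..t exactly when max L(k)/k <= min (L(k) + B)/k over those slots, and the slot where this interval becomes empty plays a role analogous to the bounded point.

```python
from itertools import accumulate


def longest_constant_rate_run(frame_sizes, buffer_size):
    """Return (t, r_min, r_max): the largest horizon t such that any constant
    rate r in [r_min, r_max] keeps the cumulative transmission r*k between
    the underflow curve L(k) and the overflow curve L(k) + buffer_size for
    every slot k = 1..t. Assumes the segment starts at the origin."""
    lower = list(accumulate(frame_sizes))  # L(k): cumulative frame sizes
    lo, hi = 0.0, float("inf")
    best = (0, lo, hi)
    for k, l_k in enumerate(lower, start=1):
        new_lo = max(lo, l_k / k)                  # must not underflow by slot k
        new_hi = min(hi, (l_k + buffer_size) / k)  # must not overflow by slot k
        if new_lo > new_hi:
            break  # no single rate satisfies both curves for slots 1..k
        lo, hi = new_lo, new_hi
        best = (k, lo, hi)
    return best


print(longest_constant_rate_run([10, 10, 10, 50], buffer_size=20))
```

With these example frames, slot 4 contains a large frame that cannot be accommodated without a rate change, so the scan returns t = 3 with the feasible rate interval [10, 50/3]. A full MCBA schedule would then start a new segment from that point.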