Videos express highly structured spatio-temporal patterns of visual data. A video can be thought of as being governed by two factors: (i) temporally invariant (e.g., person identity) or slowly varying (e.g., activity) attribute-induced appearance, encoding the persistent content of each frame, and (ii) inter-frame motion or scene dynamics (e.g., encoding the evolution of the person executing the action). Based on this intuition, we propose a generative framework for video generation and future prediction. The proposed framework generates a video (short clip) by decoding samples sequentially drawn from a latent space distribution into full video frames. Variational Autoencoders (VAEs) are used to encode frames into, and decode them from, the latent space, and an RNN is used to model the dynamics in the latent space. We improve the consistency of video generation through temporally-conditional sampling, and its quality by structuring the latent space with attribute controls, ensuring that attributes can be both inferred and conditioned on during learning and generation. As a result, given attributes and/or the first frame, our model is able to generate diverse but highly consistent sets of video sequences, accounting for the inherent uncertainty in the prediction task. Experimental results on the Chair CAD [1], Weizmann Human Action [2], and MIT Flickr [3] datasets, along with a detailed comparison to the state of the art, verify the effectiveness of the framework.
We present a novel method to create planar visualizations of treelike structures (e.g., blood vessels and airway trees) where the shape of the object is well preserved, allowing for easy recognition by users familiar with the structures. Based on the extracted skeleton within the treelike object, a radial planar embedding is first obtained such that there are no self-intersections of the skeleton, which would otherwise result in occlusions in the final view. An optimization procedure which adjusts the angular positions of the skeleton nodes is then used to reconstruct the shape as closely as possible to the original, according to a specified view plane, thereby preserving the global geometric context of the object. Using this shape-recovered embedded skeleton, the object surface is then flattened to the plane without occlusions using harmonic mapping. The boundary of the mesh is adjusted during the flattening step to account for regions where the mesh is stretched over concavities. This parameterized surface can then be used either as a map for guidance during endoluminal navigation or directly for interrogation and decision making. Depth cues are provided with a grayscale border to aid in shape understanding. Examples are presented using bronchial trees, cranial and lower limb blood vessels, and upper aorta datasets, and the results are evaluated quantitatively and with a user study.
We propose an approach for improving sequence modeling based on autoregressive normalizing flows. Each autoregressive transform, acting across time, serves as a moving frame of reference, removing temporal correlations and simplifying the modeling of higher-level dynamics. This technique provides a simple, general-purpose method for improving sequence modeling, with connections to existing and classical techniques. We demonstrate the proposed approach both with standalone flow-based models and as a component within sequential latent variable models. Results are presented on three benchmark video datasets, where autoregressive flow-based dynamics improve log-likelihood performance over baseline models. Finally, we illustrate the decorrelation and improved generalization properties of using flow-based dynamics.
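The "moving frame of reference" idea can be illustrated with a minimal sketch of an affine autoregressive transform across time, y_t = (x_t - shift(x_<t)) / scale(x_<t). This is a toy illustration, not the authors' model: the function names are hypothetical, and the shift is assumed to be simply the previous observation with unit scale (i.e., temporal differencing), whereas a learned flow would parameterize shift and scale with networks. The data is a synthetic random walk, not one of the video benchmarks.

```python
import random


def forward(xs, shift_fn, scale_fn):
    """Map observations x_1..x_T to residuals y_1..y_T.

    Each y_t is x_t expressed relative to a frame of reference
    (shift, scale) computed only from earlier observations x_<t.
    """
    ys = []
    for t, x in enumerate(xs):
        history = xs[:t]
        ys.append((x - shift_fn(history)) / scale_fn(history))
    return ys


def inverse(ys, shift_fn, scale_fn):
    """Invert the transform sequentially: x_t = shift(x_<t) + scale(x_<t) * y_t."""
    xs = []
    for y in ys:
        xs.append(shift_fn(xs) + scale_fn(xs) * y)
    return xs


# Assumed (hypothetical) frame of reference: previous value, unit scale.
def prev_value_shift(history):
    return history[-1] if history else 0.0


def unit_scale(history):
    return 1.0


def lag1_autocorr(seq):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((v - mean) ** 2 for v in seq)
    cov = sum((seq[i] - mean) * (seq[i + 1] - mean) for i in range(n - 1))
    return cov / var


# Synthetic random walk: strongly correlated across time.
rng = random.Random(0)
xs, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)
    xs.append(level)

ys = forward(xs, prev_value_shift, unit_scale)
xs_rec = inverse(ys, prev_value_shift, unit_scale)

print("lag-1 autocorrelation of x:", lag1_autocorr(xs))
print("lag-1 autocorrelation of y:", lag1_autocorr(ys))
```

Under these assumptions the residuals y are far less correlated across time than x, and the transform is exactly invertible up to floating-point error, which is what allows a downstream model to be fit to y instead of x.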
Virtual colonoscopy (VC) allows a physician to virtually navigate within a reconstructed 3D colon model searching for colorectal polyps. Though VC is widely recognized as a highly sensitive and specific test for identifying polyps, one limitation is the reading time, which can exceed 30 minutes per patient. Large portions of the colon are often devoid of polyps, and a way of identifying these polyp-free segments could be valuable in reducing the required reading time for the interrogating radiologist. To this end, we have tested the ability of the collective crowd intelligence of non-expert workers to identify polyp candidates and polyp-free regions. We presented twenty short videos, each flying through a segment of a virtual colon, to each worker, and the crowd was asked to determine whether or not a possible polyp was observed within that video segment. We evaluated our framework on Amazon Mechanical Turk and found that the crowd was able to achieve a sensitivity of 80.0% and specificity of 86.5% in identifying video segments which contained a clinically proven polyp. Since each polyp appeared in multiple consecutive segments, all polyps were in fact identified. Using the crowd results as a first pass, 80% of the video segments could in theory be skipped by the radiologist, equating to significant time savings and enabling more VC examinations to be performed.
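For readers unfamiliar with the metrics, the following sketch shows how sensitivity, specificity, and the fraction of skippable segments can be computed from per-segment crowd decisions. It rests on stated assumptions: worker votes are aggregated by simple majority (the abstract does not specify the aggregation rule), the function names are hypothetical, and the toy data is invented for illustration and is not taken from the study.

```python
def majority_vote(votes):
    """Return True if more than half of the workers flagged the segment."""
    return sum(votes) * 2 > len(votes)


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from boolean predictions and ground truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn)  # fraction of polyp segments that were flagged
    specificity = tn / (tn + fp)  # fraction of polyp-free segments that were cleared
    return sensitivity, specificity


# Illustrative toy data only: one row of worker votes per video segment.
worker_votes = [
    [True, True, False],
    [False, False, True],
    [True, False, True],
    [False, False, False],
    [False, True, False],
    [True, True, True],
]
# Hypothetical ground truth: does the segment contain a proven polyp?
has_polyp = [True, False, False, False, True, True]

crowd_decision = [majority_vote(v) for v in worker_votes]
sens, spec = sensitivity_specificity(crowd_decision, has_polyp)

# Segments the crowd cleared could, in principle, be skipped on a first pass.
skippable = sum(1 for d in crowd_decision if not d) / len(crowd_decision)

print(f"sensitivity={sens:.3f} specificity={spec:.3f} skippable={skippable:.3f}")
```

Because each polyp can appear in several consecutive segments, per-segment sensitivity below 100% does not by itself imply that a polyp was missed; per-polyp detection would need a separate grouping step that is not sketched here.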