We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics of actions, even in three-second videos, poses many challenges: meaningful events involve not only people but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse) and can be either transient or sustained. We describe the annotation process for our dataset (each video is tagged with one action or activity label from among 339 classes), analyze its scale and diversity in comparison with other large-scale video datasets for action recognition, and report results for several baseline models that address three modalities (spatial, temporal, and auditory) both separately and jointly. The Moments in Time Dataset, designed to provide broad coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge for developing models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.
Preface
An important field of current research in digital image processing is the representation of image shape features for applications in recognition, pose recovery, and image coding. Several types of image moment functions have been reported in the literature as efficient shape descriptors, each with its own relative merits. This book addresses the theoretical and application-oriented aspects of image moments and is intended as a reference for those interested in computer vision and image analysis. The first part of the book discusses the mathematical properties of geometric, complex, and orthogonal moments, while the second part provides an overview of the applications of moments in pattern recognition, object identification, and object pose estimation. A comprehensive list of references is given in the bibliography to help readers pursue the subject matter in greater depth. The work presented in this book is expected to aid and stimulate further research in the area of moment-based feature representation.
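The preface names geometric moments without stating a formula. As a minimal sketch, assuming the standard textbook definition of the raw geometric moment, m_pq = Σ_x Σ_y x^p y^q f(x, y), the following Python shows how such moments yield a centroid and a translation-invariant central moment. The function names (`geometric_moment`, `centroid`, `central_moment`) and the list-of-rows image representation are illustrative assumptions, not taken from the book.

```python
from typing import Sequence, Tuple

# Assumption: an image is a list of rows of pixel intensities,
# indexed as image[y][x], with x the column and y the row.
Image = Sequence[Sequence[float]]


def geometric_moment(image: Image, p: int, q: int) -> float:
    """Raw geometric moment m_pq = sum over pixels of x^p * y^q * f(x, y)."""
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += (x ** p) * (y ** q) * value
    return total


def centroid(image: Image) -> Tuple[float, float]:
    """Intensity centroid (m10 / m00, m01 / m00)."""
    m00 = geometric_moment(image, 0, 0)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    return (geometric_moment(image, 1, 0) / m00,
            geometric_moment(image, 0, 1) / m00)


def central_moment(image: Image, p: int, q: int) -> float:
    """Central moment mu_pq, computed about the centroid.

    Shifting the shape within the image leaves this value unchanged.
    """
    cx, cy = centroid(image)
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += ((x - cx) ** p) * ((y - cy) ** q) * value
    return total
```

For example, for a binary image containing a single 2×2 block of ones, `geometric_moment(image, 0, 0)` gives the block's area, and `central_moment(image, 2, 0)` is the same wherever the block sits in the frame. That invariance is one reason moments serve as shape descriptors.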
In this paper, we explore the convergence of caching and streaming technologies for Internet multimedia. The paper describes a design for a streaming and caching architecture to be deployed on broadband networks. The work is based on the proposed Internet standard, the Real Time Streaming Protocol (RTSP), which is likely to become the de facto standard for web-based A/V caching and streaming in the near future. The proxies are all managed by an 'intelligent agent' or 'broker', designed as an enhanced RTSP proxy server that maintains the state information essential to streaming media data. In addition, all the caching algorithms run on the broker. Having an intelligent agent or broker ensures that the 'simple' caching servers can be easily embedded into the network. However, RTSP does not currently provide the right model for a broker-based streaming/caching architecture. The work reported here is an attempt to contribute towards that end.