The forthcoming MPEG-4 standard specifies in its systems part an audiovisual scene description and functionality for elementary stream management. The elementary stream management functionality is introduced here. It consists of a media object description framework, which describes the streaming resources that form part of an MPEG-4 presentation, and of a synchronization syntax incorporated in a flexible sync layer with an underlying systems decoder model. The final section outlines the transport and session setup for MPEG-4 presentations on relevant transport media, namely the Internet and digital broadcast scenarios.

I. INTRODUCTION

ISO/IEC JTC1 SC29 WG11, better known as MPEG (Moving Picture Experts Group), is currently in the process of finalizing the next MPEG standard, called MPEG-4, which will formally be labeled ISO/IEC IS 14496. The MPEG-4 standardization process reached final committee draft level in May 1998. This is a major milestone and therefore a good opportunity to highlight some major elements and properties of this forthcoming standard.

The major design goal of MPEG-4 is to define a framework for the joint description, compression, storage, and transmission of natural and synthetic audiovisual data. Like previous MPEG standards, MPEG-4 defines improved compression algorithms for audio and video signals, including specialized tools, e.g., for speech coding. In addition, it provides means to efficiently describe synthetic audio and graphics. Again, specialized tools, e.g., for the animation of synthetic faces, are included. The data streams that correspond to such time-variant signals are transmitted or stored separately and are composed into an integrated audiovisual presentation only at the receiver. The format for the description of how the individual elements of a presentation are related in space and time is also defined by MPEG-4.
This description is object-oriented and identifies each semantically meaningful audiovisual entity as a media object. The MPEG-4 standard comes in several parts. Compression techniques for visual and audible media objects are specified in parts 2 [2] and 3 [3] of MPEG-4, respectively. Part 1 [1] itself covers two major topics: the definition of the binary scene description, which has its roots in VRML97 [4], and the management of the elementary streams that convey the coded representation of audiovisual data. Other parts of MPEG-4 contain reference software (part 5