In this paper we present the Split-Merge Displacement Estimation algorithm as a bridge between the standard, block matching, displacement estimation technique and the semantic motion estimation techniques. We review this new technique and its applications in video compression and show how it can be used to maintain two different segmentation of the current visual field: a spatial segmentation identifying the objects that are present in the scene and a classification of the visual field into moving areas, that keeps track over time of the relationships between the objects.
MOTION ESTIMATIONBlock matching displacement estimation algorithms divide the image into a number of rcctangular blocks and compute a displacement vector for cach block by correlating the block with a search area in the previous frame: if the blocks are small enough, rotation, zooming, etc. of larger objects can be closely approxihated by a translation of the blocks themselves. The goal is to approximate interframe motion by piecewise translation of one or more areas of a frame relative to a reference frame. This technique was introduced by Jain and Jain in [I]. Jain and Jain's algorithm and its assumptions have been a guideline for more recent work in the field. Most of this recent work, as for example 121, [3], [4] and [5], has investigated the search strategy to be used in the search area.Block matching displacement estimation is simple: it does not require any semantic knowledge of the frames but reduces the motion estimation problem to a matching problem. In fact a semantic analysis of the frames, that identifies and "understands" the objects in each frame and their relationship from frame to frame, is generally a difficult task and it is not often practical today for video coding purposes. The lack of knowledge of the spatial relationship between pixels is also a limitation of the block matching technique: the prediction frames obtained via thc displacement vectors do not always maintain the original intraframe correlation: each block is supposed to be undergoing an independent translation; moreover the optimal displacement vector for a block is not always unique and it might be influenced by noise. Because of this, the reconstructed frame might loose a large part of the original spatial correlation between adjacent blocks. Today video standards (MPEG. Px64 etc.) are based on the block matching approach. Semantic, model-based techniques subsume the existence of a source model consisting in iobjects? described by shape parameters, motion parameters and colour parameters. The video coding algorithm have, in general, two distinct steps. The tirst step in which the image is analyzed and the objects, togethcr with their motion and colors, are identified, and a second step in which the parameters are coded. (see [6], [7], [8]) As an example, consider the model-based coding of faces. Model-based image coding of faces has been proposed as a way of achieving quality at reduced bitrates in applications such as video telephony. However most existin...