We propose a method to measure the capture-to-display delay (CDD)
I. INTRODUCTION

End-to-end delay is an important concern for two-way visual communication applications. Video codec manufacturers need to measure the end-to-end delay of a video codec system in order to develop low-delay video codecs. Network service providers need to make sure that the end-to-end delay of a visual communication application is within the application's requirement. A simple and general tool for measuring the end-to-end delay of a visual communication system is therefore invaluable for applications related to two-way visual communication.

Figure 1 shows an example of a mobile video chat system. Video captured by the camera is compressed by a video encoder. The encoder usually contains an encoder buffer to smooth the video bit-rate, as described in [1]. The video bitstream is then packetized and transmitted over the network. At the decoder side, the video is decoded and displayed. The decoder usually contains a decoder buffer to smooth out the network jitter and to buffer the bitstream before video decoding. The encoder and decoder buffers can result in a relatively long delay. The end-to-end delay in this example is the latency from frame capture at the encoder side to frame display at the decoder side, which we call the capture-to-display delay (CDD). It includes the whole chain of video encoding, encoder buffering, packetization, network transmission, decoder buffering, video decoding, and display.

The traditional way to measure latency is by using timestamps. A timestamp is a code representing the global time. It can be generated by a counter driven by a network clock commonly available to both the encoder and the decoder. To measure the time delay between two points A and B, a timestamp is inserted at point A, retrieved at point B, and compared with the global time at point B. For example, for the visual communication system shown in Fig. 1, to measure the end-to-end delay of the network part, timestamps are generated from the network clock and inserted at the network interface point on the encoder side.
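As a minimal sketch of the timestamp comparison just described, the following Python fragment stamps an event at point A and subtracts that stamp from the clock reading at point B. The names `Clock`, `measured_delay`, and `simulate`, as well as the fixed-offset clock model, are illustrative assumptions and are not part of the system in Fig. 1.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    """Hypothetical local clock: reads the global time plus a fixed offset."""
    offset_s: float = 0.0

    def now(self, global_time_s: float) -> float:
        return global_time_s + self.offset_s


def measured_delay(stamp_at_a: float, clock_reading_at_b: float) -> float:
    """Delay estimate: clock reading at point B minus the timestamp from point A."""
    return clock_reading_at_b - stamp_at_a


def simulate(true_delay_s: float,
             offset_a_s: float = 0.0,
             offset_b_s: float = 0.0,
             t0_s: float = 100.0) -> float:
    """Stamp at A at global time t0_s, read the clock at B after true_delay_s."""
    clock_a = Clock(offset_a_s)
    clock_b = Clock(offset_b_s)
    stamp = clock_a.now(t0_s)                    # timestamp inserted at point A
    reading = clock_b.now(t0_s + true_delay_s)   # clock reading at point B
    return measured_delay(stamp, reading)
```

Under this assumed model, the estimate equals the true delay when both clocks share the same offset, and it is biased by the offset difference when they do not, which is why a common clock matters for this approach.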
These timestamps are retrieved at the network interface point on the decoder side and compared with the global time. Similarly, to measure the CDD, we can insert timestamps at the video capture point and observe them relative to the global time at the display point. As long as a network clock is available and the encoder and decoder clocks are synchronized, the delay can be calculated. However, doing so requires modifying the hardware or software to insert the timestamps and to retrieve them at the desired points. In many situations, including our application scenario, cellphone video codecs are implemented in hardware and software by their developers, so we cannot modify the video encoder and decoder to insert or retrieve timestamps. Moreover, the encoder clock and the decoder clock are usually not synchronized. These constraints make measuring the CDD particularly challenging.

Boyaci et al. [2] presented a tool to measure the CDD of a video chat...