Modern media data such as 360° videos and light field (LF) images are typically captured in much higher dimensions than the observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, which in response transmits the corresponding pre-encoded media data units (MDUs) to the client one-by-one in sequence. Intra-coding an MDU (I-MDU) results in a large bitrate but allows the MDU to be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order in which the predictor must first be transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs, while enabling adequate random access for satisfactory user navigation. To address this problem, we propose a landmark-based MDU optimization framework with redundant representation: each MDU can be coded both as an I-MDU and as one or more P-MDUs. The media space is divided into neighborhoods, each containing one landmark (a chosen MDU). MDUs in a neighborhood use the associated landmark as a predictor for inter-coding. Thus, a transition from one MDU to another in the same neighborhood requires only one P-MDU transmission when the landmark is already in the decoder buffer, enabling navigational random access. To optimize an MDU structure, we employ a tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove P-MDUs as refinements using a fast branch-and-bound technique.
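To make the landmark-placement step concrete, the following is a minimal, hypothetical sketch of TSVQ-style landmark selection: the media space is recursively split by two-means clustering until each neighborhood is small enough, and the MDU closest to each leaf's centroid is chosen as that neighborhood's landmark. The function names (`two_means`, `tsvq_landmarks`), the leaf-size criterion, and the 2-D point model of MDUs are illustrative assumptions, not the paper's actual formulation.

```python
import random

def two_means(points, iters=20, seed=0):
    """Lloyd's algorithm with k=2; splits a set of MDU coordinates in two."""
    rng = random.Random(seed)
    c = rng.sample(points, 2)  # initial centroids
    groups = (points, [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d0 = sum((a - b) ** 2 for a, b in zip(p, c[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, c[1]))
            groups[0 if d0 <= d1 else 1].append(p)
        if not groups[0] or not groups[1]:
            # degenerate split (e.g. identical points): fall back to halving
            half = len(points) // 2
            return points[:half], points[half:]
        c = [tuple(sum(x) / len(g) for x in zip(*g)) for g in groups]
    return groups

def tsvq_landmarks(mdus, max_leaf=4):
    """Recursively split the MDU space; each leaf becomes a neighborhood
    whose landmark is the MDU nearest the leaf centroid (the medoid)."""
    if len(mdus) <= max_leaf:
        cen = tuple(sum(x) / len(mdus) for x in zip(*mdus))
        lm = min(mdus, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, cen)))
        return [(lm, list(mdus))]
    left, right = two_means(list(mdus))
    return tsvq_landmarks(left, max_leaf) + tsvq_landmarks(right, max_leaf)

# Usage: a 4x4 grid of MDU positions partitioned into small neighborhoods.
pts = [(x, y) for x in range(4) for y in range(4)]
neighborhoods = tsvq_landmarks(pts, max_leaf=4)
```

Each `(landmark, members)` pair then defines which MDUs are inter-coded against which landmark; refining this structure by adding or removing P-MDUs is a separate step in the paper's framework and is not sketched here.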
Taking interactive LF images and viewport-adaptive 360° images as illustrative applications, and using I-, P-, and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.