Listening tests were conducted to evaluate the feasibility of a novel 2D-to-3D ambience upmixing technique named "perceptual band allocation" (PBA). Four-channel ambience signals captured in a reverberant concert hall were low-pass and high-pass filtered, and the resulting bands were routed to the lower and upper loudspeaker layers, respectively, of a 9-channel 3D configuration. The upmixed stimuli were compared against original 3D recordings made using an 8-channel ambience microphone array in terms of 3D listener envelopment and preference. The results suggest that the perceived quality of the proposed method could be at least comparable to that of an original 3D recording.
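The band-splitting step described above can be illustrated with a minimal sketch. It is not the filter design used in the study: the function name `pba_split`, the 1 kHz default crossover, and the brick-wall FFT split are all assumptions made for illustration, since neither the filter type nor the crossover frequency is given here. The sketch only shows each ambience channel being divided into a low band for the lower layer and a complementary high band for the upper layer.

```python
import numpy as np

def pba_split(ambience, fs, crossover_hz=1000.0):
    """Split ambience channels into complementary low and high bands.

    ambience     : array of shape (n_channels, n_samples)
    fs           : sampling rate in Hz
    crossover_hz : hypothetical crossover frequency (assumed, not from the paper)

    Returns (lower_layer, upper_layer), each with the same shape as the input.
    A brick-wall FFT mask stands in for the actual low-pass/high-pass filters.
    """
    ambience = np.asarray(ambience, dtype=float)
    n = ambience.shape[-1]
    spectrum = np.fft.rfft(ambience, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low_mask = freqs < crossover_hz
    # Bins below the crossover feed the lower layer, the rest the upper layer.
    lower_layer = np.fft.irfft(spectrum * low_mask, n=n, axis=-1)
    upper_layer = np.fft.irfft(spectrum * ~low_mask, n=n, axis=-1)
    return lower_layer, upper_layer
```

Because the two masks are complementary, the sum of the two layers reconstructs the original four-channel ambience, so the split redistributes the signal vertically without adding or removing energy. A practical implementation would more likely use a crossover filter pair with a finite slope.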
INTRODUCTION
Three-dimensional multichannel audio systems such as Auro-3D [1], Dolby Atmos [2], and 22.2 [3] employ additional height loudspeakers in order to provide the listener with a three-dimensional (3D) auditory experience. One of the perceptual attributes that could be enhanced by the use of height channels is listener envelopment (LEV). In the context of two-dimensional (2D) surround sound (e.g., 5.1), LEV is widely understood as the subjective impression of being enveloped by reverberant sound [4,5]. With 3D loudspeaker formats, the added height channels could be used to render the "vertical" spread of the reverberant sound image as well as the horizontal one, and ultimately the auditory impression of 3D LEV could be achieved.
One of the key requirements for 3D multichannel audio applications would be a 2D-to-3D upmixing technique that can add a height dimension to 2D content. Therefore, a new method that can render vertical image spread would be necessary. In the context of horizontal stereophony, horizontal image spread can be rendered by means of interchannel decorrelation, and many different decorrelation methods have been proposed over the past years [6][7][8][9][10]. Such methods are based on the principle that as the degree of correlation between stereophonic channel signals decreases, that between ear-input signals (interaural cross-correlation), which has a direct relationship with perceived auditory image spread [4], also decreases. However, vertically reproduced stereophonic signals would have little or no influence on interaural cross-correlation. A recent study by Gribben and Lee [11] found that vertically applied interchannel decorrelation was not as effective as horizontal decorrelation in terms of controlling the spread of the image.
The literature generally suggests that vertical localization relies on spectral cues.
A number of researchers [12][13][14] have found that the higher the frequency of a pure tone, the higher the perceived image position, regardless of the physical height of the presenting loudspeaker, a phenomenon referred to as the "pitch-height" effect in [15]. In the case of band-pass filtered noise signals, however, this effect was reported to be dependent on the physical height of the loudspeaker that presents the signal. For example, Roffler and Butler [16] found from their experiments using loudspeake...