Speaker extraction aims to extract the target speech signal from a multi-talker environment given a target speaker's reference speech. We recently proposed a time-domain solution, SpEx, that avoids the phase estimation required in frequency-domain approaches. Unfortunately, SpEx is not fully a time-domain solution, since it performs time-domain speech encoding for speaker extraction while taking a frequency-domain speaker embedding as the reference. The analysis window size for the time-domain input also differs from that for the frequency-domain input. This mismatch has an adverse effect on system performance. To eliminate it, we propose a complete time-domain speaker extraction solution called SpEx+. Specifically, we tie the weights of two identical speech encoder networks, one for the encoder-extractor-decoder pipeline and the other as part of the speaker encoder. Experiments show that SpEx+ achieves 0.8 dB and 2.1 dB SDR improvement over the state-of-the-art SpEx baseline under different-gender and same-gender conditions, respectively, on the WSJ0-2mix-extr database.
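The weight-tying idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `SpeechEncoder` class, its filter values, and the variable names are hypothetical and are not the SpEx+ architecture. The sketch only shows that when both branches hold the same encoder object, the mixture and the reference speech are mapped into one latent space with a single analysis window.

```python
class SpeechEncoder:
    """Toy 1-D convolutional encoder: a small filter bank followed by ReLU.

    Hypothetical stand-in for a learned time-domain speech encoder.
    """

    def __init__(self, filters, stride):
        self.filters = filters  # one list of taps per output channel
        self.stride = stride    # hop between analysis windows, in samples

    def __call__(self, signal):
        length = len(self.filters[0])
        frames = []
        for start in range(0, len(signal) - length + 1, self.stride):
            window = signal[start:start + length]
            frames.append([
                max(0.0, sum(tap * s for tap, s in zip(filt, window)))
                for filt in self.filters
            ])
        return frames


# One set of weights, referenced by both branches (tied weights).
shared = SpeechEncoder(filters=[[0.5, 0.5], [1.0, -1.0]], stride=1)
mixture_encoder = shared     # front end of the encoder-extractor-decoder pipeline
reference_encoder = shared   # front end of the speaker encoder

mixture_latent = mixture_encoder([1.0, 3.0, 2.0])
reference_latent = reference_encoder([1.0, 3.0, 2.0])
```

Because the two names refer to one object, any update to the filters during training is seen by both branches, so the window size and the learned representation cannot drift apart.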
The paper proposes a concurrent constant modulus algorithm (CMA) and soft decision-directed (SDD) scheme for low-complexity blind equalization of high-order quadrature amplitude modulation channels. Simulation in a fractionally-spaced equalization setting is used to compare the proposed scheme with the recently introduced state-of-the-art concurrent CMA and decision-directed (DD) scheme. The proposed CMA+SDD blind equalizer is shown to have lower computational complexity per weight update, faster convergence, and slightly improved steady-state equalization performance compared with the CMA+DD blind equalizer.
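For orientation, the CMA half of such a scheme can be sketched as a standard stochastic-gradient weight update. This is a minimal sketch under stated assumptions, not the paper's concurrent algorithm: the SDD branch is not shown, and the function name `cma_update`, the step size `mu`, and the dispersion constant `r2` are illustrative choices. The equalizer output is taken as y = wᵀx and the cost as (|y|² − r2)².

```python
def cma_update(w, x, mu, r2):
    """One constant modulus algorithm (CMA) step for a linear equalizer.

    w  : list of complex equalizer weights
    x  : list of complex received samples in the equalizer window
    mu : step size (assumed small and positive)
    r2 : dispersion constant, E|s|^4 / E|s|^2 of the transmitted symbols
    Returns the updated weights and the equalizer output y = w^T x.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    # The error term pushes |y|^2 toward r2; phase is left unconstrained.
    e = y * (r2 - abs(y) ** 2)
    w_new = [wi + mu * e * xi.conjugate() for wi, xi in zip(w, x)]
    return w_new, y


# Toy usage: a single-tap equalizer on unit-modulus symbols scaled by 0.5.
symbols = [1 + 0j, 1j, -1 + 0j, -1j]
w = [1 + 0j]
for k in range(500):
    w, y = cma_update(w, [0.5 * symbols[k % 4]], mu=0.1, r2=1.0)
```

In this toy case the weight magnitude settles near 2, which undoes the 0.5 gain. CMA alone does not resolve the phase, which is one reason it is commonly paired with a decision-based adaptation.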
Decision feedback in a decision feedback equaliser (DFE) performs a space translation that maps the DFE onto a transversal equaliser in the translated observation space. Properties of DFEs can therefore be analysed more easily by exploiting this geometric translation property. This approach is used to analyse the conventional DFE that employs a linear combination of the channel observations and the past decisions (the linear-combiner DFE). It is demonstrated that the usual minimum mean square error (MMSE) solution does not achieve the full performance potential of the linear-combiner DFE structure. A bit error rate (BER) expression for the linear-combiner DFE with binary signalling is obtained, and a method is proposed to optimally set the coefficients of the linear-combiner DFE. The performance of this minimum-BER (MBER) linear-combiner DFE is much closer to that of the optimal Bayesian DFE, compared with the MMSE linear-combiner DFE.
While most current approaches to sports video analysis are based on broadcast video, in this paper we present a novel approach to highlight detection and automatic replay generation for soccer videos taken by the main camera. This research is important because highlight detection and replay generation from a live soccer game is currently a labor-intensive process. A robust multi-level, multi-model event detection framework is proposed to detect events and event boundaries in the video taken by the main camera. The framework explores the possible analysis cues, using a mid-level representation to bridge the gap between low-level features and high-level events. The event detection results and the mid-level representation are used to generate replays, which are automatically inserted into the video. Experimental results are promising and are found to be comparable with replays generated by broadcast professionals.