We propose an end-to-end trainable approach to single-channel speech separation with an unknown number of speakers. Our approach extends the MulCat source-separation backbone with additional output heads: a count head that infers the number of speakers, and decoder heads that reconstruct the original signals. Beyond the model, we also propose a metric for evaluating source separation with a variable number of speakers. Specifically, we clarify how to assess separation quality when the ground truth contains more or fewer speakers than the model predicts. We evaluate our approach on the WSJ0-mix datasets, with mixtures of up to five speakers, and demonstrate that it outperforms the state of the art in counting the number of speakers while remaining competitive in the quality of the reconstructed signals.
Stream processors are widely used in multimedia processing because of the high performance they gain from parallelism. To achieve higher parallelism, a stream processor employs a wide very long instruction word (VLIW) structure, in which multiple parallelizable instructions are packed into a single VLIW. Because the VLIW width is fixed, a large number of empty operations (NOPs) must be inserted, which causes a serious code-size expansion problem. To address this issue, horizontal and vertical code-compression methods are applied to the stream processor's VLIW. First, the VLIW is divided into several subfields according to the logical characteristics of the VLIW instruction; a horizontal compression scheme based on Huffman coding is then applied to each subfield, achieving approximately 78% code-size reduction on average. However, the long time required to decode a compressed VLIW before instruction execution may incur a system performance penalty. To reduce this decompression overhead, a vertical compression scheme is proposed, which reduces code size by nearly 70% by deleting the NOPs of the VLIW in the vertical direction. Furthermore, a vertically compressed VLIW can be executed directly, without any decompression step, by using a banked instruction memory. In summary, vertical compression significantly reduces stream-processor VLIW code size without any negative impact on performance.
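To make the vertical-compression idea concrete, the following is a minimal illustrative sketch (not the paper's actual encoding): each fixed-width VLIW word is modeled as a tuple of slot operations with NOP slots marked as `None`; compression stores a per-slot presence bitmask plus only the occupied slots, so the original word can be reassembled (or, in hardware, dispatched bank-by-bank) without a separate decompression pass. The function names, the `width` parameter, and the word representation are all assumptions made for illustration.

```python
# Illustrative sketch of vertical VLIW compression: delete NOP slots and
# keep a bitmask recording which slots of each word were occupied.
# (Hypothetical representation; not the paper's hardware encoding.)

NOP = None  # an empty slot in a VLIW word

def vertical_compress(vliw_words):
    """Compress fixed-width VLIW words into (mask, ops) pairs without NOPs."""
    compressed = []
    for word in vliw_words:
        mask = 0
        ops = []
        for i, op in enumerate(word):
            if op is not NOP:
                mask |= 1 << i   # mark slot i as occupied
                ops.append(op)   # store only real operations
        compressed.append((mask, ops))
    return compressed

def vertical_decompress(compressed, width):
    """Rebuild the original fixed-width words from (mask, ops) pairs."""
    words = []
    for mask, ops in compressed:
        it = iter(ops)
        words.append(tuple(next(it) if (mask >> i) & 1 else NOP
                           for i in range(width)))
    return words

# Example: a 4-slot VLIW program where most slots are empty.
program = [
    ("add", NOP, NOP, "ld"),
    (NOP, "mul", NOP, NOP),
]
packed = vertical_compress(program)
assert vertical_decompress(packed, 4) == program  # lossless round trip
```

In this toy program, 8 slots shrink to 3 stored operations plus two small masks, mirroring how deleting NOPs in the vertical direction yields large code-size savings; the mask plays the role that banked instruction memory plays in hardware, letting each surviving operation reach its slot without a decompression stage.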