Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches over conventional time-frequency-based methods. Unlike the time-frequency domain approaches, time-domain separation systems often receive input sequences consisting of a huge number of time steps, which introduces challenges for modeling extremely long sequences. Conventional recurrent neural networks (RNNs) are not effective for modeling such long sequences due to optimization difficulties, while one-dimensional convolutional neural networks (1-D CNNs) cannot perform utterance-level sequence modeling when their receptive field is smaller than the sequence length. In this paper, we propose the dual-path recurrent neural network (DPRNN), a simple yet effective method for organizing RNN layers in a deep structure to model extremely long sequences. DPRNN splits the long sequential input into smaller chunks and applies intra- and inter-chunk operations iteratively, where the input length in each operation can be made proportional to the square root of the original sequence length. Experiments show that by replacing the 1-D CNN with DPRNN and applying sample-level modeling in the time-domain audio separation network (TasNet), a new state-of-the-art performance on WSJ0-2mix is achieved with a model 20 times smaller than the previous best system.
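The square-root property above follows directly from the segmentation step. The following is a minimal sketch (not the authors' implementation) of DPRNN-style segmentation in plain Python, using a hypothetical `segment` helper: a sequence of length L is cut into 50%-overlapping chunks of size S ≈ sqrt(2L), so that both the intra-chunk pass (sequence length S) and the inter-chunk pass (sequence length equal to the number of chunks K) operate on roughly sqrt(L) steps rather than L.

```python
import math

def segment(seq, chunk_size):
    """Split a 1-D sequence into 50%-overlapping chunks of equal length,
    zero-padding the tail so every chunk is full (DPRNN-style segmentation)."""
    hop = chunk_size // 2
    # Pad so that (len(seq) - chunk_size) is a multiple of the hop size.
    rest = (len(seq) - chunk_size) % hop
    if rest:
        seq = seq + [0] * (hop - rest)
    return [seq[i:i + chunk_size]
            for i in range(0, len(seq) - chunk_size + 1, hop)]

L = 20000                    # a long time-domain input (illustrative length)
S = int(math.sqrt(2 * L))    # chunk size ~ sqrt(2L) balances the two passes
chunks = segment(list(range(L)), S)
K = len(chunks)

# Intra-chunk pass: an RNN would run over each chunk (sequence length S).
# Inter-chunk pass: an RNN would run across chunks at each within-chunk
# position (sequence length K). Here S = 200 and K = 199, so neither RNN
# ever sees the full 20000-step sequence.
```

In the full model these two passes are stacked several times, with the intra-chunk RNN typically bidirectional and the inter-chunk RNN aggregating global context across chunks.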
Recently, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques and automatic speech recognition (ASR) techniques robust to reverberation. To evaluate state-of-the-art algorithms and obtain new insights regarding potential future research directions, we propose a common evaluation framework, including datasets, tasks, and evaluation metrics, for both speech enhancement and ASR techniques. The proposed framework will be used as a common basis for the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. This paper describes the rationale behind the challenge and provides a detailed description of the evaluation framework and benchmark results.
In recent years, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques and automatic speech recognition (ASR) techniques that are robust to reverberation. In this paper, we describe the REVERB challenge, an evaluation campaign designed to evaluate such speech enhancement (SE) and ASR techniques, reveal the state of the art, and obtain new insights regarding potential future research directions. Whereas most existing benchmark tasks and challenges for distant speech processing focus on the noise robustness issue, and sometimes only on a single-channel scenario, a particular novelty of the REVERB challenge is that it is carefully designed to test robustness against reverberation, based on real single-channel and multichannel recordings. The challenge attracted 27 papers, representing 25 systems specifically designed for SE purposes and 49 systems specifically designed for ASR purposes. This paper describes the problems dealt with in the challenge, provides an overview of the submitted systems, and scrutinizes them to clarify which current processing strategies appear effective in reverberant speech processing.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances, which are mostly fully overlapped, and the algorithms are evaluated based on signal-to-distortion ratio or similar performance metrics. However, in natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components. In addition, signal-based metrics correlate only weakly with automatic speech recognition (ASR) accuracy. We argue that this not only makes it hard to assess the practical relevance of the tested algorithms, but also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream containing multiple utterances that are partially overlapped to varying degrees. A new real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate conversations and capturing the audio replays with far-field microphones. A Kaldi-based ASR evaluation protocol is also established, using a well-trained multi-conditional acoustic model. Using this dataset, several aspects of a recently proposed speaker-independent CSS algorithm are investigated. The dataset and evaluation scripts are available to facilitate research in this direction.
We investigate the quantum phase transitions in the half-filled Hubbard model on the triangular lattice by means of the path-integral renormalization group method with a recently proposed iteration and truncation scheme. It is found for a cluster of 36 sites that as the Hubbard interaction U increases, the paramagnetic metallic state undergoes a first-order phase transition to a nonmagnetic insulating (NMI) state at U_c1 ≈ 7.4t, which is followed by another first-order transition to a 120° Néel ordered state at U_c2 ≈ 9.2t, where t is the transfer integral. The size dependence of the results is also addressed. Our results suggest the existence of the intermediate NMI phase and resolve some controversial arguments on the nature of the previously proposed quantum phase transitions.