2015
DOI: 10.1186/s13634-015-0238-6
Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation

Abstract: The REVERB challenge provides a common framework for the evaluation of feature extraction techniques in the presence of both reverberation and additive background noise. State-of-the-art speech recognition systems perform well in controlled environments, but their performance degrades in realistic acoustical conditions, particularly in both real and simulated reverberant environments. In this contribution, we utilize multiple feature extractors including the conventional mel-filterbank, multi-taper spectrum es…

Cited by 11 publications (6 citation statements) | References 34 publications
“…Therefore, this study used LSTM instead of RNN to encode the English text. LSTM [13] is also a kind of recurrent neural network algorithm. Compared with the traditional RNN, LSTM introduces input, forget, and output gate units to simulate the deep-impression and forgetting phenomena of human memory, thereby suppressing unimportant parts of the English text, highlighting the key points, and reducing computation while improving accuracy.…”
Section: Improving Machine Translation With Long Short-Term Memory
confidence: 99%
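The gating mechanism described in this citation statement can be sketched as a single LSTM cell step in NumPy. The weight layout, variable names, and sizes below are illustrative assumptions, not taken from the cited work:

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step with input, forget, and output gates.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias,
    stacked in [input, forget, candidate, output] order (an assumed layout).
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate: what to write
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate: what to keep
    g = np.tanh(z[2*H:3*H])                 # candidate cell content
    o = 1.0 / (1.0 + np.exp(-z[3*H:]))      # output gate: what to expose
    c = f * c_prev + i * g                  # memory: keep old, admit new
    h = o * np.tanh(c)                      # gated hidden output
    return h, c

# tiny demo: D=3 input features, H=2 hidden units
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_cell(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

The forget gate `f` is what lets the cell discard unimportant context, which is the property the citing authors appeal to when contrasting LSTM with a plain RNN.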
“…The combined, 60-dimensional features are referred to as LDA+STC [15] and used as an input to the i-vector extractor. The GMM-UBM using full-covariance GMMs with 512 components is trained using Baum-Welch statistics extraction [18]. All the parameters of the trained GMM-UBM are converted into a single supervector, and reduced to 100 dimensional i-vectors using the i-vector extractor (the total variability matrix T).…”
Section: Related Work
confidence: 99%
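The i-vector extraction step in this citation statement follows the standard total-variability model: given zeroth- and first-order Baum-Welch statistics, the i-vector is the posterior mean w = (I + Tᵀ Σ⁻¹ N T)⁻¹ Tᵀ Σ⁻¹ F. A toy NumPy sketch of that formula, with illustrative sizes (the cited system uses 512 full-covariance Gaussians and 100-dimensional i-vectors; a diagonal covariance is assumed here for simplicity):

```python
import numpy as np

def extract_ivector(N, F, T, Sigma_inv):
    """Posterior-mean i-vector from Baum-Welch statistics (toy version).

    N:         (C,) zeroth-order counts, one per UBM Gaussian
    F:         (C*D,) centered first-order statistics
    T:         (C*D, R) total variability matrix
    Sigma_inv: (C*D,) inverse diagonal covariances of the UBM
    Returns the R-dimensional i-vector. Shapes/names are assumptions.
    """
    C = N.shape[0]
    D = F.shape[0] // C
    R = T.shape[1]
    N_exp = np.repeat(N, D)                 # each Gaussian's rows share its count
    TtSiN = T.T * (Sigma_inv * N_exp)       # T' Sigma^-1 N
    L = np.eye(R) + TtSiN @ T               # posterior precision matrix
    return np.linalg.solve(L, T.T @ (Sigma_inv * F))

# toy sizes: 8 Gaussians, 5-dim features, 4-dim i-vector
rng = np.random.default_rng(1)
C, D, R = 8, 5, 4
N = rng.uniform(0.1, 5.0, size=C)
F = rng.normal(size=C * D)
T = rng.normal(size=(C * D, R))
Sigma_inv = rng.uniform(0.5, 2.0, size=C * D)
w = extract_ivector(N, F, T, Sigma_inv)
```

In the cited pipeline the statistics come from the 60-dimensional LDA+STC features, and T is trained so that the GMM-UBM supervector can be compressed to a 100-dimensional i-vector.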
“…A variety of methods for the analysis of sensor data [1]-[4] and the extraction of meaningful patterns from these data have been proposed in recent decades [5]. Data collected by various sensors such as image, voice, electromyography (EMG) and chemical sensors are used for different applications such as image recognition [6]-[8], speech recognition [9], [10], gesture recognition [11]-[14] and gas classification [15]-[20]. The performance of classification techniques using sensor data varies greatly depending not only on the amount of data collected but also on the quality of the data.…”
Section: Introduction
confidence: 99%