2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA)
DOI: 10.1109/icsda.2017.8384470
Multiresolution CNN for reverberant speech recognition

Cited by 19 publications (8 citation statements)
References 7 publications
“…Inferring and synthesizing high-resolution images from observed low-resolution images is a typical ill-posed inverse problem. Existing algorithms can be divided into two categories according to their technical means: reconstruction-based methods and learning-based methods [23]. Reconstruction-based SR methods usually require sub-pixel alignment of the LR image sequence to obtain the motion offsets between the HR images, thereby constructing the spatial motion parameters of the observation model, and then apply different constraints to solve for the HR image.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
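For context, the observation model referred to in this excerpt is commonly written as follows; the notation below is standard in the super-resolution literature and is not taken from the cited work.

```latex
% Standard reconstruction-based SR observation model (illustrative notation):
% each LR frame y_k is a warped, blurred, downsampled and noisy view of the HR image x.
\[
  \mathbf{y}_k = \mathbf{D}\,\mathbf{B}\,\mathbf{M}_k\,\mathbf{x} + \mathbf{n}_k ,
  \qquad k = 1,\dots,K ,
\]
% where M_k holds the sub-pixel motion parameters of frame k, B the blur kernel,
% D the downsampling operator and n_k the noise; the HR image x is estimated by
% minimizing a data-fidelity term over all K frames plus a regularization
% (constraint) term.
```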
“…Then, using the preprocessed version of the audio signal, spectral or temporal features can be extracted via Mel-Frequency Cepstral Coefficients (MFCCs) [126–128] or the Discrete Wavelet Transform (DWT) [129–131]. The extracted features are passed through a prediction module that employs Hidden Markov Models (HMMs) [132,133], SVMs [134–136], RNNs [137–139], or CNNs [140–143], among others, to obtain the corresponding text in the desired language, constrained by a predefined vocabulary and grammar rules. More details about ASR can be found in the pertinent survey papers [124,144].…”
Section: Automatic Speech Recognition
Citation type: mentioning
confidence: 99%
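As a concrete illustration of the pipeline sketched in this excerpt (feature extraction followed by a neural prediction module), the snippet below computes MFCCs with librosa and feeds them to a toy CNN. The layer sizes, phone count, and file name are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch of the ASR front end / back end split described above:
# MFCC features extracted with librosa, then a small CNN acoustic model.
import librosa
import torch
import torch.nn as nn

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    """Load audio and compute an (n_mfcc, frames) MFCC matrix."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

class TinyAcousticCNN(nn.Module):
    """Toy CNN mapping an MFCC 'image' to per-frame phone posteriors."""
    def __init__(self, n_mfcc=13, n_phones=40):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(32 * n_mfcc, n_phones)

    def forward(self, mfcc):                      # mfcc: (batch, n_mfcc, frames)
        x = self.conv(mfcc.unsqueeze(1))          # (batch, 32, n_mfcc, frames)
        x = x.permute(0, 3, 1, 2).flatten(2)      # (batch, frames, 32 * n_mfcc)
        return self.out(x)                        # (batch, frames, n_phones)

# Usage (assumes a local file "utt.wav"):
# feats = torch.from_numpy(extract_mfcc("utt.wav")).float().unsqueeze(0)
# posteriors = TinyAcousticCNN()(feats)
```

In a full recognizer these per-frame scores would be decoded against the predefined vocabulary and grammar mentioned in the excerpt; that step is omitted here.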
“…In a picture, multiple resolutions can help recognize objects at different scales [21], [22], but the desired benefit of using more than one resolution in audio applications is to exploit different details of the feature maps at each resolution. For instance, the use of two different resolutions has been proposed to improve automatic speech recognition in reverberant scenarios [23], in which a wide-context window gives information about the acoustic environment and reverberation, whereas a narrow-context window provides finer detail about the content of the speech signal. This is possible because of the trade-off between time resolution and frequency resolution in the extraction of Fast Fourier Transform-based audio features [24] such as the mel-spectrogram, which is also the basis for the analysis proposed in this work.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
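The time-frequency trade-off mentioned in this excerpt can be made concrete by computing two mel-spectrograms of the same signal with different analysis windows. The sketch below is one plausible reading of such a dual-resolution front end; the window lengths and other parameters are chosen arbitrarily, not taken from [23].

```python
# Sketch of a two-resolution mel-spectrogram front end: a long analysis window
# (finer frequency detail, wider temporal context) and a short one (finer time
# detail). Window lengths are illustrative only.
import librosa
import numpy as np

def dual_resolution_mels(y, sr=16000, n_mels=40):
    """Return two log-mel spectrograms of the same signal at different resolutions."""
    wide = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=160, n_mels=n_mels)   # ~64 ms window
    narrow = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=256, hop_length=160, n_mels=n_mels)    # ~16 ms window
    # With the same hop length (and librosa's default centre padding) both have
    # the same number of frames, so they can be stacked as parallel input
    # channels or fed to parallel branches of a CNN.
    return np.log(wide + 1e-10), np.log(narrow + 1e-10)

# Usage:
# y, sr = librosa.load("utt.wav", sr=16000)
# mel_wide, mel_narrow = dual_resolution_mels(y, sr)
```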