Jakobovski/Free-Spoken-Digit-Dataset: V1.0.8

Jackson, Zohar; Souza, César Roberto de; Flaks, Jason; Pan, Yuxin; Nicolas, Hereman; Thite, Adhish

doi:10.5281/zenodo.1342401

Cited by 15 publications

(4 citation statements)

References 0 publications

“…The convolutional network was implemented in Keras (Chollet, 2018) with Tensorflow (Abadi et al, 2016) back-end. All main results were confirmed by analyzing a standard speech data set, the so-called Jakobovski free spoken digit data set (FSDD) (Jackson et al, 2018), containing spoken numbers from 0 to 9 in English language in accordance to the MNIST data set with written digits in this range (LeCun et al, 1998). This was done using a completely new code base exclusively build of KERAS layers.…”

Section: Methods (mentioning)

confidence: 76%

“…The second used data set is an open data set consisting of spoken digits (0–9) – in analogy to the MNIST data set – in English. The data set is sampled with 8 kHz and consists of 2,000 recorded digits from four speakers (Jackson et al, 2018). Here the first five repetitions for each speaker and each digit are used as test data, the respective remaining 45 repetitions serve as training data.…”

Section: Methods (mentioning)

confidence: 99%
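
The split described in the statement above (first five repetitions per speaker and digit as test data, the remaining 45 as training data) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code. It assumes recordings are named `{digit}_{speaker}_{index}.wav`, which is not stated in the quoted text, and it uses a synthetic file list with hypothetical speaker names in place of the real recordings.

```python
from pathlib import PurePath


def parse_name(filename):
    """Split a name such as '7_jackson_32.wav' into (digit, speaker, repetition).

    Assumes the pattern {digit}_{speaker}_{index}.wav (an assumption, see lead-in).
    """
    stem = PurePath(filename).stem
    digit, rest = stem.split("_", 1)      # digit is the first field
    speaker, rep = rest.rsplit("_", 1)    # repetition index is the last field
    return int(digit), speaker, int(rep)


def split_train_test(filenames, n_test=5):
    """Put repetitions 0..n_test-1 in the test set and the rest in the training set."""
    train, test = [], []
    for name in filenames:
        _, _, rep = parse_name(name)
        (test if rep < n_test else train).append(name)
    return train, test


# Synthetic stand-in for the dataset: 4 hypothetical speakers x 10 digits x 50 repetitions.
speakers = ["speaker_a", "speaker_b", "speaker_c", "speaker_d"]
names = [f"{d}_{s}_{r}.wav" for d in range(10) for s in speakers for r in range(50)]

train, test = split_train_test(names)
print(len(train), len(test))  # 1800 200
```

With four speakers, ten digits and 50 repetitions each, this yields 1,800 training and 200 test recordings, matching the 2,000 recordings mentioned in the statement.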

“…As described above the complete auditory pathway beyond the DCN, including the superior olive, lateral lemniscus, inferior colliculus, medial geniculate corpus, and the auditory cortex, is modeled as a deep neural network which is trained on the classification of 207 different German words (custom-made data set), or 10 English words corresponding to the digits from 0 to 9 (FSDD data set; Jackson et al, 2018), respectively. In both cases the compressed, i.e., down sampled, DCN output matrices served as training and test data input.…”

Section: Methods (mentioning)

confidence: 99%

Intrinsic Noise Improves Speech Recognition in a Computational Model of the Auditory Pathway

et al. 2022

Noise is generally considered to harm information processing performance. However, in the context of stochastic resonance, noise has been shown to improve signal detection of weak sub-threshold signals, and it has been proposed that the brain might actively exploit this phenomenon. Especially within the auditory system, recent studies suggest that intrinsic noise plays a key role in signal processing and might even correspond to increased spontaneous neuronal firing rates observed in early processing stages of the auditory brain stem and cortex after hearing loss. Here we present a computational model of the auditory pathway based on a deep neural network, trained on speech recognition. We simulate different levels of hearing loss and investigate the effect of intrinsic noise. Remarkably, speech recognition after hearing loss actually improves with additional intrinsic noise. This surprising result indicates that intrinsic noise might not only play a crucial role in human auditory processing, but might even be beneficial for contemporary machine learning approaches.

“…As such, we can use the C-LSTM architecture to conduct a controlled inquiry into our research question. and the Free Spoken Digit dataset (Jackson et al, 2018). We selected these datasets because of their tractability.…”

Section: Multimodal Convolutional LSTM Model (mentioning)

confidence: 99%

On the Benefits of Early Fusion in Multimodal Representation Learning

Talukder, Barnum, Yue 2020

Preprint

Intelligently reasoning about the world often requires integrating data from multiple modalities, as any individual modality may contain unreliable or incomplete information. Prior work in multimodal learning fuses input modalities only after significant independent processing. On the other hand, the brain performs multimodal processing almost immediately. This divide between conventional multimodal learning and neuroscience suggests that a detailed study of early multimodal fusion could improve artificial multimodal representations. To facilitate the study of early multimodal fusion, we create a convolutional LSTM network architecture that simultaneously processes both audio and visual inputs, and allows us to select the layer at which audio and visual information combines. Our results demonstrate that immediate fusion of audio and visual inputs in the initial C-LSTM layer results in higher performing networks that are more robust to the addition of white noise in both audio and visual inputs.
