Aim: The main goal of the present study was to assess the feasibility of using evoked stapedius reflex (eSR) and evoked compound action potential (eCAP) thresholds to create speech processor programs for children using Med-El Maestro software. The secondary goals were (1) to compare the eSR and eCAP thresholds, recorded in charge units, with the most comfortable loudness levels (MCLs) obtained for the apical, medial and basal electrodes in experienced adult users of Med-El Pulsar CI100 cochlear implants, and (2) to compare eSR and eCAP thresholds for the apical, medial and basal electrodes between adults and children. Methods: Fourteen children and 16 adults participated in the study. eSR and eCAP thresholds were measured in both groups using the auditory nerve response telemetry algorithm; MCLs were measured behaviourally in the adult group only. Results: In the adult group, the correlation between eSR thresholds and MCLs was stronger than that between eCAP thresholds and MCLs for the apical, medial and basal electrodes. There was no significant difference between children and adults in the mean eCAP or eSR thresholds for any of the electrodes tested. This finding suggests that in children the correlations of eCAP and eSR thresholds with MCLs are not lower than those generally found in adults. Conclusions: Although the eSR threshold is the better predictor of MCLs, both eSR and eCAP thresholds can be useful tools for assisting with map creation for children.
A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database includes high-resolution, high-framerate stereoscopic video streams from RGB cameras and a depth imaging stream from a Time-of-Flight camera, accompanied by audio recorded with both a microphone array and a microphone built into a mobile computer. To support the training of AVSR systems, every utterance was manually labeled, and the resulting label files were added to the corpus repository. Owing to the inclusion of recordings made in noisy conditions, the corpus can also be used for testing the robustness of speech recognition systems in the presence of acoustic background noise. The process of building the corpus, including the recording, labeling and post-processing phases, is described in the paper. Results achieved with the developed audio-visual automatic speech recognition (ASR) engine, trained and tested on the material contained in the corpus, are presented and discussed, together with comparative results obtained with a state-of-the-art commercial ASR engine. To demonstrate its practical use, the corpus has been made publicly available.
With the increasing amount of music made available in digital form on the Internet, automatic organization of music collections is sought. The paper presents an approach to the graphical representation of the mood of songs based on Self-Organizing Maps. Parameters describing the mood of music are proposed, calculated, and then analyzed by correlating them with mood dimensions derived from Multidimensional Scaling. A map is created in which music excerpts with similar mood are placed next to each other on a two-dimensional display.
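The abstract does not give implementation details; as an illustration, the following is a minimal sketch of how a Self-Organizing Map can arrange feature vectors so that excerpts with similar mood parameters land near each other on a two-dimensional grid. All function names, the grid size, and the learning-rate schedule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=200, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small Self-Organizing Map; returns the weight grid (gx, gy, dim).

    Illustrative sketch: linear decay of learning rate and neighbourhood width.
    """
    rng = np.random.default_rng(seed)
    gx, gy = grid
    w = rng.random((gx, gy, data.shape[1]))
    # Grid coordinates, used to compute neighbourhood distances on the map.
    coords = np.stack(
        np.meshgrid(np.arange(gx), np.arange(gy), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            t = step / n_steps
            lr = lr0 * (1 - t)
            sigma = sigma0 * (1 - t) + 0.5
            # Best-matching unit: the grid node whose weights are closest to x.
            d = ((w - x) ** 2).sum(axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighbourhood pulls nearby nodes toward x as well.
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
            w += lr * h * (x - w)
            step += 1
    return w

def bmu_position(w, x):
    """Map position (row, col) of the node best matching feature vector x."""
    d = ((w - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)
```

After training on mood-parameter vectors, `bmu_position` gives each excerpt a two-dimensional map coordinate, so excerpts with similar mood cluster in the same region of the display.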
The aim of this article is to investigate whether separating music tracks in the preprocessing phase and extending the feature vector with parameters related to the musical instruments characteristic of a given genre allow for efficient automatic musical genre classification in the case of a database containing thousands of music excerpts and a dozen genres. Results of extensive experiments show that the proposed approach to music genre classification is promising. Overall, combining parameters derived from both the original audio and a mixture of the separated tracks improves the classification effectiveness measures, demonstrating that the proposed feature vector and a Support Vector Machine (SVM) with a co-training mechanism are applicable to a large dataset.
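The abstract names an SVM with a co-training mechanism but gives no algorithmic details. The sketch below illustrates the general co-training idea on two feature views (original-audio features and separated-track features): each view's classifier pseudo-labels its most confident unlabeled excerpts for the shared training pool. A nearest-centroid classifier stands in for the SVM to keep the sketch self-contained; all names and parameters are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in for the SVM: one centroid per genre class."""
    classes = np.unique(y)
    cents = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, cents

def predict(model, X):
    """Return predicted labels and a confidence score (negated distance)."""
    classes, cents = model
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)], -d.min(axis=1)

def co_training(X_orig, X_sep, y, labeled_idx, rounds=3, per_round=2):
    """Co-training over two feature views.

    Only the indices in labeled_idx are treated as labeled; in each round,
    each view's classifier pseudo-labels its most confident unlabeled
    excerpts and adds them to the shared labeled pool.
    """
    y = y.copy()
    labeled = list(labeled_idx)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(y)) if i not in labeled]
        if not unlabeled:
            break
        for X in (X_orig, X_sep):
            model = fit_centroids(X[labeled], y[labeled])
            preds, conf = predict(model, X[unlabeled])
            for j in np.argsort(conf)[-per_round:]:  # most confident first
                y[unlabeled[j]] = preds[j]
                labeled.append(unlabeled[j])
            unlabeled = [i for i in range(len(y)) if i not in labeled]
            if not unlabeled:
                break
    # Final model on the combined (original + separated-track) feature vector.
    X_all = np.hstack([X_orig, X_sep])
    return fit_centroids(X_all[labeled], y[labeled])
```

The final classifier is trained on the concatenation of both views, mirroring the abstract's point that combining parameters from the original audio and the separated tracks improves classification.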