In this paper we develop different mathematical models in the framework of the multi-stream paradigm for noise-robust ASR, and discuss their close relationship with human speech perception. Largely inspired by Fletcher's "product-of-errors" rule in psychoacoustics, multi-band ASR aims for robustness to data mismatch through the exploitation of spectral redundancy, while making minimal assumptions about noise type. Previous ASR tests have shown that independent sub-band processing can lead to decreased recognition performance with clean speech. We overcome this problem by considering every combination of data sub-bands as an independent data stream. After introducing the background to multi-band ASR, we show how this "full combination" approach can be formalised, in the context of HMM/ANN based ASR, by introducing a latent variable to specify which data sub-bands in each data frame are free from data mismatch. This enables us to decompose the posterior probability for each phoneme into a reliability-weighted integral over all possible positions of clean data. This approach offers great potential for adaptation to rapidly changing and unpredictable noise.
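The "full combination" decomposition can be illustrated with a minimal sketch. Here every subset of sub-bands has its own expert posterior, and the combined phoneme posterior is the reliability-weighted sum over subsets; all numbers below are hypothetical stand-ins, not values from the paper.

```python
import numpy as np

# Hypothetical example: 2 sub-bands -> 4 possible "clean" subsets.
# The empty subset falls back to the class prior (no band is reliable).
subset_posteriors = {
    ():     np.array([0.25, 0.25, 0.25, 0.25]),  # prior over 4 phoneme classes
    (0,):   np.array([0.60, 0.20, 0.10, 0.10]),  # expert using sub-band 0 only
    (1,):   np.array([0.10, 0.50, 0.20, 0.20]),  # expert using sub-band 1 only
    (0, 1): np.array([0.70, 0.15, 0.10, 0.05]),  # expert using both sub-bands
}
# Reliability weights P(this subset is the clean one | frame); they sum to 1.
weights = {(): 0.05, (0,): 0.25, (1,): 0.10, (0, 1): 0.60}

def full_combination(subset_posteriors, weights):
    """Reliability-weighted sum of per-subset phoneme posteriors."""
    combined = sum(weights[s] * p for s, p in subset_posteriors.items())
    return combined / combined.sum()  # renormalise against rounding error

posterior = full_combination(subset_posteriors, weights)
```

In a real system the weights would be estimated per frame from the observed data, which is what enables adaptation to rapidly changing noise.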
In this work we demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under clean and noisy conditions, by the use of visual information in addition to the traditional audio information. We take a decision fusion approach to the audiovisual information, where the single-modality (audio- and visual-only) HMM classifiers are combined to recognize audiovisual speech. More specifically, we tackle the problem of estimating the appropriate combination weights for each of the modalities. Two different techniques are described: the first uses an automatically extracted estimate of the audio stream reliability to modify the weights for each modality (both clean and noisy audio results are reported), while the second is a discriminative model combination approach where weights on pre-defined model classes are optimized to minimize WER (clean audio results only).
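The reliability-driven weighting idea can be sketched as log-linear decision fusion of the two stream scores. The log-likelihood values and the weight schedule below are illustrative assumptions, not figures from the paper.

```python
import numpy as np

# Hypothetical per-class log-likelihoods from independent audio and visual HMMs.
audio_ll  = np.array([-2.0, -5.0, -6.0])   # audio evidence favours class 0
visual_ll = np.array([-4.0, -1.5, -6.0])   # visual evidence favours class 1

def fuse(audio_ll, visual_ll, lam):
    """Log-linear decision fusion: lam is the audio stream weight in [0, 1]."""
    return lam * audio_ll + (1.0 - lam) * visual_ll

clean = fuse(audio_ll, visual_ll, lam=0.9)  # reliable audio dominates
noisy = fuse(audio_ll, visual_ll, lam=0.2)  # noisy audio: lean on the visual stream
```

With a high audio weight the fused decision follows the audio stream; when the estimated audio reliability drops, lowering the weight lets the visual stream flip the decision.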
This paper presents a low-power wake-up system based on frequency analysis for environmental and military Internet of Things (IoT) applications. It continuously detects the presence of specific very high frequencies in the input acoustic signal of an embedded system. This can be used to detect specific animal species and to trigger a recording system or generate alerts. Used for harmful species detection, it helps to save harvests or to protect strict nature reserves. It can also be used to detect the presence of drones in a restricted area. The acoustic low-power wake-up system uses a simple 16-bit microcontroller (MCU), with a strong emphasis on low-power management, targeting continuous detection for at least one year on a single standard 1.2 Ah, 12 V lead battery. To achieve this, it makes the most of the MCU's mixed analog and digital low-power modules, including comparators, timers, and a module specific to Microchip MCUs called the Charge Time Measurement Unit (CTMU). The CTMU is a controllable constant-current source that performs time-to-frequency conversion at very low power and algorithmic cost. With optimized low-power modes, this frequency-analysis wake-up system has a power consumption of 0.56 mW, leading to approximately three years of battery life on a single standard 1.2 Ah, 12 V lead battery.
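The detection principle, measuring the signal's dominant frequency from comparator edge timing and testing it against a target band, can be sketched in software. This is a minimal model of the idea, not the CTMU-based implementation; the tone parameters below are illustrative.

```python
import math

def rising_crossings(samples):
    """Count rising zero crossings, as an analog comparator edge counter would."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

def detect_band(samples, sample_rate, f_min, f_max):
    """Estimate the dominant frequency from the crossing rate and test a band."""
    freq = rising_crossings(samples) * sample_rate / len(samples)
    return (f_min <= freq <= f_max), freq

# Synthetic 20 kHz tone sampled at 100 kHz (values are illustrative only).
sr = 100_000
tone = [math.sin(2 * math.pi * 20_000 * n / sr + 0.1) for n in range(1000)]
hit, freq = detect_band(tone, sr, f_min=18_000, f_max=22_000)
```

On the MCU, the same period measurement is done in hardware by the comparator and CTMU, which is what keeps both the power and the algorithmic cost low.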
We present an analysis of fin whale (Balaenoptera physalus) songs on passive acoustic recordings from the Pelagos Sanctuary (Western Mediterranean Basin). The recordings were gathered between 2008 and 2018 using two different hydrophone stations. We show how 20 Hz fin whale pulses can be automatically detected using a low-complexity convolutional neural network (CNN) despite data variability (different recording devices exposed to diverse noises). The pulses were further classified into the two categories described in past studies, and inter-pulse intervals (IPIs) were measured. The results confirm previous observations on the local relationship between pulse type and IPI with substantially more data. Furthermore, we show inter-annual shifts in IPI and an intra-annual trend in pulse center frequency. This study provides new elements of comparison for the understanding of long-term fin whale song trends worldwide.
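Once pulse detections are available, IPI measurement is a simple differencing of detection times. The timestamps and the threshold below are hypothetical illustrations, not values from the study.

```python
import numpy as np

# Hypothetical detection times (seconds) of 20 Hz pulses from a CNN detector.
pulse_times = np.array([0.0, 13.1, 26.3, 39.2, 60.5, 81.9])

ipis = np.diff(pulse_times)  # inter-pulse intervals between consecutive pulses

# Illustrative split into two IPI regimes around an assumed 18 s boundary;
# in the study, pulse types are acoustic categories, not IPI thresholds.
short_ipis = ipis[ipis < 18.0]
long_ipis  = ipis[ipis >= 18.0]
```

Aggregating such IPI series per year is what makes the reported inter-annual shifts observable.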
This article compares eight different diversity methods: three based on visual information, one based on date information, three adapted to each topic based on location and visual information, and finally, for completeness, one based on random permutation. To compare the effectiveness of these methods, we apply them to 26 runs obtained with varied methods from different research teams and based on different modalities. We then discuss the results of the more than 200 resulting runs. The results show that query-adapted methods are more efficient than non-adapted methods, that visual-only runs are more difficult to diversify than text-only and text-image runs, and finally that only a few methods maximize both the precision and the cluster recall at 20 documents.
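A common way to diversify a ranked run, consistent with the visual-similarity methods compared here, is greedy re-ranking: keep the top result, then repeatedly promote the remaining item least similar to anything already selected. This is a generic sketch of that family of methods, with a toy similarity function, not one of the article's eight specific methods.

```python
def greedy_diversify(ranked, similarity, k=20):
    """Greedy re-ranking: after the top item, repeatedly pick the remaining
    item least similar to anything already selected (up to k items)."""
    selected, remaining = [ranked[0]], list(ranked[1:])
    while remaining and len(selected) < k:
        best = min(remaining,
                   key=lambda d: max(similarity(d, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: items are points on a line; similarity decays with distance.
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
reranked = greedy_diversify([0, 1, 2, 10], sim, k=4)
```

The cutoff k = 20 mirrors the evaluation depth at which precision and cluster recall are measured.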