Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices

Mohaimenuzzaman; Bergmeir, Christoph; West, Ian Thomas; Meyer, Bernd

doi:10.1016/j.patcog.2022.109025

Cited by 32 publications

(37 citation statements)

References 26 publications

“…Datasets on insects were few, with the majority having sounds for birds, frogs, cats, whales, and dogs. For general acoustics, the most popular dataset was the US8K (Urban Sound 8K) which contains 8732 labeled sound excerpts of urban sounds [119, 127, 131, 132, 138, 141, 146, 149] as shown in Figure 7. The ESC 50 and ESC 10 datasets were also among the popular datasets [119, 130, 131, 132, 141, 143, 144, 146, 147, 148, 149].…”

Section: Results (mentioning)

confidence: 99%

“…For general acoustics, the most popular dataset was the US8K (Urban Sound 8K) which contains 8732 labeled sound excerpts of urban sounds [119, 127, 131, 132, 138, 141, 146, 149] as shown in Figure 7. The ESC 50 and ESC 10 datasets were also among the popular datasets [119, 130, 131, 132, 141, 143, 144, 146, 147, 148, 149]. They contain a mixture of bioacoustics and general acoustics sounds.…”

Section: Results (mentioning)

confidence: 99%

“…After collecting audio data, they need to undergo preprocessing techniques that clean and transform them for classification. Most bioacoustics [48, 49, 50, 51, 53, 54, 61, 62, 63, 64, 68, 69, 70, 71, 72, 73, 74, 75, 81, 82, 86], and general acoustic [113, 114, 115, 117, 118, 120, 121, 123, 124, 126, 132, 135, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 149, 153, 154] studies did not describe the preprocessing techniques that they used. An analysis of the studies that mentioned preprocessing revealed the most popular audio transformation technique as STFT (short-time Fourier transform) among both the bioacoustics [52, 60 <...…”

Section: Results (mentioning)

confidence: 99%

A Review of Automated Bioacoustics and General Acoustics Classification Research

Mutanu, Gohil, Gupta, et al. 2022. Sensors

Automated bioacoustics classification has received increasing attention from the research community in recent years due to its cross-disciplinary nature and its diverse applications. Applications in bioacoustics classification range from smart acoustic sensor networks that investigate the effects of acoustic vocalizations on species to context-aware edge devices that anticipate changes in their environment and adapt their sensing and processing accordingly. The research described here is an in-depth survey of the current state of bioacoustics classification and monitoring. The survey examines bioacoustics classification alongside general acoustics to provide a representative picture of the research landscape. The survey reviewed 124 studies spanning eight years of research. The survey identifies the key application areas in bioacoustics research and the techniques used in audio transformation and feature extraction. The survey also examines the classification algorithms used in bioacoustics systems. Lastly, the survey examines current challenges, possible opportunities, and future directions in bioacoustics.

“…We selected a CNN model, which is a deep neural network-based machine learning model, for three primary reasons. First, CNN-based models still achieve state-of-the-art in many audio classification tasks due to their ability to effectively extract local features from the audio input [45] [46]. As a result, combining them with transformers has been popular in several recent works to achieve SOTA accuracies in different speech recognition tasks [47] [48].…”

Section: Machine Learning Classifier (mentioning)

confidence: 99%

LimitAccess: on-device TinyML based robust speech recognition and age classification

Maayah, Abunada, Al-Janahi, et al. 2023. Discov Artif Intell

Automakers from Honda to Lamborghini are incorporating voice interaction technology into their vehicles to improve the user experience and offer value-added services. Speech recognition systems are a key component of smart cars, enhancing convenience and safety for drivers and passengers. In the future, safety-critical features may rely on speech recognition, but this raises concerns about children accessing such services. To address this issue, the LimitAccess system is proposed, which uses TinyML for age classification and helps parents limit children’s access to critical speech recognition services. This study employs a lite convolutional neural network (CNN) model for two different reasons: First, CNN showed superior accuracy compared to other audio classification models for age classification problems. Second, the lite model will be integrated into a microcontroller to meet its limited resource requirements. To train and evaluate our model, we created a dataset that included child and adult voices of the keyword “open”. The system approach categorizes voices into age groups (child, adult) and then utilizes that categorization to grant access to a car. The robustness of the model was enhanced by adding a new class (recordings) to the dataset, which enabled our system to detect replay and synthetic voice attacks. If an adult voice is detected, access to start the car will be granted. However, if a child’s voice or a recording is detected, the system will display a warning message that educates the child about the dangers and consequences of the improper use of a car. Arduino Nano 33 BLE sensing was our embedded device of choice for integrating our trained, optimized model. Our system achieved an overall F1 score of 87.7% and 85.89% accuracy. LimitAccess detected replay and synthetic voice attacks with an 88% F1 score.

“…Although several studies have been carried out in the forest acoustic monitoring context, still, a standard benchmark dataset specific to forest sounds is unavailable. Therefore, most of the existing studies have utilized publicly available environmental sound datasets such as ESC-50 [4, 13, 14, 15, 16, 17], UrbanSound8K (U8k) [14, 18, 19, 20, 21], FSD50K [22, 23], and SONYC-UST [24, 25]. These datasets contain a large quantity of audio data categorized into several groups covering a broad area of sound events.…”

Section: Introduction (mentioning)

confidence: 99%

Forest Sound Classification Dataset: FSC22

Bandara, Jayasundara, Ariyarathne, et al. 2023. Sensors

The study of environmental sound classification (ESC) has become popular over the years due to the intricate nature of environmental sounds and the evolution of deep learning (DL) techniques. Forest ESC is one use case of ESC, which has been widely experimented with recently to identify illegal activities inside a forest. However, at present, there is a limitation of public datasets specific to all the possible sounds in a forest environment. Most of the existing experiments have been done using generic environment sound datasets such as ESC-50, U8K, and FSD50K. Importantly, in DL-based sound classification, the lack of quality data can cause misguided information, and the predictions obtained remain questionable. Hence, there is a requirement for a well-defined benchmark forest environment sound dataset. This paper proposes FSC22, which fills the gap of a benchmark dataset for forest environmental sound classification. It includes 2025 sound clips under 27 acoustic classes, which contain possible sounds in a forest environment. We discuss the procedure of dataset preparation and validate it through different baseline sound classification models. Additionally, it provides an analysis of the new dataset compared to other available datasets. Therefore, this dataset can be used by researchers and developers who are working on forest observatory tasks.
