A Survey on Deep Reinforcement Learning for Audio-Based Applications

Latif, Siddique; Cuayáhuitl, Heriberto; Pervez, Farrukh; Shamshad, Fahad; Ali, Hafiz Shehbaz; Wang, Zhaoxia

doi:10.48550/arxiv.2101.00240

Cited by 4 publications

(9 citation statements)

References 0 publications

“…Indeed, DRL has proved its power in developing autonomous agents in broad range of fields including audio-based applications [99], multiple agent systems [100], mobile and wireless networking [101], connected autonomous vehicles in smart cities [102], and optimal control [103].…”

Section: Deep Reinforcement Learning (mentioning)

confidence: 99%

Exploiting Multi-Modal Fusion for Urban Autonomous Driving Using Latent Deep Reinforcement Learning

Khalil

Mouftah

2023

IEEE Trans. Veh. Technol.

“…Nevertheless, when it comes to processing raw audio waveforms with high sample rates, the limited receptive fields of CNNs can present challenges [6]. Dilated convolution layers have emerged as a solution to address this issue.…”

Section: Deep Learning Models In Audio-based Applications (mentioning)

confidence: 99%
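
The receptive-field point in the statement above can be illustrated with a short calculation. This is a minimal sketch, not taken from the cited paper: the helper name `receptive_field`, the kernel size of 3, the ten-layer depth, and the doubling dilation schedule are all assumptions chosen for illustration. It uses the standard result that each stride-1 convolution layer widens the receptive field by (kernel size - 1) times its dilation.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in input samples, of stacked stride-1 1-D convolutions.

    Each layer adds (kernel_size - 1) * dilation samples to the field.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical 10-layer stacks with kernel size 3 (illustrative values only).
plain = receptive_field(3, [1] * 10)                       # no dilation
dilated = receptive_field(3, [2 ** i for i in range(10)])  # dilation doubles per layer

print(plain, dilated)
```

With these assumed settings the undilated stack covers 21 samples while the dilated stack covers 2047, for the same number of layers and weights. At an assumed 16 kHz sample rate that is roughly 1.3 ms versus 128 ms of audio, which is why dilation helps on raw waveforms.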

“…Extensive research has been undertaken to explore visual, radar, radio-frequency, and audio-based methodologies, each Audio processing technology plays a ubiquitous role in our daily lives, as exemplified by the prevalence of popular products like Apple's Siri, Amazon's Alexa, and Google Home Mini Dot, which leverage audio processing and artificial intelligence (AI). AI serves as the underlying mechanism enabling computers and smartphones to comprehend human speech, thus facilitating effective interaction between humans and machines [6]. At the core of audio-based intelligent systems lies the ability to listen to and interact with the environment, continuously learning and enhancing their responses.…”

Section: Introduction 11 Background (mentioning)

confidence: 99%

A Large-Scale UAV Audio Dataset and Audio-Based UAV Classification Using CNN

Wang

Chu

et al. 2022

2022 Sixth IEEE International Conference on Robotic Computing (IRC)

“…Speech is the primary means of communication among human beings; as such, speech recognition systems have received considerable interest among researchers in recent decades. However, due to reliability issues, the systems developed have not been widely implemented (Latif et al., 2021; Otter et al., 2020; Strehl et al., 2006). Nevertheless, the major advancements in machine learning and deep learning in recent years have led to accurate speech recognition with high reliability that has increased the practicability of speech recognition systems (Hinton et al., 2012; Meftah et al., 2018).…”

Section: Related Studies (mentioning)

confidence: 99%

Keyword identification framework for speech communication on construction sites

Mansoor

Liu

Ali

et al. 2022

mocs

Worksite communication is a key to boosting teamwork and improving worker performance on the construction worksite. Communication among workers on the construction site mostly consists of speech communication. However, construction sites are typically noisy due to construction tasks like drilling and operation of heavy equipment. Meanwhile, workers on construction sites typically represent a range of different ethnic and linguistic backgrounds and have different speaking accents. This can make it difficult for the listener to understand the speaker clearly, leading to miscommunication and errors in decision making on the construction site. Technological advancements in recent years can be leveraged to mitigate this problem. In this paper, a keyword identification framework is developed for speech communication on the construction site. For this framework, 12 hours of raw audio data containing 18 crane signalman speech commands (referred to as “keywords”) are collected. The crane signalman uses specific keywords to communicate with the crane operator and guide the crane operator in the crane operations. The 2-second audio clips (this being the approximate duration of each keyword) are extracted from the raw audio dataset, and construction site noise is added. Moreover, mel-frequency cepstral coefficients are extracted from the waveform audio dataset. The extracted mel-frequency cepstral coefficients, in turn, are used to train the 1-dimensional convolutional neural network. After training, the model is found to achieve a training accuracy of 97.3%, a validation accuracy of 96.1%, and a testing accuracy of 93.8%. The model is further deployed for real-time identification of keywords in speech, with the model achieving an accuracy of 95.3%. In light of these findings, it can be concluded that the developed framework is suitable for real-time application in noisy construction sites for identifying specific keywords in speech.
