A CNN-based approach to identification of degradations in speech signals

Saishu, Yuki; Poorjam, Amir Hossein; Christensen, Mads Græsbøll

doi:10.1186/s13636-021-00198-4

Cited by 6 publications

(4 citation statements)

References 26 publications

“…Gabor transform or spectrogram is a vital tool in signal analysis [35]- [37]. It excels at identifying signal components, making it valuable for sound differentiation, like in Shazam's music classification [38].…”

Section: B Related Work In Frequency Domain Analysis (mentioning)

confidence: 99%

CF-AIDS: Comprehensive Frequency-Agnostic Intrusion Detection System on In-Vehicle Network

Islam, Sahlabadi, Kim et al. 2024. IEEE Access

Many studies have focused on obtaining high accuracy in the design of Intrusion Detection Systems (IDS) for in-vehicle networks, neglecting the significance of different intensive packet injection techniques. Because of their reliance on scenario-specific training datasets, these IDSs are vulnerable to failing to detect real-world attacks. This study implemented deep learning (DL)-based classification for intrusion detection using a Gated Recurrent Unit (GRU) while considering various intrusion frequencies. Different intrusion frequencies are comprehensively addressed with frequency-agnostic intrusion and resolved by generalizing features for DL input through time series segmentation and frequency domain conversion using Gabor filtering. For training purposes, five types of vehicle data are used, encompassing DoS, fuzzing, and replay attack scenarios. The accuracy range for mechanical version vehicles is typically between 95% and 100%. For electronic vehicles, it is around 90%. Considering the nature of this IDS system, it has been named a Comprehensive Frequency-Agnostic Intrusion Detection System (CF-AIDS). Although this IDS can perform better in all aspects, achieving more efficient results requires a larger amount of situational data.

“…Other NR tools produce estimates of objective values including FR speech quality values [23], [30], [32], [38], [44], [51], [54], [56], [57], FR speech intelligibility values [30], [32], [38], [44], [52], [54], [56], [57], speech transmission index [22], codec bit-rate [46], and detection of specific impairments, artifacts, or noise types [34], [39], [41], [52]. Some of these tools perform a single task and others perform multiple tasks.…”

Section: A Existing Machine Learning Approaches (mentioning)

confidence: 99%

Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Catellier, Voran. 2023. IEEE Access

Speech quality and speech intelligibility can vary dramatically across the wide range of currently available telecommunications systems, devices, and operating environments. This creates a strong demand for efficient real-time measurements of quality and intelligibility. Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms in order to produce evaluations of those waveforms. In the present work these evaluations give qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require "reference" (original or undistorted) versions of the waveforms they evaluate. Our initial WAWEnet publication introduced four WAWEnets and each emulated the output of an established full-reference speech quality or intelligibility estimation algorithm. We have updated the WAWEnet architecture to be more efficient and effective. Here we present a single WAWEnet that closely tracks seven different quality and intelligibility values with per-segment correlations in the range of 0.92 to 0.96. We create a second network that additionally tracks four subjective speech quality dimensions. We offer a third network that focuses on just subjective quality scores and achieves a per-segment correlation of 0.97. The performance of our WAWEnet architecture compares favorably to models with orders-of-magnitude more parameters and computational complexity. This work has leveraged 334 hours of speech in 13 languages, over two million full-reference target values and over 93,000 subjective mean opinion scores. We also interpret the operation of WAWEnets and identify the key to their operation using the language of signal processing: ReLUs strategically move spectral information from non-DC components into the DC component. The DC values of 96 output signals define a vector in a 96-D latent space and this vector is then mapped to a quality or intelligibility value for the input waveform.
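The abstract's remark that ReLUs move spectral information into the DC component can be illustrated with a toy signal. The sketch below is not the WAWEnet code; the sampling rate, tone frequency, and helper names are assumptions chosen only for illustration. It shows that a zero-mean sinusoid has no DC value, while its rectified version has a DC value close to 1/π.

```python
import numpy as np

def dc_component(x):
    """DC (zero-frequency) value of a signal, i.e. its mean."""
    return float(np.mean(x))

def relu(x):
    """Elementwise rectifier: negative samples are set to zero."""
    return np.maximum(x, 0.0)

# Assumed toy input: one second of a unit-amplitude 100 Hz tone at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)

dc_before = dc_component(x)        # essentially zero: the tone has no DC content
dc_after = dc_component(relu(x))   # half-wave rectification creates a DC offset

print(f"DC before ReLU: {dc_before:.6f}")
print(f"DC after ReLU:  {dc_after:.6f}")
```

For a unit-amplitude sinusoid the rectified mean approaches 1/π, so the DC value after the nonlinearity carries information about the amplitude of a component that originally sat entirely outside DC.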

“…According to experimental results employing two different speech kinds, namely diseased voice and normal running speech. By highlighting the areas of the log-mel spectrogram that have a greater impact on the target degradation, they can visually see how the network decides to distinguish between different forms of deterioration in speech signals using the score weighted class activation mapping [27].…”

Section: Previous Studies (mentioning)

confidence: 99%

Review "Smoker/Non-Smoker Classification of People Using a Speech Signal"

Khudhur Zaal, Faisal Mohammad. 2023. IRJIET

Speech is a behavioral biometric that can reveal a person's age, gender, race, and emotional state. The speech signal may also be used to ascertain a person's behavior, such as whether or not they smoke or take drugs. One of the topics that is frequently studied in the field of speech technology is the smoking habits of speakers. Over the past years, a lot of research has been done in this area, but little progress has been made in this field. As deep learning techniques have advanced in most machine learning fields, they have replaced earlier research techniques for speech recognition and verification. The most cutting-edge method for confirming and recognizing a speaker's identity is currently deep learning. This study's objective is to analyze research that uses speech signals and artificial intelligence to distinguish smokers from non-smokers. Every speech recognition system uses a variety of algorithms to convert sound waves into information that can be interpreted and processed by the system, which then generates an output that can be used as needed.
