Automatic Multiscale-based Peak Detection on Short Time Energy and Spectral Centroid Feature Extraction for Conversational Speech Segmentation

Prasetio, Barlian Henryranu; Widasari, Edita Rosana; Tamura, H.

doi:10.1145/3479645.3479675

Cited by 4 publications

(4 citation statements)

References 11 publications


“…The short-time average zero crossing rate refers to the number of times the signal crosses the zero value in each frame, which can reflect the frequency spectral characteristics to a certain extent, and is a kind of sound signal time-domain feature often used in speech endpoint detection [14]. As the bowel sounds signals vary in strength, it is difficult to see obvious changes in the STE only for the sudden and weaker bowel sounds, while their short-time average crossing zero rate is usually higher, which can be used as one of the features to analyze the bowel sounds.…”

Section: Zero Crossing Rate (ZCR)

Citation type: mentioning

confidence: 99%
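The two time-domain features named in the statement above can be sketched in a few lines. This is a minimal illustration, not the citing authors' implementation: the function names, the frame length, and the hop size are assumptions, and treating a sample equal to zero as non-negative is one convention among several.

```python
def frame_signal(signal, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Short-time energy (STE): sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossings(frame):
    """Number of sign changes between adjacent samples in one frame.

    Assumption: a sample equal to 0 is counted as non-negative.
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


# Hypothetical frames: a weak, rapidly alternating burst and a strong, slow one.
weak_fast = [0.1, -0.1, 0.1, -0.1]
strong_slow = [2.0, 2.0, -2.0, -2.0]

for name, frame in (("weak_fast", weak_fast), ("strong_slow", strong_slow)):
    print(name, short_time_energy(frame), zero_crossings(frame))
```

The weak frame has far lower energy than the strong one but more zero crossings, which is the situation the quoted passage describes: a sound that is hard to see in the STE can still stand out in the zero-crossing count.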

“…In cognizance of this, our research adopts a comprehensive approach by considering both frequency domain features and time domain features inherent in neonatal bowel sounds. We have strategically extracted MFCC [12], Short Time Energy (STE) [13], and Zero Crossing Rate (ZCR) [14] as integral components of our feature extraction methodology. These features collectively encapsulate the nuanced characteristics of neonatal bowel sounds.…”

Section: Introduction

Citation type: mentioning

confidence: 99%


Diagnosis of NEC using a Multi-Feature Fusion Machine Learning Algorithm

Li, Han, et al. 2024

IJACSA

Necrotizing enterocolitis (NEC) is a severe gastrointestinal emergency in neonates, marked by its complex etiology, ambiguous clinical manifestations, and significant morbidity and mortality, profoundly affecting long-term pediatric health outcomes. The prevailing diagnostic approaches for NEC, including traditional manual auscultation of bowel sounds, suffer from limited sensitivity and specificity, leading to potential misdiagnoses and delayed treatment. In this paper, we introduce a groundbreaking NEC diagnostic framework employing machine learning algorithms that utilize multi-feature fusion of bowel sounds, significantly improving the diagnostic accuracy. Bowel sounds from NEC patients and healthy newborns are meticulously captured using a specialized acquisition system, designed to overcome the inherent challenges associated with the low amplitude, substantial background noise, and high variability of neonatal bowel sounds. To enhance the diagnostic framework, we extract mel-frequency cepstral coefficient (MFCC), short-time energy (STE), and zero-crossing rate (ZCR) to capture comprehensive frequency and time domain features, ensuring a robust representation of bowel sound characteristics. These features are then integrated using a multi-feature fusion technique to form a singular feature vector, providing a rich, integrated dataset for the machine learning algorithm. Employing the support vector machine (SVM), the algorithm achieved an accuracy (ACC) of 88.00%, sensitivity (SEN) of 100.00%, and an area under the receiver operating characteristic (ROC) curve (AUC) of 97.62%, achieving high accuracy in diagnosing NEC. This innovative approach not only improves the accuracy and objectivity of NEC diagnosis but also shows promise in revolutionizing neonatal care through facilitating early and precise diagnosis. It significantly enhances clinical outcomes for affected neonates.
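The abstract does not say how the fused feature vector is built. The sketch below assumes the simplest possibility: each per-frame feature track is summarized by its mean and standard deviation, and the summaries are concatenated. The function names and the choice of summary statistics are hypothetical, and the classifier stage (the SVM) is left out.

```python
from statistics import mean, pstdev


def summarize(track):
    """Reduce one per-frame feature track to [mean, population std]."""
    return [mean(track), pstdev(track)]


def fuse_features(*tracks):
    """Concatenate the summaries of several feature tracks into one vector.

    Assumption: fusion is plain concatenation; the source paper may
    weight, normalize, or select features differently.
    """
    fused = []
    for track in tracks:
        fused.extend(summarize(track))
    return fused


# Hypothetical per-frame values for one recording.
ste_track = [1, 3]        # short-time energy per frame
zcr_track = [2, 2, 2]     # zero crossings per frame

vector = fuse_features(ste_track, zcr_track)
print(vector)
```

One vector of this kind per recording, with MFCC tracks added in the same way, is the sort of input a support vector machine expects.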


“…For the low-level acoustic features, we calculated two typical acoustic features: mel frequency cepstral coefficients (MFCCs) (Grama and Rusu, 2017) and Spectral Centroid (Prasetio et al, 2021).…”

Section: Exploration Of Relationship Between High-level Semantic Feat...

Citation type: mentioning

confidence: 99%

“…where f is the actual measured frequency, Mel(f) is the Mel scale, 2,595 and 700 are the commonly used constants in Mel-scale formula. Spectral Centroid is one of the important physical parameters describing the properties of timbre, which indicates where the centroid of the spectrum is located (Prasetio et al, 2021). Generally, the audios with dark and deep quality tend to have more low-frequency components and relatively low Spectral Centroid, while the audios with bright and cheerful quality mostly concentrate on high frequency and relatively high Spectral Centroid.…”

Section: Exploration Of Relationship Between High-level Semantic Feat...

Citation type: mentioning

confidence: 99%
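The quoted passage refers to a Mel-scale formula with the constants 2,595 and 700 but does not reproduce it. The sketch below assumes the widely used form Mel(f) = 2595 · log10(1 + f / 700) and the usual magnitude-weighted definition of the spectral centroid. The function names, the naive DFT, and the test tones are illustrative choices, not taken from the cited papers.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Assumed Mel-scale form: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N//2 (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, return 0 by convention
    n = len(frame)
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total


# Hypothetical test tones: 500 Hz ("dark") and 2000 Hz ("bright") at 8 kHz.
SAMPLE_RATE = 8000
N = 64
dark = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
bright = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(N)]

print(spectral_centroid(dark, SAMPLE_RATE), spectral_centroid(bright, SAMPLE_RATE))
print(hz_to_mel(1000.0))
```

The low tone yields the lower centroid, matching the passage's description of dark versus bright audio. Both tones fall exactly on DFT bins here, so there is no spectral leakage; real audio frames would normally be windowed first.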

A hybrid learning framework for fine-grained interpretation of brain spatiotemporal patterns during naturalistic functional magnetic resonance imaging

Shi, Wang, et al. 2022

Front. Hum. Neurosci.

Naturalistic stimuli, including movie, music, and speech, have been increasingly applied in the research of neuroimaging. Relative to a resting-state or single-task state, naturalistic stimuli can evoke more intense brain activities and have been proved to possess higher test–retest reliability, suggesting greater potential to study adaptive human brain function. In the current research, naturalistic functional magnetic resonance imaging (N-fMRI) has been a powerful tool to record brain states under naturalistic stimuli, and many efforts have been devoted to study the high-level semantic features from spatial or temporal representations via N-fMRI. However, integrating both spatial and temporal characteristics of brain activities for better interpreting the patterns under naturalistic stimuli is still underexplored. In this work, a novel hybrid learning framework that comprehensively investigates both the spatial (via Predictive Model) and the temporal [via convolutional neural network (CNN) model] characteristics of the brain is proposed. Specifically, to focus on certain relevant regions from the whole brain, regions of significance (ROS), which contain common spatial activation characteristics across individuals, are selected via the Predictive Model. Further, voxels of significance (VOS), whose signals contain significant temporal characteristics under naturalistic stimuli, are interpreted via one-dimensional CNN (1D-CNN) model. In this article, our proposed framework is applied onto the N-fMRI data during naturalistic classical/pop/speech audios stimuli. The promising performance is achieved via the Predictive Model to differentiate the different audio categories. Especially for distinguishing the classic and speech audios, the accuracy of classification is up to 92%. Moreover, spatial ROS and VOS are effectively obtained. 
Besides, temporal characteristics of the high-level semantic features are investigated on the frequency domain via convolution kernels of 1D-CNN model, and we effectively bridge the “semantic gap” between high-level semantic features of N-fMRI and low-level acoustic features of naturalistic audios in the frequency domain. Our results provide novel insights on characterizing spatiotemporal patterns of brain activities via N-fMRI and effectively explore the high-level semantic features under naturalistic stimuli, which will further benefit the understanding of the brain working mechanism and the advance of naturalistic stimuli clinical application.
