Exploration of vowel onset and offset points for hybrid speech segmentation

Sarma, Biswajit Dev; Sharma, Bidisha; Shanmugam, S. Aswin; Prasanna, S. R. Mahadeva; Murthy, Hema A.

doi:10.1109/tencon.2015.7373137

Cited by 5 publications

(3 citation statements)

References 11 publications


“…Using this manually marked starting label, we synchronize the source (loudspeaker) signal and the 4-channel recorded audio signals. Considering the start of the audio as an anchor point, we segment all the sample sounds with energy based evidence [27,28,29] and manual observation. In this way, we achieve 988 segmented audio files and a TSP signal for each DOA angle.…”

Section: Post-processing (mentioning)

confidence: 99%

SLoClas: A Database for Joint Sound Localization and Classification

Qian, Sharma, Abridi et al. 2021

Preprint

Self Cite


In this work, we present the development of a new database, namely Sound Localization and Classification (SLoClas) corpus, for studying and analyzing sound localization and classification. The corpus contains a total of 23.27 hours of data recorded using a 4-channel microphone array. 10 classes of sounds are played over a loudspeaker at 1.5 meters distance from the array by varying the Direction-of-Arrival (DoA) from 1° to 360° at an interval of 5°. To facilitate the study of noise robustness, 6 types of outdoor noise are recorded at 4 DoAs, using the same devices. Moreover, we propose a baseline method, namely Sound Localization and Classification Network (SLCnet) and present the experimental results and analysis conducted on the collected SLoClas database. We achieve the accuracy of 95.21% and 80.01% for sound localization and classification, respectively. We publicly release this database and the source code for research purpose.


“…Other approaches are based on Hidden Markov Model (HMM) (Daniel and James, 2017) such as (Lefevre et al, 2002) that combines a K-Means classifier with Hidden Markov Models in order to analyze audio segment using several audio features based either on segment or frame. Another method base on HMM is (Biswajit et al, 2015) that aims at exploring Vowel Onset Point (VOP) and Vowel offset or End Point (VEP) for correcting the boundaries obtained using HMM alignment. HMM models the class information well, but it may not detect the exact boundary.…”

Section: Word (N) (mentioning)

confidence: 99%

Speech Segmentation Using Dynamic Windows and Thresholds for Arabic and English Languages

Jazyah 2018

Journal of Computer Science


Segmentation of audio data such as human speech (splitting each word in separate audio file-.WAV file) has been a major concern when working with multimedia such as recordings from radio or TV. The main focus of the segmentation of boundaries of spoken language has been on using energy and zero crossing thresholds for endpoint detection. Errors in endpoint detection are still a main cause of low accuracy of segmentation systems. The goal of this research is to develop an efficient algorithm in order to segment the speech of human in both languages of English and Arabic in different speaking speed with high accuracy. Simulation results show that the developed algorithm achieved high accuracy when segmenting human speech in English language up to 91.6% in average, while it is 89.0% of Arabic language.
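As a rough illustration of the energy and zero-crossing thresholding that the abstract above refers to, the following is a minimal textbook-style sketch. It is not the algorithm developed in the cited paper. The function names, the frame length of 160 samples, and both threshold values are assumptions chosen only for this example.

```python
import math


def frame_features(samples, frame_len):
    """Return (short-time energy, zero-crossing rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Count sign changes between neighbouring samples.
        crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def detect_endpoints(samples, frame_len=160, energy_thresh=0.01, zcr_thresh=0.25):
    """Return (start_sample, end_sample) of the active region, or None if nothing is found.

    All default values are illustrative assumptions, not values from the cited work.
    """
    feats = frame_features(samples, frame_len)
    loud = [i for i, (energy, _) in enumerate(feats) if energy > energy_thresh]
    if not loud:
        return None
    first, last = loud[0], loud[-1]
    # Extend the energy-based boundaries over adjacent frames with a high
    # zero-crossing rate, which may be low-energy unvoiced sounds.
    while first > 0 and feats[first - 1][1] > zcr_thresh:
        first -= 1
    while last < len(feats) - 1 and feats[last + 1][1] > zcr_thresh:
        last += 1
    return first * frame_len, (last + 1) * frame_len


# Synthetic check: 0.1 s silence, 0.2 s of a 200 Hz tone, 0.1 s silence at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800
endpoints = detect_endpoints(signal)
print(endpoints)  # (800, 2400)
```

On real recordings, fixed thresholds like these tend to be fragile, which is consistent with the abstract's remark that endpoint detection errors remain a main cause of low segmentation accuracy.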


“…In literature, sonorant segmentation is performed by using mel frequency cepstral coefficients (MFCCs), knowledge based acoustic features or a combination of both [2], [24]. Recently in [23], [25], features based on both spectral and source information are proposed and a hierarchical algorithm is developed to detect sonorant and non-sonorant regions in continuous speech. However, the feature may not have potential to further divide the sonorant regions based on the degree of sonority associated with the sound.…”

Section: B Usefulness Of Sonority Feature (mentioning)

confidence: 99%

Sonority Measurement Using System, Source, and Suprasegmental Information

Sharma, Prasanna 2017

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


Sonorant sounds are characterized by regions with prominent formant structure, high energy and high degree of periodicity. In this work, the vocal-tract system, excitation source and suprasegmental features derived from the speech signal are analyzed to measure the sonority information present in each of them. Vocal-tract system information is extracted from the Hilbert envelope of numerator of group delay function. It is derived from zero time windowed speech signal that provides better resolution of the formants. A five-dimensional feature set is computed from the estimated formants to measure the prominence of the spectral peaks. A feature representing strength of excitation is derived from the Hilbert envelope of linear prediction residual, which represents the source information. Correlation of speech over ten consecutive pitch periods is used as the suprasegmental feature representing periodicity information. The combination of evidences from the three different aspects of speech provides better discrimination among different sonorant classes, compared to the baseline MFCC features. The usefulness of the proposed sonority feature is demonstrated in the tasks of phoneme recognition and sonorant classification.
