Audio segmentation for speech recognition using segment features

Rybach, David; Gollan, Christian; Schlüter, Ralf; Ney, Hermann

doi:10.1109/icassp.2009.4960554

Cited by 41 publications

(25 citation statements)

References 7 publications

“…Change point detection methods are applied here for audio segmentation and recognizing boundaries between silence, sentences, words, and noise [13][14]. …”

Section: Introduction (mentioning)

confidence: 99%

A survey of methods for time series change point detection

2016

Change points are abrupt variations in time series data. Such abrupt changes may represent transitions that occur between states. Detection of change points is useful in modelling and prediction of time series and is found in application areas such as medical condition monitoring, climate change detection, speech and image analysis, and human activity analysis. This survey article enumerates, categorizes, and compares many of the methods that have been proposed to detect change points in time series. The methods examined include both supervised and unsupervised algorithms that have been introduced and evaluated. We introduce several criteria to compare the algorithms. Finally, we present some grand challenges for the community to consider.

“…We search through all possible locations and predict the one with the highest score. In this example the score is calculated for timing sequence is (1,4,6).…”

Section: Model Description (mentioning)

confidence: 99%

“…Phoneme Boundary Detection or Phoneme Segmentation plays an essential first step for a variety of speech processing applications such as speaker diarization [1], speech science [2,3], keyword spotting [4], Automatic Speech Recognition [5,6], etc.…”

Section: Introduction (mentioning)

confidence: 99%

Phoneme Boundary Detection Using Learnable Segmental Features

Kreuk, Sheena, Keshet, et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Phoneme boundary detection plays an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc. In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. First, we evaluated our model when the spoken phonemes were not given as input. Results on the TIMIT and Buckeye corpora suggest that the proposed model is superior to the baseline models and reaches state-of-the-art performance in terms of F1 and R-value. We further explore the use of phonetic transcription as additional supervision and show this yields minor improvements in performance but substantially better convergence rates. We additionally evaluate the model on a Hebrew corpus and demonstrate such phonetic supervision can be beneficial in a multi-lingual setting.

“…Although audio analysis has been widely studied in scene classification [8,9,10], audio segmentation [11,12,13], and audio retrieval [14,15,16], to our knowledge, automatic audio tagging has not been much explored. Bertin-Mahieux et al [17] treated audio tag prediction as a set of binary classification problems and applied the Adaboost algorithm to the task.…”

Section: Introduction (mentioning)

confidence: 99%

Fast tagging of natural sounds using marginal co-regularization

Huang, Jackson, et al. 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Automatic and fast tagging of natural sounds in audio collections is a very challenging task due to wide acoustic variations, the large number of possible tags, and the incomplete and ambiguous tags provided by different labellers. To handle these problems, we use a co-regularization approach to learn a pair of classifiers on sound and text. The first classifier maps low-level audio features to a true tag list. The second classifier maps actively corrupted tags to the true tags, reducing incorrect mappings caused by low-level acoustic variations in the first classifier and augmenting the tags with additional relevant tags. Training the classifiers is implemented using marginal co-regularization, which draws the two classifiers into agreement by a joint optimization. We evaluate this approach on two sound datasets, Freefield1010 and Task4 of DCASE2016. The results obtained show that marginal co-regularization outperforms the baseline GMM in both efficiency and effectiveness.
