2014
DOI: 10.1186/s13636-014-0034-5

Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

Abstract: This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates the within-class variability by using class-dependent factor loading matrices and obtains the scores by computing the log-likelihood ratio for the class model to a non-class model over fixed-length windows. Afterwards, these scores are smoothed to yield longer contiguous segments of the same class by means of different back-end systems. Unlike previous solutions, our proposal do…
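The scoring step described in the abstract (a log-likelihood ratio of each class model against a non-class model, averaged over fixed-length windows, then smoothed by a back-end) can be sketched as follows. This is a minimal illustration under stated assumptions: each class and non-class model is a single diagonal Gaussian standing in for the paper's factor-analysis models with class-dependent loading matrices, which are not reproduced here. The helper names, the window length and hop, and the majority-vote smoother (one very simple back-end, not necessarily one of those used in the paper) are all hypothetical.

```python
import numpy as np


def diag_gauss_loglik(frames, mean, var):
    """Per-frame log-likelihood of frames [T, D] under a diagonal Gaussian."""
    diff = frames - mean
    return -0.5 * (np.sum(np.log(2 * np.pi * var)) + np.sum(diff**2 / var, axis=1))


def window_llr_scores(frames, class_models, win=100, hop=100):
    """Average class-vs-non-class LLR over fixed-length windows.

    class_models: dict name -> ((mean, var) of class, (mean, var) of non-class).
    Assumption: diagonal Gaussians stand in for the factor-analysis models.
    Returns (class names, window start indices, scores [n_windows, n_classes]).
    """
    names = list(class_models)
    starts = list(range(0, len(frames) - win + 1, hop))
    scores = np.zeros((len(starts), len(names)))
    for j, name in enumerate(names):
        (m_c, v_c), (m_n, v_n) = class_models[name]
        llr = diag_gauss_loglik(frames, m_c, v_c) - diag_gauss_loglik(frames, m_n, v_n)
        for i, s in enumerate(starts):
            scores[i, j] = llr[s:s + win].mean()
    return names, starts, scores


def majority_smooth(labels, k=5):
    """Hypothetical back-end: replace each window decision by the mode of its neighbours."""
    half = k // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        vals, counts = np.unique(window, return_counts=True)
        out.append(vals[np.argmax(counts)])  # ties resolve to the smallest label
    return np.array(out)
```

In use, each window is assigned the class with the highest LLR (`scores.argmax(axis=1)`), and the smoother then removes isolated decisions so that longer contiguous segments of the same class remain.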

Cited by 20 publications (13 citation statements)
References: 39 publications (49 reference statements)
“…The winner team of the original Albayzín 2010 evaluation proposed a segmentation-by-classification approach based on a hierarchical GMM/HMM (dark blue) including MFCCs, chroma and spectral entropy as input features [65]. The best result so far in this database was obtained with a solution based on factor analysis combined with a Gaussian backend (orange) and MFCCs with 1st and 2nd order derivatives as input features [17]. Our three previously explained final results combining the RNN classifier and the HMM resegmentation are also presented: the RNN baseline (purple), the BLSTM 1 PoolBLSTM 2 RNN approach (green) and the BLSTM 1 PoolBLSTM 2 RNN trained using mixup augmentation (light blue).…”
Section: Discussion (mentioning)
confidence: 99%
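The excerpt above mentions an HMM resegmentation stage applied after frame-level classification. One common way to realize such a stage is Viterbi decoding over an ergodic HMM with one state per class, where a high self-loop probability penalizes rapid label switching. The sketch below assumes per-frame class log-likelihoods are already available; the self-loop probability `p_stay`, the uniform initial probabilities, and the function name are illustrative assumptions, not details taken from the cited works.

```python
import numpy as np


def viterbi_resegment(log_emis, p_stay=0.99):
    """Most likely class sequence given per-frame class log-likelihoods.

    log_emis: array [T, K] with K >= 2 classes.
    Assumes an ergodic HMM: self-loop probability p_stay, the remaining
    mass split evenly among the other classes, uniform start probabilities.
    """
    T, K = log_emis.shape
    p_switch = (1.0 - p_stay) / (K - 1)
    log_A = np.full((K, K), np.log(p_switch))
    np.fill_diagonal(log_A, np.log(p_stay))

    delta = np.log(1.0 / K) + log_emis[0]      # best log-score ending in each class
    back = np.zeros((T, K), dtype=int)         # backpointers
    for t in range(1, T):
        cand = delta[:, None] + log_A          # cand[i, j]: move from class i to class j
        back[t] = np.argmax(cand, axis=0)
        delta = cand[back[t], np.arange(K)] + log_emis[t]

    # Trace the best path backwards from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

With `p_stay=0.99`, a single frame that only weakly favours another class is not enough to pay for two label switches, so it is absorbed into the surrounding segment; a sustained change in the scores still produces a boundary.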
“…Multistage decision trees are used in [16] with the same objective of discriminating speech and music. The factor analysis (FA) technique, usually applied in speaker verification, is adapted to the audio segmentation domain by Castán et al. in [17], obtaining relevant results for broadcast domain data.…”
Section: Audio Segmentation Approaches (mentioning)
confidence: 99%
“…al. in [9]. Such an approach is based on classifying consecutive audio frames, where the segmentation is performed by analyzing the sequence of decisions.…”
Section: Related Work (mentioning)
confidence: 99%
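The segmentation-by-classification idea in this excerpt (classify consecutive frames, then derive segments from the sequence of decisions) can be illustrated with a minimal sketch. It assumes frame labels are already computed. Run-length grouping, plus absorbing segments shorter than a hypothetical `min_frames` threshold into the preceding segment, is one simple post-processing choice and not the specific method of [9]; both function names are illustrative.

```python
from itertools import groupby


def decisions_to_segments(labels):
    """Group consecutive identical frame labels into (start, end, label).

    start/end are frame indices, end exclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments


def absorb_short_segments(segments, min_frames):
    """Merge segments shorter than min_frames into the preceding segment,
    and join neighbouring segments that end up with the same label."""
    out = []
    for start, end, label in segments:
        if out and (end - start) < min_frames:
            prev_start, _, prev_label = out[-1]
            out[-1] = (prev_start, end, prev_label)
        elif out and out[-1][2] == label:
            prev_start, _, _ = out[-1]
            out[-1] = (prev_start, end, label)
        else:
            out.append((start, end, label))
    return out
```

Frame indices can be converted to times by multiplying by the frame step; with an assumed 10 ms step, a segment `(0, 600, "s")` would span the first 6 seconds.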
“…This paper describes the database and the evaluation process and summarizes the results obtained. … in Spanish [8][9][10][11], and more recently, the Multi-Genre Broadcast (MGB) Challenge with data in English and Arabic² [12][13][14]. In other areas apart from broadcast speech, several evaluation campaigns have been proposed, such as the ones organized in the scope of the Zero Resource Speech Challenge [15,16], the TC-STAR evaluation on recordings of the European Parliament's sessions in English and Spanish [5], or the MediaEval evaluation of multimodal search and hyperlinking [17]. As a way to measure the performance of different techniques and approaches, in this 2018 edition, the IberSpeech-RTVE Challenge Evaluation campaign was proposed in three different conditions: speech-to-text transcription (STT), speaker diarization (SD), and multimodal diarization (MD).…”
mentioning
confidence: 99%
“…For the evaluation, three television programs were distributed, one from "La Mañana" and two from "La Tarde en 24H Tertulia", which totaled four hours. For enrollment, photos (10) and video (20 s) of the 39 characters to be labeled were provided.…”
mentioning
confidence: 99%