SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word

Hansen, John H. L.; Huang, Rongqing; Zhou, Bowen; Seadle, Michael; Deller, J.R.; Gurijala, Aparna; Kurimo, Mikko; Angkititrakul, Pongtep

doi:10.1109/tsa.2005.852088

Cited by 78 publications

(56 citation statements)

References 64 publications


“…(see Fig. 6), and the inclusion in complete retrieval systems such as Rough 'n' Ready [55] and SpeechFind [56] allow users to see the current speaker information, understand the general flow of speakers throughout the broadcast, or search for a particular speaker within the audio. Experiments are also underway to ascertain if additional tasks, such as the process of annotating data, can be facilitated using diarization output.…”

Section: Discussion (mentioning)

confidence: 99%

An overview of automatic speaker diarization systems

Tranter

Reynolds

2006

IEEE Trans. Audio Speech Lang. Process.

507

300


Abstract: Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. Performances using the different techniques are compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification.


“…Previous experiment demonstrates that under-segmentation, caused by a high number of miss detections, is more cumbersome to remedy than over-segmentation caused by a high number of false alarms [12], [13], [15], [16], [23], [40]. For example, over-segmentation could be alleviated by clustering and/or merging.…”

Section: B Mathematical Properties Of the Ig Distribution And Its Ap… (mentioning)

confidence: 99%

“…The window size is also set equal to r taking into consideration as many data as possible. When more data are available, more accurate Gaussian models are built, since BIC behaves better for large windows, whereas short changes are not easily detectable by BIC [12], [16]. Moreover, it was shown in [22], that the bigger the window size, the better the performance.…”

Section: BIC-based Speaker Segmentation (mentioning)

confidence: 99%

Computationally Efficient and Robust BIC-Based Speaker Segmentation

Kotti

Benetos

Kotropoulos

2008

IEEE Trans. Audio Speech Lang. Process.


An algorithm for automatic speaker segmentation based on the Bayesian Information Criterion (BIC) is presented. BIC tests are not performed for every window shift (e.g. every milliseconds), as previously, but when a speaker change is most probable to occur. This is done by estimating the next probable change point thanks to a model of utterance durations. It is found that the inverse Gaussian fits best the distribution of utterance durations. As a result, less BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC, when the covariance matrices are estimated by other estimators than the usual maximum likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield a superior performance compared to existing approaches.
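The BIC test that this abstract builds on can be illustrated with a minimal sketch. It assumes the standard full-covariance Gaussian form of the criterion, in which a candidate split of a window of N frames into N1 and N2 frames is scored as ΔBIC = (N/2) log|Σ| − (N1/2) log|Σ1| − (N2/2) log|Σ2| − λ · (1/2)(d + d(d+1)/2) log N, and a positive value suggests a speaker change. This is not the centered, simultaneously diagonalized formulation the paper proposes, and the function name `delta_bic` and the parameter `lam` are illustrative names, not taken from the paper.

```python
import numpy as np


def delta_bic(window, split, lam=1.0):
    """Score a candidate change point inside a feature window.

    window : array of shape (frames, dims), e.g. cepstral features
    split  : row index of the candidate change point
    lam    : penalty weight (hypothetical tuning parameter, often near 1)

    Returns the Delta-BIC value; a positive value favors modeling the two
    halves with separate Gaussians, i.e. a change at `split`.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]

    def logdet_cov(x):
        # Maximum-likelihood (biased) covariance estimate of the frames.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        # slogdet avoids overflow/underflow of a plain determinant.
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n * logdet_cov(window)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean and covariance parameters.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty
```

Evaluating this score at every window shift is the exhaustive strategy the abstract contrasts itself with; the paper instead tests only where a model of utterance durations predicts a change is likely.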


“…Having segmented speech regions, it is also often necessary to segment these further in terms of homogeneous speaker turns. In addition to improving ASR systems, speaker turn information can be helpful for speaker adaptation in rich transcription of videos and meetings (Bonastre et al, 2000) and for content based audio classification and retrieval (Hansen et al, 2005) which have a wide range of applications in the entertainment industry, audio archive management, surveillance, etc. Audio segmentation would also be an important tool in summarizing meetings, which has recently gained a lot of interest in the research community.…”

Section: Introduction (mentioning)

confidence: 99%