Abstract — Broadcast News (BN) transcription has been a challenging research area for many years. In the last couple of years, the availability of large amounts of roughly transcribed acoustic training data and advanced model training techniques has offered the opportunity to greatly reduce the error rate on this task. This paper describes the design and performance of BN transcription systems which make use of these developments. First the effects of using lightly-supervised training data and advanced acoustic modelling techniques are discussed. The design of a real-time broadcast news recognition system using these new models is then detailed. As system combination has been found to yield large gains in performance, a range of frameworks that allow multiple recognition outputs to be combined are next described. These include the use of multiple types of acoustic models and multiple segmentations. As a contrast, the "SuperEARS" system, developed by multiple sites to allow cross-site combination, is also described. The various models and recognition configurations are evaluated using several recent BN development and evaluation test sets. These new BN transcription systems can give gains of over 25% relative to the CU-HTK 2003 BN system.
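The system combination mentioned above is commonly realised with ROVER-style word-level voting over the outputs of several recognisers. The sketch below is a simplified illustration, not the paper's implementation: it assumes the hypotheses have already been word-aligned (real ROVER performs a dynamic-programming alignment and can weight votes by confidence), with `@` as a hypothetical marker for a deletion slot.

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Majority vote over pre-aligned hypotheses, one word list per system.

    aligned_hyps: list of equal-length word lists; '@' marks a deletion
    slot (no word from that system at that position).
    Returns the combined word sequence with deletion slots removed.
    """
    combined = []
    for slot in zip(*aligned_hyps):           # one voting slot per position
        word, _count = Counter(slot).most_common(1)[0]
        if word != "@":                       # majority chose a real word
            combined.append(word)
    return combined

# Three systems disagree on one word; the majority wins.
hyps = [["the", "cat", "sat"],
        ["the", "hat", "sat"],
        ["the", "cat", "sat"]]
print(rover_vote(hyps))  # ['the', 'cat', 'sat']
```

In practice the gain comes from combining systems with complementary error patterns, which is why the paper combines different acoustic model types and segmentations rather than near-identical systems.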
In this paper, we present our experiments on lightly supervised discriminative training with large amounts of broadcast news data for which only closed-caption transcriptions are available (TDT data). In particular, we use language models biased to the closed-caption transcripts to recognise the audio data, and the recognised transcripts are then used as the training transcriptions for acoustic model training. A range of experiments that use maximum likelihood (ML) training as well as discriminative training based on either maximum mutual information (MMI) or minimum phone error (MPE) are presented. In a 5xRT broadcast news transcription system that includes adaptation, it is shown that reductions in word error rate (WER) in the range of 1% absolute can be achieved. Finally, some experiments on training data selection are presented to compare different methods of "filtering" the transcripts.
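One common way to "filter" such transcripts is to decode each segment with the caption-biased language model and keep only segments whose hypothesis agrees closely with the closed caption. The sketch below illustrates that idea under assumed names and a hypothetical WER threshold; it is not the paper's exact selection method.

```python
def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

def select_training_segments(segments, max_wer=0.2):
    """Keep recognised transcripts that roughly match their captions.

    segments: list of (closed_caption, recognised_hypothesis) pairs.
    max_wer is an illustrative threshold, not a value from the paper.
    """
    return [hyp for caption, hyp in segments
            if word_error_rate(caption, hyp) <= max_wer]
```

Segments that pass the filter are then treated as reliably transcribed and used for ML or discriminative (MMI/MPE) acoustic model training.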
Typical systems for large vocabulary conversational speech recognition (LVCSR) have been trained on a few hundred hours of carefully transcribed acoustic training data. This paper describes an LVCSR system for the conversational telephone speech (CTS) task trained on more than 2000 hours of data for which only approximate transcriptions were available. The challenges of dealing with such a large data set and the accuracy improvements over the small baseline system are discussed. The effect on both acoustic and language modelling performance is studied. Overall, increasing the training data size from 360h to 2200h and optimising the training procedure reduced the word error rate on the DARPA/NIST 2003 eval set by about 20% relative.
This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed around a multi-pass, multi-branch structure where the outputs of all branches are combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.
Zara the Supergirl is an interactive system that, while having a conversation with a user, uses its built-in sentiment analysis, emotion recognition, and facial and speech recognition modules to exhibit the human-like response of sharing emotions. In addition, at the end of a 5-10 minute conversation with the user, it can give a comprehensive personality analysis based on the user's interaction with Zara. This is a first prototype that has incorporated a full empathy module, the recognition of and response to human emotions, into a spoken language interactive system that enhances human-robot understanding. Zara was shown at the World Economic Forum in Dalian in