Recent advances in technology have led to the availability of powerful speech recognizers at low cost and to the possibility of using speech interaction in a variety of new and exciting practical applications.The purpose of this research was to investigate and develop the use of speech recognition in live television subtitling. This paper describes how the "SpeakTitle" project met the challenges of real time speech recognition and live subtitling through the development of a customisable speaker interface and use of "Topics" for specific subject domains. In the prototype system (described in Hewitt et. al. 2000 andBateman et. al. 2001) output from the speech recognition system (the IBM ViaVoice ® engine) is passed in to a custom-built editor from where it can be corrected and passed on to an existing subtitling system. The system was developed to the extent that it was acceptable for the production of subtitles for live television broadcasts and it has been adopted by three subtitle production facilities in the UK.The evolution of the product and the experiences of users in developing the system in a live subtitling environment are considered, and the system is analysed against industry standards. Ease-of-use and accuracy are also discussed and further research areas are identified.
The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. This work shows the potential of human-robot interaction systems in studies of the dynamics of early language acquisition.
This article presents results from a multidisciplinary research project on the integration and transfer of language knowledge into robots as an empirical paradigm for the study of language development in both humans and humanoid robots. Within the framework of human linguistic and cognitive development, we focus on how three central types of learning interact and co-develop: individual learning about one's own embodiment and the environment, social learning (learning from others), and learning of linguistic capability. Our primary concern is how these capabilities can scaffold each other's development in a continuous feedback cycle as their interactions yield increasingly sophisticated competencies in the agent's capacity to interact with others and manipulate its world. Experimental results are summarized in relation to milestones in human linguistic and cognitive development and show that the mutual scaffolding of social learning, individual learning, and linguistic capabilities creates the context, conditions, and requisites for learning in each domain. Challenges and insights identified as a result of this research program are discussed with regard to possible and actual contributions to cognitive science and language ontogeny. In conclusion, directions for future work are suggested that continue to develop this approach toward an integrated framework for understanding these mutually scaffolding processes as a basis for language development in humans and robots.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.