This work presents a real-time framework for research on multimodal feedback from robots and talking agents in the context of Human-Robot Interaction (HRI) and Human-Computer Interaction (HCI). To evaluate the framework, a multimodal corpus (ENTERFACE STEAD) was built, and the multimodal features that matter most for an active robot or agent listener in a storytelling experience with humans were studied. The experiments show that even when the same reactive behavior models are built for robots and talking agents, the interpretation and realization of the communicated behavior differ because the two offer different communicative channels: physical but less human-like in robots, and virtual but more expressive and human-like in talking agents.
SCORPIO is a small teleoperated mobile service robot for booby-trap disposal. An operator can control it manually through a portable briefcase remote control device using a joystick, keyboard and buttons. This paper describes its speech interface. As an auxiliary function of the remote control, the speech interface lets a human operator concentrate sight and/or hands on other, more important operation activities. The speech interface is based on HMM-based acoustic models trained on the SpeechDatE-SK database, a small-vocabulary language model based on a fixed connected-words grammar, and a speech recognition setup adapted for low-resource devices. To improve robustness in the outdoor environment where SCORPIO operates, speech enhancement based on the spectral subtraction method, together with a unique combination of an iterative approach and a modified LIMA framework, was researched, developed and tested on simulated and real outdoor recordings.
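The spectral subtraction step mentioned above can be illustrated with a minimal sketch. It shows only plain magnitude subtraction on a single frame, not the iterative approach or the modified LIMA framework from the paper. The function names, the naive DFT, and the `floor` parameter are assumptions made for illustration.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_magnitude, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from one frame.

    noise_magnitude is assumed to come from a speech-free segment.
    floor (hypothetical parameter) keeps a small fraction of the original
    magnitude so that no bin becomes negative.
    """
    cleaned = []
    for value, noise in zip(dft(noisy_frame), noise_magnitude):
        magnitude = abs(value)
        new_magnitude = max(magnitude - noise, floor * magnitude)
        # Keep the noisy phase; only the magnitude is modified.
        cleaned.append(cmath.rect(new_magnitude, cmath.phase(value)))
    return idft(cleaned)
```

In practice the noise estimate would be averaged over several speech-free frames and the signal processed in overlapping windows; the sketch omits both to keep the core subtraction visible.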
Large databases of scanned documents (medical records, legal texts, historical documents) require natural language processing for retrieval and structured information extraction. Errors introduced by the optical character recognition (OCR) system increase the ambiguity of the recognized text and decrease the performance of natural language processing. The paper proposes an OCR post-correction system with a parametrized string distance metric. The correction system learns specific error patterns from incorrect words and common sequences of correct words. A smoothing technique is proposed to assign non-zero probability to edit operations not present in the training corpus. Spelling correction accuracy is measured on a database of OCR-processed legal documents in English. A language model and a learned string metric with smoothing improve the Viterbi-based search for the best sequence of corrections and increase the performance of the spelling correction system.
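The idea of a learned string metric with smoothing can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's method: training pairs are assumed to be equal-length and aligned character by character, only substitutions are learned, insertions and deletions share a fixed cost, and add-alpha smoothing stands in for whatever smoothing technique the paper proposes. All names (`train_costs`, `weighted_distance`, `alpha`, `indel_cost`) are hypothetical.

```python
import math
from collections import Counter


def train_costs(pairs, alphabet, alpha=1.0):
    """Learn substitution costs -log P(observed | intended) from aligned pairs.

    Add-alpha smoothing gives every unseen substitution a non-zero
    probability, so its cost stays finite.
    """
    counts = Counter()
    totals = Counter()
    for intended, observed in pairs:
        for a, b in zip(intended, observed):
            counts[(a, b)] += 1
            totals[a] += 1
    size = len(alphabet)

    def cost(a, b):
        probability = (counts[(a, b)] + alpha) / (totals[a] + alpha * size)
        return -math.log(probability)

    return cost


def weighted_distance(candidate, observed, sub_cost, indel_cost=5.0):
    """Edit distance from a candidate word to the OCR output, using learned costs."""
    rows, cols = len(candidate) + 1, len(observed) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel_cost
    for j in range(1, cols):
        d[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # character dropped by the OCR
                d[i][j - 1] + indel_cost,  # spurious character inserted by the OCR
                d[i - 1][j - 1] + sub_cost(candidate[i - 1], observed[j - 1]),
            )
    return d[rows - 1][cols - 1]
```

With a few training pairs in which `l` is misread as `1`, the learned metric ranks a candidate such as `clear` closer to the OCR output `c1ear` than a candidate that differs by an unseen substitution. A full system would combine these distances with a language model over word sequences, as the abstract describes.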