Abstract: This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise but that knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This creates a requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a "coarse" compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the mismatch between training and testing. This paper focuses on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database, created by rerecording the data in the presence of various noise types, and is used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.
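The idea of ignoring unreliable features when scoring a speaker model can be illustrated with a minimal sketch. It is not the paper's model: it assumes one diagonal-covariance Gaussian per speaker, a reliability mask that is simply given, and hypothetical names (`masked_loglik`, `identify`, `speaker_models`). The abstract does not say how the reliable subset is chosen, so that step is left out.

```python
import math

def masked_loglik(x, mean, var, reliable):
    """Log-likelihood of x under a diagonal Gaussian, summed only over
    the dimensions flagged as reliable (the others are marginalized out)."""
    total = 0.0
    for xi, mi, vi, keep in zip(x, mean, var, reliable):
        if keep:
            total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

def identify(x, reliable, speaker_models):
    """Return the speaker whose model scores highest on the reliable features."""
    return max(
        speaker_models,
        key=lambda s: masked_loglik(x, speaker_models[s][0], speaker_models[s][1], reliable),
    )

# Hypothetical toy models: (mean, variance) per speaker, three feature bands.
speaker_models = {
    "spk_a": ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    "spk_b": ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0]),
}

# The third band is assumed to be corrupted by noise.
x = [0.1, -0.2, 9.0]
all_bands = identify(x, [True, True, True], speaker_models)
clean_bands = identify(x, [True, True, False], speaker_models)
```

In this toy case the corrupted band pulls the full-feature decision toward `spk_b`, while dropping it recovers `spk_a`.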
Community Question Answering (cQA) opens interesting new research directions for the traditional Question Answering (QA) field, e.g., the exploitation of the interaction between users and the structure of related posts. In this context, we organized SemEval-2015 Task 3 on Answer Selection in cQA, which included two subtasks: (a) classifying answers as good, bad, or potentially relevant with respect to the question, and (b) answering a YES/NO question with yes, no, or unsure, based on the list of all answers. We set subtask A for Arabic and English on two relatively different cQA domains, i.e., the Qatar Living website for English, and a Quran-related website for Arabic. We used crowdsourcing on Amazon Mechanical Turk to label a large English training dataset, which we released to the research community. Thirteen teams participated in the challenge with a total of 61 submissions: 24 primary and 37 contrastive. The best systems achieved an official score (macro-averaged F1) of 57.19 and 63.7 for the English subtasks A and B, and 78.55 for the Arabic subtask A.
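For readers unfamiliar with the metric, macro-averaged F1 is the unweighted mean of the per-class F1 scores. The sketch below is an illustration only, not the task's official scorer: the function name, the toy labels, and the choice to take the class set from the gold labels are all assumptions.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, with classes taken from the gold labels."""
    labels = sorted(set(gold))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical toy data using the three subtask A answer classes.
gold = ["Good", "Good", "Bad", "Potential"]
pred = ["Good", "Bad", "Bad", "Good"]
score = macro_f1(gold, pred)
```

Because every class counts equally, a class that is never predicted correctly (here `Potential`) drags the average down regardless of how rare it is.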
Query understanding has been well studied in the areas of information retrieval and spoken language understanding (SLU). There are generally three layers of query understanding: domain classification, user intent detection, and semantic tagging. Classifiers can be applied to domain and intent detection in real systems, and semantic tagging (or slot filling) is commonly defined as a sequence-labeling task: mapping a sequence of words to a sequence of labels. Various statistical features (e.g., n-grams) can be extracted from annotated queries for learning label prediction models; however, linguistic characteristics of queries, such as hierarchical structures and semantic relationships, are usually neglected in the feature extraction process. In this work, we propose an approach that leverages linguistic knowledge encoded in hierarchical parse trees for query understanding. Specifically, for natural language queries, we extract a set of syntactic structural features and semantic dependency features from query parse trees to enhance inference model learning. Experiments on real natural language queries show that augmenting sequence labeling models with linguistic knowledge can improve query understanding performance in various domains.
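A rough sketch of what combining n-gram context with a dependency feature might look like for slot filling is given below. It is not the paper's feature set: the query, the hand-written head indices, the label names, and the function `token_features` are all hypothetical, and a real system would obtain the heads from a parser and pass the features to a sequence model.

```python
def token_features(tokens, heads, i):
    """Features for token i: the word, its neighbors (n-gram context),
    and the word it depends on in the parse (-1 marks the root)."""
    feats = {
        "word": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }
    head = heads[i]
    feats["head"] = tokens[head].lower() if head >= 0 else "<root>"
    return feats

# Hypothetical query with a hand-written dependency parse (head index per token).
tokens = ["show", "cheap", "flights", "to", "boston"]
heads = [-1, 2, 0, 4, 2]
labels = ["O", "O", "O", "O", "B-destination"]  # one label per word

features = [token_features(tokens, heads, i) for i in range(len(tokens))]
```

Here the feature for `boston` records that it attaches to `flights`, a relationship that a window of adjacent words alone would not capture.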
Spoken dialogue systems have been studied for years, yet portability is still one of the biggest challenges in terms of language extensibility, domain scalability, and platform compatibility. In this work, we investigate the portability issue from the language understanding perspective and present the Asgard architecture, a CRF-based (Conditional Random Fields) and crowd-sourcing-centered framework, which supports expert-free development of multilingual dialogue systems and seamless deployment to mobile platforms. Combinations of linguistic and statistical features, such as n-grams, tokenization, and part-of-speech tags, are employed for multilingual semantic understanding. English and Mandarin systems in various domains (movie, flight, and restaurant) are implemented with the proposed framework and ported to mobile platforms as well, which sheds light on large-scale speech app development.
Abstract: One long-standing challenge in robotics is the realization of mobile autonomous robots able to operate safely in existing human workplaces in a way that their presence is accepted by the human occupants. We describe the development of a multi-ton robotic forklift intended to operate alongside human personnel, handling palletized materials within existing, busy, semi-structured outdoor storage facilities. The system has three principal novel characteristics. The first is a multimodal tablet that enables human supervisors to use speech and pen-based gestures to assign tasks to the forklift, including manipulation, transport, and placement of palletized cargo. Second, the robot operates in minimally prepared, semi-structured environments, in which the forklift handles variable palletized cargo using only local sensing (and no reliance on GPS), and transports it while interacting with other moving vehicles. Third, the robot operates in close proximity to people, including its human supervisor, other pedestrians who may cross or block its path, and forklift operators who may climb inside the robot and operate it manually. This is made possible by novel interaction mechanisms that facilitate safe, effective operation around people. We describe the architecture and implementation of the system, indicating how real-world operational requirements motivated the development of the key subsystems, and provide qualitative and quantitative descriptions of the robot operating in real settings.