We present an overview of the data collection and transcription efforts for the COnversational Speech In Noisy Environments (CO-SINE) corpus. The corpus is a set of multi-party conversations recorded in real-world environments with background noise, and it can be used to train noise-robust speech recognition systems. We explain the motivation for creating such a corpus and describe the audio recordings and transcriptions that comprise it. The recordings include a 4-channel array as well as close-talking, far-field, and throat microphones on separate synchronized channels, enabling algorithm research that draws on several microphone types at once.
We propose a method for finding keywords in an audio database using a spoken query. Our method performs a joint alignment between a phone lattice generated from the spoken query utterance and a second phone lattice representing a long utterance to be searched. We implement this joint alignment procedure in a graphical models framework. We evaluate our system on TIMIT as well as on the Switchboard conversational telephone speech (CTS) corpus. Our results show that representing the spoken query as a phone lattice achieves higher performance than using only the 1-best phone sequence.
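To make the alignment idea concrete, the following is a minimal sketch of the simpler 1-best case only: it aligns a single query phone sequence against a longer phone sequence using approximate substring matching (an edit-distance recurrence in which the match may start anywhere in the long sequence). This is not the lattice-based joint alignment or the graphical models implementation described above. The function name `best_match`, the unit costs, and the toy phone labels are assumptions chosen for illustration.

```python
def best_match(query, target, sub_cost=1, indel_cost=1):
    """Find where `query` best matches inside `target`.

    Illustrative stand-in for the 1-best phone sequence case: returns
    (cost, end) where `cost` is the lowest edit cost of aligning the whole
    query to some contiguous region of `target`, and `end` is the 1-based
    position in `target` where that region ends.
    """
    n = len(query)
    # prev[i]: cost of aligning query[:i] to a region ending at the
    # previous target position. Before any target phone is consumed,
    # every query phone must be skipped.
    prev = [i * indel_cost for i in range(n + 1)]
    best_cost, best_end = prev[n], 0

    for j, phone in enumerate(target, start=1):
        # curr[0] = 0 lets a match start at any target position for free.
        curr = [0] * (n + 1)
        for i in range(1, n + 1):
            match = prev[i - 1] + (0 if query[i - 1] == phone else sub_cost)
            skip_target = prev[i] + indel_cost      # extra phone in target
            skip_query = curr[i - 1] + indel_cost   # query phone missing
            curr[i] = min(match, skip_target, skip_query)
        if curr[n] < best_cost:
            best_cost, best_end = curr[n], j
        prev = curr

    return best_cost, best_end


# Toy example with hypothetical phone labels.
query = ["k", "ae", "t"]
target = ["dh", "ax", "k", "ae", "t", "s", "ae", "t"]
cost, end = best_match(query, target)
```

A lattice-based system would replace each single phone here with a set of weighted alternatives on both the query and the search side, which is where the reported gain over the 1-best representation comes from.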