This paper investigates the effect of modeling sub-band correlation for noisy speech recognition. Many sub-band based speech recognition systems assume that the sub-band data streams are independent. However, the structure and operation of the human vocal tract suggest that this assumption is unrealistic. A novel method is proposed to incorporate correlation into sub-band speech feature streams. In this method, all possible combinations of sub-bands are created, and each combination is treated as a single frequency band for which a single feature vector is calculated. The resulting feature vectors capture information about every band in the combination as well as the dependency across the bands. Experiments conducted on the TIDigits database demonstrate significantly improved robustness compared with an independent sub-band system in the presence of both stationary and non-stationary noise.
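The combination step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the feature extraction, so this sketch assumes that each frame is given as log filter-bank energies, that "all possible combinations" means every non-empty subset of sub-bands, and that a plain DCT over the pooled channels stands in for the per-combination feature calculation. The names `subband_combinations`, `dct_ii`, `combination_features`, and the channel layout are hypothetical.

```python
from itertools import combinations
import math


def subband_combinations(num_bands):
    """Return every non-empty combination of sub-band indices, smallest first."""
    bands = range(num_bands)
    return [c for r in range(1, num_bands + 1) for c in combinations(bands, r)]


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, used here only as a stand-in feature transform."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def combination_features(log_energies, band_channels, num_coeffs=3):
    """Compute one feature vector per sub-band combination for a single frame.

    log_energies: log filter-bank energies of one frame (assumed input).
    band_channels: one list of channel indices per sub-band (assumed layout).
    """
    features = {}
    for combo in subband_combinations(len(band_channels)):
        # Pool the channels of every band in the combination and treat
        # the result as if it were a single frequency band.
        pooled = [log_energies[ch] for b in combo for ch in band_channels[b]]
        features[combo] = dct_ii(pooled, min(num_coeffs, len(pooled)))
    return features


# Toy frame with six channels split into three sub-bands (made-up values).
frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
bands = [[0, 1], [2, 3], [4, 5]]
feats = combination_features(frame, bands)
print(len(feats))  # 7 combinations for 3 sub-bands (2**3 - 1)
```

Because each vector is computed from the pooled channels of several bands at once, it depends jointly on those bands rather than on each band in isolation. Under the non-empty-subset assumption the number of combinations is 2^N - 1 for N sub-bands, so the approach is practical only for a small number of bands.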