The polynomial segment model (PSM) has opened up an alternative research direction for acoustic modeling. In our previous papers [1, 2], we proposed efficient incremental likelihood evaluation and EM training algorithms for PSM, making it possible to train and recognize using PSM alone. In this paper, we shift our focus to making PSM feasible for large-vocabulary recognition. First, we use sub-phonetic PSMs that represent a phoneme as multiple independent segmental units. Second, we derive and compare different top-down mixture-growing approaches that are orders of magnitude more efficient than previously proposed agglomerative clustering techniques. Experimental results show that top-down clustering performs better than the bottom-up approach. Recognition via N-best re-scoring shows that PSMs outperform HMMs by 7% to 19% on the 5k closed-vocabulary Wall Street Journal Nov 92 test set. Our best PSM achieves 7.15% WER, compared with 7.81% for a 16-mixture HMM.