The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity have proven difficult to define and standard measures do not accurately reflect human perception of spectral discontinuity in concatenated speech. Previous studies on spectral join costs have focused predominantly on static spectral measures extracted from the unit boundary. In this paper spectral dynamic behaviour is investigated as a source of discontinuity in concatenated speech. A number of measures representing spectral dynamics are tested for the task of detecting discontinuities. The spectral dynamic measures tested contain information correlating with human perception of discontinuities, suggesting that spectral dynamics are a source of discontinuity in concatenated speech. A strategy to effectively combine dynamic and static measures is proposed using principal component analysis (PCA).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.