In this work, we propose a hierarchical subspace model for acoustic unit discovery. In this approach, we frame the task as one of learning embeddings on a low-dimensional phonetic subspace, and simultaneously specify the subspace itself as an embedding on a hyper-subspace. We train the hyper-subspace on a set of transcribed languages and transfer it to the target language. In the target language, we infer both the language and unit embeddings in an unsupervised manner, and in so doing, we simultaneously learn a subspace of units specific to that language and the units that dwell on it. We conduct experiments on TIMIT and two low-resource languages: Mboshi and Yoruba. Results show that our model outperforms major acoustic unit discovery techniques, both in terms of clustering quality and segmentation accuracy.
State-of-the-art vocabulary-independent spoken term detection methods are typically based on variants of the dynamic time warping (DTW) algorithm, since DTW, being based on acoustic sequence matching, allows robust retrieval when linguistic resources are scarce. However, DTW comes with a high computational cost, which limits its practicality in a deployed server. To address this, we investigate the efficacy of subsampling and propose a neural network architecture to reduce the computational load of DTW-based keyword search. We use a time-subsampled RNN to reduce both the frame rate of the document and the dimensionality of its representation, while training it to maintain the cost incurred along the DTW alignment path. This reduces the computational complexity (both space and time) of the search algorithm. Experiments on the Turkish and Zulu limited language packs of the IARPA Babel program show that the proposed methods allow a considerable reduction in CPU time (88 times) and memory usage (18 times) without significant loss in search accuracy (0.0270 ATWV). Moreover, even at very high compression levels, where search precision is lower, high recall rates are maintained, which opens the possibility of multi-resolution search.
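To illustrate why a lower frame rate reduces the cost of DTW, the following is a minimal sketch, not the method described in the abstract. It assumes whole-sequence DTW with Euclidean frame distances, and it replaces the trained time-subsampled RNN with plain averaging of consecutive frames. The names `dtw_cost` and `subsample` are hypothetical, as are the feature dimensions used in the usage note.

```python
import numpy as np


def dtw_cost(query, doc):
    """Total cost of the best whole-sequence DTW alignment.

    Both inputs are arrays of shape (frames, dims). Time and memory
    grow with len(query) * len(doc), the size of the cost matrix.
    """
    n, m = len(query), len(doc)
    # Euclidean distance between every query frame and every document frame.
    dist = np.linalg.norm(query[:, None, :] - doc[None, :, :], axis=-1)
    # acc[i, j] holds the best cost of aligning the first i query frames
    # with the first j document frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m]


def subsample(frames, factor):
    """Average each run of `factor` consecutive frames.

    Assumption: fixed averaging stands in for the learned RNN, so unlike
    the trained model it makes no attempt to preserve the DTW path cost.
    Trailing frames that do not fill a complete run are dropped.
    """
    usable = (len(frames) // factor) * factor
    return frames[:usable].reshape(-1, factor, frames.shape[1]).mean(axis=1)
```

For example, with a hypothetical 40-frame query and 400-frame document of 13-dimensional features, `dtw_cost(subsample(query, 4), subsample(doc, 4))` fills a 10 x 100 cost matrix instead of a 40 x 400 one, a 16-fold reduction in cells. Reducing the feature dimensionality as well, as the abstract describes, would further cut the cost of each distance computation.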