Phonotactics, which concerns the permissible phone patterns and their frequencies of occurrence in a specific language, is widely acknowledged to be relevant to spoken language recognition (SLR). With the assistance of phone recognizers, each speech utterance can be decoded into an ordered sequence of phone vectors filled with likelihood scores contributed by all possible phone models. In this paper, we propose a novel approach that uncovers the phonotactic structure concealed in the phone-likelihood vectors through a form of multivariate time series analysis: dynamic linear models (DLMs). In these models, the generation of phone patterns in each utterance is treated as a dynamic system: the relationship between adjacent vectors is modeled as linear and time-invariant, and unobserved states are introduced to capture the temporal coherence intrinsic to the system. Each utterance expressed by a DLM is further transformed into a fixed-dimensional linear subspace, so that well-established distance measures between two subspaces can be applied to linear discriminant analysis (LDA) in a dissimilarity-based fashion. The results of SLR experiments on the OGI-TS corpus demonstrate that the proposed framework outperforms the well-known vector space modeling (VSM)-based methods and achieves performance comparable to our previous subspace-based method.
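To make the modeling step concrete, the following Python sketch illustrates one common way to fit a simple dynamic linear model to a sequence of phone-likelihood vectors and to compare two utterances through a subspace distance. The SVD-based parameter estimation, the state dimension, the observability-matrix construction, and all function names are illustrative assumptions rather than the exact formulation used in the paper.

```python
import numpy as np

def fit_dlm(Y, n_states=5):
    """Fit a simple dynamic linear model y_t = C x_t, x_{t+1} = A x_t
    to a sequence of phone-likelihood vectors Y (shape: n_phones x T).
    Illustrative SVD-based estimate, not the paper's exact procedure."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                          # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]    # estimated state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])     # least-squares transition matrix
    return A, C

def utterance_subspace(A, C, order=3):
    """Stack C, CA, CA^2, ... into an extended observability matrix and
    return an orthonormal basis of its column space, i.e. a fixed-dimensional
    linear subspace representing the utterance."""
    blocks, M = [], np.eye(A.shape[0])
    for _ in range(order):
        blocks.append(C @ M)
        M = M @ A
    Q, _ = np.linalg.qr(np.vstack(blocks))
    return Q

def subspace_distance(Q1, Q2):
    """Distance derived from the principal angles between two subspaces."""
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return np.sqrt(np.sum(np.arccos(cosines) ** 2))

# Toy usage: two utterances as random phone-likelihood sequences (40 phones).
rng = np.random.default_rng(0)
Y1, Y2 = rng.random((40, 120)), rng.random((40, 95))
Q1 = utterance_subspace(*fit_dlm(Y1))
Q2 = utterance_subspace(*fit_dlm(Y2))
print(subspace_distance(Q1, Q2))
```

Pairwise distances of this kind could then feed a dissimilarity-based classifier such as the LDA stage described above.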
This paper presents a novel content-based query-by-tag music search system for an untagged music database. We design a new tag query interface that allows users to input multiple tags with multiple levels of preference (denoted as an MTML query) by colorizing desired tags in a web-based tag cloud interface. When a user clicks and holds the left mouse button (or presses and holds his/her finger on a touch screen) on a desired tag, the color of the tag changes cyclically according to a color map (from dark blue to bright red) that represents the level of preference (from 0 to 1). In this way, the user can easily compose and review a query of multiple tags with multiple levels of preference through the colored tags. To realize MTML content-based music retrieval, we introduce a probabilistic fusion model (denoted as GMFM) that consists of two mixture models, namely a Gaussian mixture model and a multinomial mixture model; GMFM jointly models the auditory features and tag labels of a song. Two indexing methods and their corresponding matching methods, namely pseudo song-based matching and tag affinity-based matching, are incorporated into the pre-learned GMFM. We evaluate the proposed system on the MajorMiner and CAL-500 datasets. The experimental results demonstrate the effectiveness of GMFM and the potential of using MTML queries to search music in an untagged music database.
Keywords: tag cloud-based music query interface, MTML query, content-based music information retrieval, probabilistic fusion model.
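As a rough, hypothetical illustration of how a fusion of a Gaussian mixture (audio) part and a multinomial mixture (tag) part could score untagged songs against an MTML query, the sketch below computes latent-component posteriors on both sides and ranks songs by their similarity. The shared-component parameterization, the soft-count treatment of preference levels, and the cosine-similarity ranking are assumptions for exposition, not necessarily the paper's pseudo song-based or tag affinity-based procedures.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed parameterization: each latent component k has a weight pi[k],
# a Gaussian over audio features (mu[k], cov[k]), and a multinomial over
# tags (theta[k]), loosely mirroring a GMFM-style joint model.

def song_posteriors(frames, pi, mu, cov):
    """Posterior P(k | song) from the Gaussian (audio) part of the model,
    averaging frame-level component responsibilities."""
    ll = np.stack([multivariate_normal.logpdf(frames, mu[k], cov[k])
                   for k in range(len(pi))], axis=1)         # (T, K)
    ll += np.log(pi)
    resp = np.exp(ll - ll.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    return resp.mean(axis=0)                                  # (K,)

def query_posteriors(tag_prefs, pi, theta):
    """Posterior P(k | MTML query): tag_prefs holds the preference levels
    in [0, 1] read off the colorized tags, used here as soft tag counts."""
    log_post = np.log(pi) + tag_prefs @ np.log(theta).T       # (K,)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

def rank_songs(tag_prefs, songs, pi, mu, cov, theta):
    """Rank untagged songs by cosine similarity between query-side and
    song-side component posteriors (an assumed pseudo-song-like matching)."""
    q = query_posteriors(tag_prefs, pi, theta)
    scores = []
    for frames in songs:                      # each song: (T, D) feature frames
        s = song_posteriors(frames, pi, mu, cov)
        scores.append(q @ s / (np.linalg.norm(q) * np.linalg.norm(s) + 1e-12))
    return np.argsort(scores)[::-1]           # indices of best-matching songs first
```

The key point the sketch conveys is that the tag side and the audio side meet in the shared latent components, so an MTML query can retrieve songs that carry no tags at all.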