We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state. The model is defined by vectors associated with each state with a dimension of, say, 50, together with a global mapping from this vector space to the space of parameters of the GMM. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.
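The mapping from a state vector to GMM parameters can be illustrated with a minimal sketch. The abstract does not spell out the form of the global mapping; the version below assumes the form commonly used in the Subspace GMM literature, where each mean is a linear projection of the state vector and the mixture weights are a softmax of linear scores. The names `M`, `w`, `v_j`, and `state_gmm`, and the toy sizes, are hypothetical, and the covariances (assumed shared across states) are omitted.

```python
import math
import random

def matvec(M, v):
    """Multiply a D x S matrix (list of rows) by a length-S vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def state_gmm(v_j, M, w):
    """Expand one state vector into GMM means and mixture weights.

    Assumed form (not stated in the abstract):
      mean_i   = M[i] @ v_j
      weight_i = softmax over i of (w[i] . v_j)
    """
    means = [matvec(M_i, v_j) for M_i in M]
    weights = softmax([sum(a * b for a, b in zip(w_i, v_j)) for w_i in w])
    return means, weights

# Toy sizes (hypothetical): 4 shared Gaussians, 3-dim features, 5-dim state vectors.
rng = random.Random(0)
I, D, S = 4, 3, 5
M = [[[rng.gauss(0, 1) for _ in range(S)] for _ in range(D)] for _ in range(I)]
w = [[rng.gauss(0, 1) for _ in range(S)] for _ in range(I)]
v_j = [rng.gauss(0, 1) for _ in range(S)]

means, weights = state_gmm(v_j, M, w)
```

Under this assumption, every state reuses the same `M` and `w`; only `v_j` changes from state to state, which is what gives all states the same GMM structure.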
Although multilingual speech recognition has been studied before, it has proved very difficult to improve over separately trained systems. The usual approach has been to use some kind of "universal phone set" that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters, not tied to specific states, that are shared across languages. We use a model called a "Subspace Gaussian Mixture Model", in which the states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.
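A rough parameter count suggests why sharing the subspace could help most when in-language data is scarce. The sketch below assumes the usual Subspace GMM layout (per Gaussian index: a mean projection matrix, a weight vector, and a full covariance, all shared; per state: one low-dimensional vector). The function name and all sizes are illustrative assumptions, not figures from the experiments.

```python
def sgmm_param_counts(num_gaussians, feat_dim, subspace_dim, num_states):
    """Return (shared, state_specific) parameter counts for an assumed SGMM layout."""
    # Shared per Gaussian index: mean projection (D x S), weight vector (S),
    # and a full symmetric covariance with D*(D+1)/2 free entries.
    per_gaussian = (
        feat_dim * subspace_dim
        + subspace_dim
        + feat_dim * (feat_dim + 1) // 2
    )
    shared = num_gaussians * per_gaussian
    # State-specific: one subspace vector per HMM state.
    state_specific = num_states * subspace_dim
    return shared, state_specific

# Illustrative sizes only (hypothetical): 400 Gaussians, 39-dim features,
# 50-dim state vectors, 2000 states for one language.
shared, per_language = sgmm_param_counts(
    num_gaussians=400, feat_dim=39, subspace_dim=50, num_states=2000
)
```

With these made-up sizes, the shared part holds 1,112,000 parameters and the language-specific part 100,000, so most of the model could in principle be estimated from out-of-language data.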
The growing requirements for broadcasting and streaming of high-quality video continue to drive demand for codecs with higher compression efficiency. AV1 is the most recent open and royalty-free video coding specification, developed by the Alliance for Open Media (AOMedia) with the declared ambition of becoming the most popular next-generation video coding standard. The primary alternatives to AV1 are VP9 and HEVC/H.265, which are currently among the most popular and widespread video codecs in use. Like AV1, VP9 is a royalty-free and open specification, while HEVC/H.265 requires specific licensing terms for its use in commercial products and services. In this paper, we compare AV1 to VP9 and HEVC/H.265 from a rate-distortion point of view in a broadcasting use case. The comparison is performed by means of subjective evaluations carried out in a controlled environment using HD video content at typical bitrates ranging from low to high, corresponding to quality ranging from very low to completely transparent. We then proceed with an in-depth analysis of the advantages and drawbacks of each codec for specific types of content, and compare our subjective results and conclusions to those reported by others in the state of the art, as well as to those measured by means of objective metrics such as PSNR.