Pinar Akyazi scite author profile

We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state. The model is defined by vectors associated with each state with a dimension of, say, 50, together with a global mapping from this vector space to the space of parameters of the GMM. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.

Subspace Gaussian Mixture Models for speech recognition

Povey

et al. 2010

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models

et al. 2010

Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of "universal phone set" that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a "Subspace Gaussian Mixture Model" where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.

Learning-based image coding: early solutions reviewing and subjective quality evaluation

Ascenso

Akyazi

Pereira

et al. 2020

Comparison of Compression Efficiency between HEVC/H.265, VP9 and AV1 based on Subjective Quality Assessments

Akyazi

Ebrahimi

2018

The growing requirements for broadcasting and streaming of high quality video continue to trigger demands for codecs with higher compression efficiency. AV1 is the most recent open and royalty-free video coding specification developed by the Alliance for Open Media (AOMedia), with a declared ambition of becoming the most popular next-generation video coding standard. The primary alternatives to AV1 are VP9 and HEVC/H.265, which are currently among the most popular and widespread video codecs used in applications. VP9 is also a royalty-free and open specification similar to AV1, while HEVC/H.265 requires specific licensing terms for its use in commercial products and services. In this paper, we compare AV1 to VP9 and HEVC/H.265 from a rate-distortion point of view in a broadcasting use case scenario. Mutual comparison is performed by means of subjective evaluations carried out in a controlled environment using HD video content with typical bitrates ranging from low to high, corresponding to very low up to completely transparent quality. We then proceed with an in-depth analysis of the advantages and drawbacks of each codec for specific types of content and compare the subjective comparisons and conclusions to those obtained by others in the state of the art, as well as to those measured by means of objective metrics such as PSNR.

The subspace Gaussian mixture model—A structured model for speech recognition
