In this paper, we study the use of two kinds of discriminative models, namely the support vector machine (SVM) and the deep neural network (DNN), for speaker verification. We treat the verification task as a binary classification problem, in which a pair of utterances, each represented by an i-vector, is assumed to belong to either the "within-speaker" group or the "between-speaker" group. To solve the problem, we employ various binary operations that retain the basic relationship between the two i-vectors in a pair, combining them into a single vector for training the discriminative models. This study also investigates how the achievable performance correlates with the number of training pairs and with different combinations of the basic binary operations, using the SVM and DNN binary classifiers. The experiments are conducted on the male portion of the core task in the NIST 2005 Speaker Recognition Evaluation (SRE). In terms of the minimum normalized decision cost function (minDCF) and the equal error rate (EER), the results are competitive with, or even better than, those of other non-probabilistic models, such as conventional speaker SVMs and LDA-based cosine distance scoring.
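To make the pairing step concrete, below is a minimal sketch of turning a pair of i-vectors into a single classifier input. The specific operations shown (elementwise product, absolute difference, sum) and the helper name `pair_features` are illustrative assumptions; the abstract does not list the paper's exact operation set.

```python
# Sketch: combine two i-vectors with simple binary operations and train an
# SVM on the resulting pair vectors (within-speaker = 1, between-speaker = 0).
import numpy as np
from sklearn.svm import SVC

def pair_features(x, y, ops=("prod", "absdiff")):
    """Combine two i-vectors into one feature vector via binary operations."""
    parts = []
    if "prod" in ops:
        parts.append(x * y)          # elementwise product
    if "absdiff" in ops:
        parts.append(np.abs(x - y))  # elementwise absolute difference
    if "sum" in ops:
        parts.append(x + y)          # elementwise sum
    return np.concatenate(parts)

# Toy data: 200-dimensional i-vectors and random pair labels.
rng = np.random.default_rng(0)
dim, n_pairs = 200, 500
X = np.stack([pair_features(rng.standard_normal(dim), rng.standard_normal(dim))
              for _ in range(n_pairs)])
labels = rng.integers(0, 2, size=n_pairs)

clf = SVC(kernel="rbf")
clf.fit(X, labels)
scores = clf.decision_function(X[:5])  # verification scores for five trial pairs
```

The same pair vectors could equally be fed to a DNN binary classifier; only the backend changes.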
Coherence and interestingness are two criteria for evaluating the performance of melody harmonization, which aims to generate a chord progression from a symbolic melody. In this study, we apply the concept of orderless NADE to the training process: the melody and a partially masked chord sequence are fed to BiLSTM-based networks, which learn to reconstruct the masked ground-truth chords. In addition, class weights are used to compensate for reasonable chord labels that are rarely seen in the training set. Consistent with the stochasticity in training, blocked Gibbs sampling with a suitable number of masking/generation loops is used in the inference phase to progressively trade off the coherence of the generated chord sequence against its interestingness. The experiments were conducted on a dataset of 18,005 melody/chord pairs. Our proposed model outperforms the state-of-the-art system MTHarmonizer on five of six objective metrics based on chord/melody harmonicity and chord progression. A subjective test with more than 100 participants also shows the superiority of our model.
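The following is a minimal sketch of the blocked Gibbs sampling loop at inference time, assuming a trained harmonizer exposes a `predict(melody, chords, mask)` method that returns per-position chord distributions, as orderless-NADE training would support. The interface name, the annealed masking rate, and the loop count are illustrative assumptions, not the paper's exact design.

```python
# Sketch: blocked Gibbs sampling over a chord sequence given a melody.
import numpy as np

def blocked_gibbs(model, melody, seq_len, n_chords, n_loops=10, rng=None):
    rng = rng or np.random.default_rng()
    chords = rng.integers(0, n_chords, size=seq_len)  # random initial progression
    for step in range(n_loops):
        # Anneal the masking rate: large blocks early (favoring interestingness),
        # small blocks late (favoring local coherence).
        mask_rate = max(0.1, 1.0 - step / n_loops)
        mask = rng.random(seq_len) < mask_rate         # positions to regenerate
        probs = model.predict(melody, chords, mask)    # (seq_len, n_chords)
        for t in np.flatnonzero(mask):                 # resample only masked slots
            chords[t] = rng.choice(n_chords, p=probs[t])
    return chords

class _UniformModel:
    """Stand-in for a trained BiLSTM harmonizer (hypothetical interface)."""
    def predict(self, melody, chords, mask):
        return np.full((len(chords), 24), 1.0 / 24)

chords = blocked_gibbs(_UniformModel(), melody=None, seq_len=16, n_chords=24)
```

Fewer loops with heavier masking behave more like one-shot sampling (more surprising progressions); more loops with lighter masking smooth the sequence toward coherence.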
Phonotactics, which concerns the permissible phone patterns in a specific language and their frequencies of occurrence, is acknowledged to be closely related to spoken language recognition (SLR). With the assistance of phone recognizers, each speech utterance can be decoded into an ordered sequence of phone vectors whose entries are likelihood scores contributed by all candidate phone models. In this paper, we propose a novel approach that uncovers the phonotactic structure concealed in the phone-likelihood vectors through a form of multivariate time-series analysis: dynamic linear models (DLMs). These models treat the generation of phone patterns in each utterance as a dynamic system: the relationship between adjacent vectors is modeled linearly and time-invariantly, and unobserved states are introduced to capture the temporal coherence intrinsic to the system. Each utterance, expressed by its DLM, is further transformed into a fixed-dimensional linear subspace, so that well-developed distance measures between subspaces can be applied to linear discriminant analysis (LDA) in a dissimilarity-based fashion. The results of SLR experiments on the OGI-TS corpus demonstrate that the proposed framework outperforms the well-known vector space modeling (VSM)-based methods and achieves performance comparable to our previous subspace-based method.
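As a rough illustration of the subspace comparison step, here is a minimal sketch that maps an utterance's phone-likelihood sequence to a fixed-dimensional subspace and measures dissimilarity between two such subspaces via principal angles. The SVD-based subspace estimate and the chosen state dimension are simplifying assumptions standing in for the paper's full DLM parameter estimation.

```python
# Sketch: utterance -> linear subspace, then a principal-angle dissimilarity.
import numpy as np
from scipy.linalg import subspace_angles

def utterance_subspace(Y, state_dim=10):
    """Y: (T, D) phone-likelihood vectors; returns a (D, state_dim) orthonormal basis."""
    Yc = Y - Y.mean(axis=0)                       # center the observation sequence
    U, _, _ = np.linalg.svd(Yc.T, full_matrices=False)
    return U[:, :state_dim]                       # dominant directions of variation

def subspace_distance(U1, U2):
    """Dissimilarity derived from principal angles between two subspaces."""
    return np.linalg.norm(np.sin(subspace_angles(U1, U2)))

# Toy usage: two utterances with 50-dimensional phone-likelihood vectors.
rng = np.random.default_rng(1)
U_a = utterance_subspace(rng.random((120, 50)))   # 120 frames
U_b = utterance_subspace(rng.random((90, 50)))    # 90 frames
d = subspace_distance(U_a, U_b)  # such distances feed the dissimilarity-based LDA
```

Note that lengths may differ across utterances (120 vs. 90 frames here); the subspace representation is what fixes the dimensionality for downstream comparison.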