Xiaosu Tong scite author profile

Xiaosu Tong

5 Publications

27 Citation Statements Received

68 Citation Statements Given


Affiliations

Amazon (United States), Purdue University West Lafayette, Amazon (Germany)

Publications


Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

Tong, Huang, Mallidi, et al. 2021


In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. Streaming capability is achieved by using unidirectional LSTM layers and a causal mean aggregation layer that forms the final utterance-level prediction up to the current frame. To avoid redundant computation during online streaming inference, we use a caching mechanism for every convolution operation. Experimental results on a device-directed vs. non-device-directed task show that the proposed model yields a 41% reduction in equal error rate (EER) compared to our previous best model on this task. Furthermore, we show that the proposed model can make accurate predictions earlier in time than attention-based models.
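The streaming idea in this abstract can be illustrated with a minimal sketch. It makes two simplifying assumptions: the stream is single-channel, and each frame is a scalar. The class names `CachedCausalConv1d` and `CausalMeanAggregator` are hypothetical and are not the authors' code.

- **Cached convolution:** a causal convolution only needs to keep its last `k - 1` inputs in a cache. Each new frame then costs a single kernel evaluation.
- **Causal mean aggregation:** this reduces to a running average of all frame-level outputs seen so far.

```python
from collections import deque
from typing import Deque, List, Sequence


class CachedCausalConv1d:
    """Hypothetical single-channel causal 1-D convolution with an input cache.

    The cache holds the last (k - 1) inputs, so each incoming frame is
    combined with cached history instead of recomputing the whole window.
    Uses the deep-learning convention (cross-correlation): kernel[0]
    weights the oldest input in the window.
    """

    def __init__(self, kernel: Sequence[float]) -> None:
        if not kernel:
            raise ValueError("kernel must be non-empty")
        self.kernel = list(kernel)
        size = len(self.kernel) - 1
        # A zero-initialised cache is equivalent to left zero-padding the stream.
        self.cache: Deque[float] = deque([0.0] * size, maxlen=size)

    def step(self, x: float) -> float:
        window = list(self.cache) + [x]  # oldest ... newest
        y = sum(w * v for w, v in zip(self.kernel, window))
        self.cache.append(x)  # the oldest cached input drops off automatically
        return y


class CausalMeanAggregator:
    """Running mean of frame-level outputs up to and including the current frame."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def step(self, value: float) -> float:
        self.total += value
        self.count += 1
        return self.total / self.count


def stream_scores(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Emit one utterance-level score per incoming frame."""
    conv = CachedCausalConv1d(kernel)
    agg = CausalMeanAggregator()
    return [agg.step(conv.step(f)) for f in frames]


def batch_reference(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Non-streaming reference: full causal convolution, then prefix means."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)
    conv = [sum(w * v for w, v in zip(kernel, padded[t:t + k])) for t in range(len(frames))]
    return [sum(conv[: t + 1]) / (t + 1) for t in range(len(conv))]
```

For example, `stream_scores([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])` returns `[0.5, 1.0, 1.5, 2.0]`, which matches `batch_reference` on the same input.

In the described model, the aggregation follows the LSTM stack, and the convolutions operate on multi-channel feature maps. The scalar version only shows why a per-layer cache and a running sum keep the per-frame cost constant.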


Shear-thickening behavior of gelatinized waxy starch dispersions promoted by the starch molecular characteristics

Fang, Tunçil, Luo, et al. 2019

International Journal of Biological Macromolecules


Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass

Shonibare, Tong, Ravichandran 2022

Preprint


It is estimated that around 70 million people worldwide are affected by a speech disorder called stuttering [1]. With recent advances in Automatic Speech Recognition (ASR), voice assistants are increasingly useful in our everyday lives. Many technologies in education, retail, telecommunication and healthcare can now be operated through voice. Unfortunately, these benefits are not accessible to People Who Stutter (PWS). We propose a simple but effective method called 'Detect and Pass' to make modern ASR systems accessible to PWS in a limited-data setting. The algorithm uses a context-aware classifier, trained on a limited amount of data, to detect acoustic frames that contain stutter. To improve robustness on stuttered speech, this extra information is passed on to the ASR model to be utilized during inference. Our experiments show a reduction of 12.18% to 71.24% in Word Error Rate (WER) across various state-of-the-art ASR systems. By varying the threshold on the associated posterior probability of stutter for each stacked frame used to build low frame rate (LFR) acoustic features, we were able to determine an optimal setting that reduced the WER by 23.93% to 71.67% across different ASR systems.
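One possible reading of the LFR thresholding step is sketched below. It rests on four assumptions:

- Frame-level stutter posteriors are already available from the classifier.
- LFR features are formed by stacking `m` frames with a hop of `n`.
- A stacked frame's stutter posterior is the mean over its constituent frames.
- Stacked frames at or above the threshold are skipped before they reach the ASR model.

The function names and the defaults `m=7, n=6` are hypothetical. The abstract does not specify exactly how the detected information is passed to the ASR model.

```python
from typing import List, Sequence

Frame = Sequence[float]


def stack_lfr(frames: Sequence[Frame], m: int, n: int) -> List[List[float]]:
    """Low frame rate (LFR) stacking: concatenate m consecutive frames every n frames.

    The final window is padded by repeating the last available frame.
    """
    stacked: List[List[float]] = []
    for start in range(0, len(frames), n):
        window = list(frames[start:start + m])
        window += [window[-1]] * (m - len(window))  # pad the tail window
        stacked.append([v for frame in window for v in frame])
    return stacked


def detect_and_pass(
    frames: Sequence[Frame],
    stutter_post: Sequence[float],
    m: int = 7,
    n: int = 6,
    threshold: float = 0.5,
) -> List[List[float]]:
    """Hypothetical sketch: drop stacked frames whose mean stutter posterior >= threshold."""
    if len(frames) != len(stutter_post):
        raise ValueError("need one stutter posterior per acoustic frame")
    stacked = stack_lfr(frames, m, n)
    kept: List[List[float]] = []
    for i, start in enumerate(range(0, len(frames), n)):
        # Assumed aggregation: mean posterior over the frames in this stack.
        window_post = stutter_post[start:start + m]
        p_stutter = sum(window_post) / len(window_post)
        if p_stutter < threshold:
            kept.append(stacked[i])
    return kept
```

One way to pick an operating point is to sweep `threshold` over a grid and measure WER on held-out data. This is in the spirit of the threshold study the abstract reports.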


Gaussian Mixture Models for Classification and Hypothesis Tests Under Differential Privacy

Tong, Kantarcıoğlu, et al. 2017


Many statistical models are constructed using very basic statistics: mean vectors, variances, and covariances. Gaussian mixture models are such models. When a data set contains sensitive information and cannot be directly released to users, such models can be easily constructed from noise-added query responses. The models nonetheless provide preliminary results to users. Although the queried basic statistics meet the differential privacy guarantee, the complex models constructed using these statistics may not meet the differential privacy guarantee. However, it is up to users to decide how to query a database and how to further use the queried results. In this article, our goal is to understand the impact of the differential privacy mechanism on Gaussian mixture models. Our approach involves querying basic statistics from a database under differential privacy protection, and using the noise-added responses to build classifiers and perform hypothesis tests. We discover that adding Laplace noise may have a non-negligible effect on model outputs. For example, the variance-covariance matrix may no longer be positive definite after noise addition. We propose a heuristic algorithm to repair the noise-added variance-covariance matrix. We then examine the classification error using the noise-added responses, through experiments with both simulated and real-life data, and demonstrate under which conditions the impact of the added noise can be reduced. We compute the exact type I and type II errors under differential privacy for the one-sample z test, the one-sample t test, and the two-sample t test with equal variances. We then show under which conditions a hypothesis test returns reliable results given differentially private means, variances, and covariances.
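The abstract describes two effects: Laplace noise on released statistics, and a noisy variance-covariance matrix losing positive definiteness. Both can be illustrated for the bivariate case with a minimal sketch.

The sketch assumes each attribute is clipped to a public range `[lo, hi]` and that the sample size `n` is public. Under replace-one neighboring datasets, the sensitivity of a mean is then `(hi - lo) / n`.

The repair function `repair_cov2` is a hypothetical heuristic, not the authors' algorithm. It floors the variances, then shrinks the covariance so the correlation stays below one.

```python
import math
import random
from typing import Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_mean(xs: Sequence[float], lo: float, hi: float,
                 epsilon: float, rng: random.Random) -> float:
    """epsilon-DP mean via the Laplace mechanism (assumes n is public)."""
    n = len(xs)
    clipped = [min(max(x, lo), hi) for x in xs]
    sensitivity = (hi - lo) / n  # max change in the mean from replacing one record
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def is_pos_def2(var1: float, var2: float, cov: float) -> bool:
    """Sylvester's criterion for the 2x2 matrix [[var1, cov], [cov, var2]]."""
    return var1 > 0 and var1 * var2 - cov * cov > 0


def repair_cov2(var1: float, var2: float, cov: float,
                min_var: float = 1e-6, max_corr: float = 0.99) -> Tuple[float, float, float]:
    """Hypothetical repair: floor the variances, then clip |correlation| to max_corr."""
    v1 = max(var1, min_var)
    v2 = max(var2, min_var)
    bound = max_corr * math.sqrt(v1 * v2)
    # The determinant is then at least v1 * v2 * (1 - max_corr**2) > 0.
    return v1, v2, min(max(cov, -bound), bound)
```

The noise scale in `private_mean` is `(hi - lo) / (n * epsilon)`, so larger samples or a larger privacy budget shrink it. This gives one intuition for why the impact of the noise depends on the data conditions.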


Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection

Tong, Huang, Mallidi, et al. 2020

Preprint


