Justin T. Chiu scite author profile

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.


Scaling Hidden Markov Language Models

Chiu, Rush. 2020.

The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling datasets, taking ideas from recent approaches to neural modeling. We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization. Experiments show that this approach leads to models that are more accurate than previous HMM and n-gram-based methods, making progress towards the performance of state-of-the-art neural models.
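For readers unfamiliar with what "exact inference" means for an HMM, the following is a minimal sketch of the textbook forward algorithm, which sums over all hidden state paths in time linear in the sequence length. It is not the scaled method proposed in the paper; the function name `forward_log_likelihood` and the toy parameters are assumptions made purely for illustration.

```python
import math


def forward_log_likelihood(init, trans, emit, obs):
    """Log-probability of `obs` under a discrete HMM (textbook forward pass).

    init[k]     : P(z_1 = k)
    trans[j][k] : P(z_t = k | z_{t-1} = j)
    emit[k][x]  : P(x_t = x | z_t = k)
    obs         : non-empty list of observed symbol indices
    """
    if not obs:
        raise ValueError("obs must contain at least one symbol")
    num_states = len(init)
    # alpha[k] = P(x_1..x_t, z_t = k), updated one time step at a time.
    alpha = [init[k] * emit[k][obs[0]] for k in range(num_states)]
    for x in obs[1:]:
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(num_states)) * emit[k][x]
            for k in range(num_states)
        ]
    # Summing out the final state gives the marginal likelihood of obs.
    return math.log(sum(alpha))


# Hypothetical two-state, two-symbol model used only to exercise the function.
toy_init = [0.6, 0.4]
toy_trans = [[0.7, 0.3], [0.4, 0.6]]
toy_emit = [[0.5, 0.5], [0.1, 0.9]]
toy_ll = forward_log_likelihood(toy_init, toy_trans, toy_emit, [0, 1])
```

Each step costs time quadratic in the number of states, which is one reason very large state spaces are expensive without further structure. This sketch also omits the rescaling or log-space arithmetic needed to avoid underflow on long sequences.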


Exploring Normalization in Deep Residual Networks with Concatenated Rectified Linear Units

Shang, Chiu, Sohn. 2017. AAAI.

Deep Residual Networks (ResNets) have recently achieved state-of-the-art results on many challenging computer vision tasks. In this work we analyze the role of Batch Normalization (BatchNorm) layers on ResNets in the hope of improving the current architecture and better incorporating other normalization techniques, such as Normalization Propagation (NormProp), into ResNets. Firstly, we verify that BatchNorm helps distribute representation learning to residual blocks at all layers, as opposed to a plain ResNet without BatchNorm where learning happens mostly in the latter part of the network. We also observe that BatchNorm well regularizes Concatenated ReLU (CReLU) activation scheme on ResNets, whose magnitude of activation grows by preserving both positive and negative responses when going deeper into the network. Secondly, we investigate the use of NormProp as a replacement for BatchNorm in ResNets. Though NormProp theoretically attains the same effect as BatchNorm on generic convolutional neural networks, the identity mapping of ResNets invalidates its theoretical promise and NormProp exhibits a significant performance drop when naively applied. To bridge the gap between BatchNorm and NormProp in ResNets, we propose a simple modification to NormProp and employ the CReLU activation scheme. We experiment on visual object recognition benchmark datasets such as CIFAR-10/100 and ImageNet and demonstrate that 1) the modified NormProp performs better than the original NormProp but is still not comparable to BatchNorm and 2) CReLU improves the performance of ResNets with or without normalizations.
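As a rough illustration of the CReLU activation mentioned in the abstract, the sketch below concatenates the rectified input with the rectified negated input, so both positive and negative responses are preserved. It is a plain-Python toy on a list of floats, not the authors' implementation; the helper names `relu`, `crelu`, and `squared_norm` are assumptions for this example.

```python
def relu(xs):
    """Standard ReLU: keep positive responses, zero out the rest."""
    return [max(x, 0.0) for x in xs]


def crelu(xs):
    """Concatenated ReLU: [ReLU(x), ReLU(-x)], doubling the output width."""
    return relu(xs) + relu([-x for x in xs])


def squared_norm(xs):
    """Sum of squares, used here as a simple measure of activation magnitude."""
    return sum(x * x for x in xs)


sample = [1.0, -2.0, 3.0]
relu_out = relu(sample)    # negative response is discarded
crelu_out = crelu(sample)  # negative response is kept in the second half
```

In this toy example CReLU keeps the full squared magnitude of the input while ReLU discards the negative part, which is consistent with the abstract's remark that CReLU activations grow in magnitude with depth unless some normalization is applied.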


Latent Alignment and Variational Attention

Deng, Kim, Chiu, et al. 2018. Preprint.

Neural attention has become central to many state-of-the-art models in natural language processing and related domains. Attention networks are an easy-to-train and effective method for softly simulating alignment; however, the approach does not marginalize over latent alignments in a probabilistic sense. This property makes it difficult to compare attention to other alignment approaches, to compose it with probabilistic models, and to perform posterior inference conditioned on observed data. A related latent approach, hard attention, fixes these issues, but is generally harder to train and less accurate. This work considers variational attention networks, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference. We further propose methods for reducing the variance of gradients to make these approaches computationally feasible. Experiments show that for machine translation and visual question answering, inefficient exact latent variable models outperform standard neural attention, but these gains go away when using hard attention based training. On the other hand, variational attention retains most of the performance gain but with training speed comparable to neural attention.


Reference-Centric Models for Grounded Collaborative Dialogue

Fried, Chiu, Klein. 2021. Preprint.

We present a grounded neural dialogue model that successfully collaborates with people in a partially-observable reference game. We focus on a setting where two agents each observe an overlapping part of a world context and need to identify and agree on some object they share. Therefore, the agents should pool their information and communicate pragmatically to solve the task. Our dialogue agent accurately grounds referents from the partner's utterances using a structured reference resolver, conditions on these referents using a recurrent memory, and uses a pragmatic generation procedure to ensure the partner can resolve the references the agent produces. We evaluate on the OneCommon spatial grounding dialogue task (Udagawa and Aizawa, 2019), involving a number of dots arranged on a board with continuously varying positions, sizes, and shades. Our agent substantially outperforms the previous state of the art for the task, obtaining a 20% relative improvement in successful task completion in self-play evaluations and a 50% relative improvement in success in human evaluations.

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition
