Pengcheng Yin scite author profile

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-ofthe-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned stateof-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies. * Equal Contribution. Author contributions and ordering details are listed in Appendix A.

show abstract

A Syntactic Neural Model for General-Purpose Code Generation

Yin

Neubig

2017

461

501

View full text Add to dashboard Cite

We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python. Existing datadriven methods treat this problem as a language generation task without considering the underlying syntax of the target programming language. Informed by previous work in semantic parsing, in this paper we propose a novel neural architecture powered by a grammar model to explicitly capture the target syntax as prior knowledge. Experiments find this an effective way to scale up to generation of complex programs from natural language descriptions, achieving state-of-the-art results that well outperform previous code generation and semantic parsing approaches.

show abstract

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Yin¹,

Neubig²,

Yih³

et al. 2020

217

248

View full text Add to dashboard Cite

Recent years have witnessed the burgeoning of pretrained language models (LMs) for textbased natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, hence may not be suitable for tasks like semantic parsing over structured data, which require reasoning over both free-form NL questions and structured tabular data (e.g., database tables). In this paper we present TABERT, a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables. TABERT is trained on a large corpus of 26 million tables and their English contexts. In experiments, neural semantic parsers using TABERT as feature representation layers achieve new best results on the challenging weakly-supervised semantic parsing benchmark WIKITABLEQUESTIONS, while performing competitively on the text-to-SQL dataset SPIDER. 1 * Work done while at Facebook AI Research.

show abstract

Learning to mine aligned code and natural language pairs from stack overflow

et al. 2018

View full text Add to dashboard Cite

For tasks like code synthesis from natural language, code retrieval, and code summarization, data-driven models have shown great promise. However, creating these models require parallel data between natural language (NL) and code with fine-grained alignments. Stack Overflow (SO) is a promising source to create such a data set: the questions are diverse and most of them have corresponding answers with high quality code snippets. However, existing heuristic methods (e.g., pairing the title of a post with the code in the accepted answer) are limited both in their coverage and the correctness of the NL-code pairs obtained. In this paper, we propose a novel method to mine high-quality aligned data from SO using two sets of features: hand-crafted features considering the structure of the extracted snippets, and correspondence features obtained by training a probabilistic model to capture the correlation between NL and code using neural networks. These features are fed into a classifier that determines the quality of mined NL-code pairs. Experiments using Python and Java as test beds show that the proposed method greatly expands coverage and accuracy over existing mining methods, even when using only a small number of labeled examples. Further, we find that reasonable results are achieved even when training the classifier on one language and testing on another, showing promise for scaling NL-code mining to a wide variety of programming languages beyond those for which we are able to annotate data.

show abstract

TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation

Yin¹,

Neubig²

2018

151

116

View full text Add to dashboard Cite

We present TRANX, a transition-based neural semantic parser that maps natural language (NL) utterances into formal meaning representations (MRs). TRANX uses a transition system based on the abstract syntax description language for the target MR, which gives it two major advantages: (1) it is highly accurate, using information from the syntax of the target MR to constrain the output space and model the information flow, and (2) it is highly generalizable, and can easily be applied to new types of MR by just writing a new abstract syntax description corresponding to the allowable structures in the MR. Experiments on four different semantic parsing and code generation tasks show that our system is generalizable, extensible, and effective, registering strong results compared to existing neural semantic parsers. 1

show abstract

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

Yin¹,

Zhou²,

He³

et al. 2018

View full text Add to dashboard Cite

Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and timeconsuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce STRUCTVAE, a variational auto-encoding model for semisupervised semantic parsing, which learns both from limited amounts of parallel data, and readily-available unlabeled NL utterances. STRUCTVAE models latent MRs not observed in the unlabeled data as treestructured latent variables. Experiments on semantic parsing on the ATIS domain and Python code generation show that with extra unlabeled data, STRUCTVAE outperforms strong supervised models. 1

show abstract

Neural Enquirer: Learning to Query Tables in Natural Language

Yin

Lu²,

Li³

et al. 2016

View full text Add to dashboard Cite

We proposed Neural Enquirer as a neural network architecture to execute a natural language (NL) query on a knowledge-base (KB) for answers. Basically, Neural Enquirer finds the distributed representation of a query and then executes it on knowledgebase tables to obtain the answer as one of the values in the tables. Unlike similar efforts in end-to-end training of semantic parsers [11,9], Neural Enquirer is fully "neuralized": it not only gives distributional representation of the query and the knowledge-base, but also realizes the execution of compositional queries as a series of differentiable operations, with intermediate results (consisting of annotations of the tables at different levels) saved on multiple layers of memory. Neural Enquirer can be trained with gradient descent, with which not only the parameters of the controlling components and semantic parsing component, but also the embeddings of the tables and query words can be learned from scratch. The training can be done in an end-to-end fashion, but it can take stronger guidance, e.g., the step-by-step supervision for complicated queries, and benefit from it. Neural Enquirer is one step towards building neural network systems which seek to understand language by executing it on real-world. Our experiments show that Neural Enquirer can learn to execute fairly complicated NL queries on tables with rich structures.

show abstract

Reranking for Neural Semantic Parsing

Yin¹,

Neubig²

2019

View full text Add to dashboard Cite

Semantic parsing considers the task of transducing natural language (NL) utterances into machine executable meaning representations (MRs). While neural network-based semantic parsers have achieved impressive improvements over previous methods, results are still far from perfect, and cursory manual inspection can easily identify obvious problems such as lack of adequacy or coherence of the generated MRs. This paper presents a simple approach to quickly iterate and improve the performance of an existing neural semantic parser by reranking an n-best list of predicted MRs, using features that are designed to fix observed problems with baseline models. We implement our reranker in a competitive neural semantic parser and test on four semantic parsing (GEO, ATIS) and Python code generation (DJANGO, CONALA) tasks, improving the strong baseline parser by up to 5.7% absolute in BLEU (CONALA) and 2.9% in accuracy (DJANGO), outperforming the best published neural parser results on all four datasets. 1

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Pengcheng Yin

PaLM: Scaling Language Modeling with Pathways

A Syntactic Neural Model for General-Purpose Code Generation

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Learning to mine aligned code and natural language pairs from stack overflow

TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

Neural Enquirer: Learning to Query Tables in Natural Language

Reranking for Neural Semantic Parsing

Contact Info

Product

Resources

About