Xuezhe Ma scite author profile

State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of handcrafted features and data pre-processing. In this paper, we introduce a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional LSTM, CNN and CRF. Our system is truly end-to-end, requiring no feature engineering or data pre-processing, thus making it applicable to a wide range of sequence labeling tasks. We evaluate our system on two data sets for two sequence labeling tasks: the Penn Treebank WSJ corpus for part-of-speech (POS) tagging and the CoNLL 2003 corpus for named entity recognition (NER). We obtain state-of-the-art performance on both datasets: 97.55% accuracy for POS tagging and 91.21% F1 for NER.
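The CRF layer named in the abstract above scores whole label sequences rather than individual labels. As a rough illustration, the sketch below uses Viterbi decoding, the standard way to find the best sequence under a linear-chain CRF. The abstract does not describe the decoding procedure, so the function name, the two-label setup, and all scores here are assumptions made up for the example.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]   : score of label j at position t (in a real tagger this
                        would come from the encoder, e.g. a BiLSTM)
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] = score of the best path that ends in label j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_labels):
            # Best previous label for arriving at label j.
            prev = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final label.
    path = [max(range(num_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores (hypothetical): label 0 is slightly preferred at the first position.
toy_emissions = [[1.0, 0.0], [0.0, 3.0], [0.0, 3.0]]
no_penalty = [[0.0, 0.0], [0.0, 0.0]]
switch_penalty = [[0.0, -10.0], [-10.0, 0.0]]

print(viterbi_decode(toy_emissions, no_penalty))      # [0, 1, 1]
print(viterbi_decode(toy_emissions, switch_penalty))  # [1, 1, 1]
```

With no transition scores each position simply takes its best label; once switching labels is penalized, the decoder gives up the small gain at the first position to keep the sequence consistent, which is the joint behavior a CRF layer is meant to add.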


Harnessing Deep Neural Networks with Logic Rules

Liu et al. 2016

360

356


Combining deep neural networks with structured logic rules is desirable to harness flexibility and reduce uninterpretability of the neural models. We propose a general framework capable of enhancing various types of neural networks (e.g., CNNs and RNNs) with declarative first-order logic rules. Specifically, we develop an iterative distillation method that transfers the structured information of logic rules into the weights of neural networks. We deploy the framework on a CNN for sentiment analysis, and an RNN for named entity recognition. With a few highly intuitive rules, we obtain substantial improvements and achieve state-of-the-art or comparable results to previous best-performing systems.


Stack-Pointer Networks for Dependency Parsing

Ma, Hu, Liu et al. 2018

128

211


We introduce a novel architecture for dependency parsing: stack-pointer networks (STACKPTR). Combining pointer networks (Vinyals et al., 2015) with an internal stack, the proposed model first reads and encodes the whole sentence, then builds the dependency tree top-down (from root to leaf) in a depth-first fashion. The stack tracks the status of the depth-first search and the pointer networks select one child for the word at the top of the stack at each step. The STACKPTR parser benefits from the information of the whole sentence and all previously derived subtree structures, and removes the left-to-right restriction in classical transition-based parsers. Yet, the number of steps for building any (including non-projective) parse tree is linear in the length of the sentence just as other transition-based parsers, yielding an efficient decoding algorithm with O(n^2) time complexity. We evaluate our model on 29 treebanks spanning 20 languages and different dependency annotation schemas, and achieve state-of-the-art performance on 21 of them.
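The top-down, depth-first construction described above can be sketched as a simple stack loop. This is only an illustration of the control flow: the `choose_child` callback is a hypothetical stand-in for the pointer network, here backed by an oracle that reads a known tree, and the left-to-right order in which children are chosen is an assumption of the sketch rather than something the abstract states.

```python
from typing import Callable, Dict, List, Optional, Tuple


def build_tree_top_down(
    choose_child: Callable[[int, List[int]], Optional[int]],
) -> List[Tuple[int, int]]:
    """Build (head, child) arcs depth-first, starting from root index 0.

    choose_child(head, attached) plays the role of the pointer network: it
    returns the next child of `head`, or None when `head` has no children left.
    """
    stack = [0]                 # index 0 is an artificial root symbol
    attached: List[int] = []    # words that already have a head
    arcs: List[Tuple[int, int]] = []
    while stack:
        head = stack[-1]
        child = choose_child(head, attached)
        if child is None:
            stack.pop()         # all children of `head` are attached
        else:
            arcs.append((head, child))
            attached.append(child)
            stack.append(child)  # descend into the new child first
    return arcs


def oracle_from_heads(heads: Dict[int, int]) -> Callable[[int, List[int]], Optional[int]]:
    """Hypothetical oracle: picks unattached children of a head, left to right."""
    def choose(head: int, attached: List[int]) -> Optional[int]:
        for word in sorted(heads):
            if heads[word] == head and word not in attached:
                return word
        return None
    return choose


# "She reads books": word 2 ("reads") is the root, words 1 and 3 depend on it.
toy_heads = {1: 2, 2: 0, 3: 2}
print(build_tree_top_down(oracle_from_heads(toy_heads)))  # [(0, 2), (2, 1), (2, 3)]
```

Each word is pushed once and popped once, so the loop runs a number of steps linear in the sentence length, matching the claim in the abstract. In the actual parser each `choose_child` call would score every word in the sentence, which is where the quadratic overall cost comes from.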


End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

Hovy 2016

Preprint

147

209


FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

Ma, Zhou, Li et al. 2019

117

126


Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving comparable performance with state-of-the-art non-autoregressive NMT models and almost constant decoding time with respect to the sequence length.
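For readers unfamiliar with generative flow, the sketch below shows one widely used building block, an affine coupling layer, which is invertible and has a cheap log-determinant. The abstract does not say which layers FlowSeq uses, so this is a generic illustration only; `toy_scale_shift` is a hypothetical stand-in for a neural network, and the sketch assumes an even-length input vector.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
ScaleShift = Callable[[Vector], Tuple[Vector, Vector]]


def coupling_forward(x: Vector, scale_shift: ScaleShift) -> Tuple[Vector, float]:
    """Transform x and return (y, log|det Jacobian|).

    The first half passes through unchanged and parameterizes an elementwise
    affine map applied to the second half.
    """
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    log_scale, shift = scale_shift(x1)
    y2 = [v * math.exp(s) + b for v, s, b in zip(x2, log_scale, shift)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    return x1 + y2, sum(log_scale)


def coupling_inverse(y: Vector, scale_shift: ScaleShift) -> Vector:
    """Exactly undo coupling_forward."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    log_scale, shift = scale_shift(y1)  # y1 equals x1, so the same parameters come back
    x2 = [(v - b) * math.exp(-s) for v, s, b in zip(y2, log_scale, shift)]
    return y1 + x2


def toy_scale_shift(x1: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for a learned network."""
    return [0.5 * v for v in x1], [v + 1.0 for v in x1]


x = [1.0, 2.0, 3.0, 4.0]
y, log_det = coupling_forward(x, toy_scale_shift)
x_back = coupling_inverse(y, toy_scale_shift)
print(log_det)  # 1.5
```

Because the inverse and the log-determinant are both available in closed form, stacking such layers gives an exact density for the transformed variable, which is what makes flows usable as latent-variable priors.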


Choosing Transfer Languages for Cross-Lingual Learning

Lin, Chen, Lee et al. 2019

102


Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, or size of available data), even the most enlightened experimenter rarely considers all these factors for the particular task at hand. In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and glean insights on what features are most informative for each NLP task, which may inform future ad hoc selection even without use of our method.


Harnessing Deep Neural Networks with Logic Rules

Liu et al. 2016

Preprint

110


On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

Ahmad, Zhang, Ma et al. 2019


Different languages might have different word orders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when transferring to distant foreign languages. To test our hypothesis, we train dependency parsers on an English corpus and evaluate their transfer performance on 30 other languages. Specifically, we compare encoders and decoders based on Recurrent Neural Networks (RNNs) and modified self-attentive architectures. The former relies on sequential information while the latter is more flexible at modeling word order. Rigorous experiments and detailed analysis show that RNN-based architectures transfer well to languages that are close to English, while self-attentive models have better overall cross-lingual transferability and perform especially well on distant languages.

Language families and languages:

Afro-Asiatic: Arabic (ar), Hebrew (he)
Austronesian: Indonesian (id)
IE.Baltic: Latvian (lv)
IE.Germanic: Danish (da), Dutch (nl), English (en), German (de), Norwegian (no), Swedish (sv)
IE.Indic: Hindi (hi)
IE.Latin: Latin (la)
IE.Romance: Catalan (ca), French (fr), Italian (it), Portuguese (pt), Romanian (ro), Spanish (es)
IE.Slavic: Bulgarian (bg), Croatian (hr), Czech (cs), Polish (pl), Russian (ru), Slovak (sk), Slovenian (sl), Ukrainian (uk)
Japanese: Japanese (ja)
Korean: Korean (ko)
Sino-Tibetan: Chinese (zh)
Uralic: Estonian (et), Finnish (fi)


