Timothy Dozat scite author profile

This paper describes the neural dependency parser submitted by Stanford to the CoNLL 2017 Shared Task on parsing Universal Dependencies. Our system uses relatively simple LSTM networks to produce part of speech tags and labeled dependency parses from segmented and tokenized sequences of words. In order to address the rare word problem that abounds in languages with complex morphology, we include a character-based word representation that uses an LSTM to produce embeddings from sequences of characters. Our system was ranked first according to all five relevant metrics for the system: UPOS tagging (93.09%), XPOS tagging (82.27%), unlabeled attachment score (81.30%), labeled attachment score (76.30%), and content word labeled attachment score (72.57%).

show abstract

Simpler but More Accurate Semantic Dependency Parsing

Dozat¹,

Manning²

2018

151

160

View full text Add to dashboard Cite

While syntactic dependency annotations concentrate on the surface or functional structure of a sentence, semantic dependency annotations aim to capture betweenword relationships that are more closely related to the meaning of a sentence, using graph-structured representations. We extend the LSTM-based syntactic parser of Dozat and Manning (2017) to train on and generate these graph structures. The resulting system on its own achieves stateof-the-art performance, beating the previous, substantially more complex stateof-the-art system by 0.6% labeled F1. Adding linguistically richer input representations pushes the margin even higher, allowing us to beat it by 1.9% labeled F1.

show abstract

Universal Dependency Parsing from Scratch

Qi¹,

Dozat²,

Zhang³

et al. 2018

186

137

View full text Add to dashboard Cite

This paper describes Stanford's system at the CoNLL 2018 UD Shared Task. We introduce a complete neural pipeline system that takes raw text as input, and performs all tasks required by the shared task, ranging from tokenization and sentence segmentation, to POS tagging and dependency parsing. Our single system submission achieved very competitive performance on big treebanks. Moreover, after fixing an unfortunate bug, our corrected system would have placed the 2 nd , 1 st , and 3 rd on the official evaluation metrics LAS, MLAS, and BLEX, and would have outperformed all submission systems on lowresource treebank categories on all metrics by a large margin. We further show the effectiveness of different model components through extensive ablation studies.

show abstract

Universal Dependencies

Nivre¹,

Abrams²,

Agić³

et al. 2021

226

View full text Add to dashboard Cite

Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.

show abstract

FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Lee¹,

Li²,

Dozat³

et al. 2022

View full text Add to dashboard Cite

Sequence modeling has demonstrated state-ofthe-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layout patterns. We propose Form-Net, a structure-aware sequence model to mitigate the suboptimal serialization of forms. First, we design Rich Attention that leverages the spatial relationship between tokens in a form for more precise attention score calculation. Second, we construct Super-Tokens for each word by embedding representations from their neighboring tokens through graph convolutions. FormNet therefore explicitly recovers local syntactic information that may have been lost during serialization. In experiments, FormNet outperforms existing methods with a more compact model size and less pretraining data, establishing new state-of-the-art performance on CORD, FUNSD and Payment benchmarks.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Timothy Dozat

Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task

Simpler but More Accurate Semantic Dependency Parsing

Universal Dependency Parsing from Scratch

Universal Dependencies

FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Contact Info

Product

Resources

About