Yan Shao scite author profile

Yan Shao

5 Publications

155 Citation Statements Received

84 Citation Statements Given

Affiliations

Nanyang Institute of Technology, North University of China, Uppsala University

Publications

82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models

Smith, Bohnet, Lhoneux et al. 2018

We present the Uppsala system for the CoNLL 2018 Shared Task on universal dependency parsing. Our system is a pipeline consisting of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags. Instead of training a single parsing model for each treebank, we trained models with multiple treebanks for one language or closely related languages, greatly reducing the number of models. On the official test run, we ranked 7th of 27 teams for the LAS and MLAS metrics. Our system obtained the best scores overall for word segmentation, universal POS tagging, and morphological features.

Corrigendum: After the test phase was over, we discovered that we had used a non-permitted resource when developing the UPOS tagger for Thai PUD (see Section 4). Setting our LAS, MLAS and UPOS scores to 0.00 for Thai PUD gives the corrected scores: LAS 72.31, MLAS 59.17, UPOS 90.50. This does not affect the ranking for any of the three scores, as confirmed by the shared task organizers.

Resources: All three components of our system were trained principally on the training sets of Universal Dependencies v2.2 released to coincide with the shared task. The tagger and parser also make use of the pre-trained word

From Raw Text to Universal Dependencies - Look, No Tags!

Lhoneux, Shao, Basirat et al. 2017

We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech tagging, but uses word embeddings based on universal tag distributions. We achieved a macroaveraged LAS F1 of 65.11 in the official test run and obtained the 2nd best result for sentence segmentation with a score of 89.03. After fixing two bugs, we obtained an unofficial LAS F1 of 70.49.

Universal Word Segmentation: Implementation and Interpretation

Shao, Hardmeier, Nivre 2018, TACL

Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.
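To illustrate what casting word segmentation as sequence tagging can look like, here is a minimal sketch. It assumes a four-tag boundary scheme (B, I, E, S for begin, inside, end, single-character word), which is a common convention but is not stated in the abstract; the function names are hypothetical and no trained model is involved. A tagger would predict one tag per character, and the segmentation is then read off the tags.

```python
def words_to_tags(words):
    """Convert a gold segmentation into one boundary tag per character.

    Assumed tag set (not taken from the abstract): B = begin, I = inside,
    E = end, S = single-character word.
    """
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings, which carry no characters
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their (predicted) boundary tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # flush an unfinished word left by an ill-formed tag sequence
        words.append(current)
    return words


gold = ["我", "喜欢", "北京"]
tags = words_to_tags(gold)
restored = tags_to_words("".join(gold), tags)
```

Under this encoding the training data for the tagger is just the character sequence paired with `tags`, and decoding is the inverse mapping in `tags_to_words`.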

Boosting English-Chinese Machine Transliteration via High Quality Alignment and Multilingual Resources

Shao, Tiedemann, Nivre 2015

This paper presents our machine transliteration systems developed for the NEWS 2015 machine transliteration shared task. Our systems are applied to two tasks: English to Chinese and Chinese to English. For standard runs, in which only official data sets are used, we build phrase-based transliteration models with refined alignments provided by the M2M-aligner. For non-standard runs, we add multilingual resources to the systems designed for the standard runs and build different language specific transliteration systems. Linear regression is adopted to rerank the outputs afterwards, which significantly improves the overall transliteration performance.

Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

Shao, Hardmeier, Tiedemann et al. 2017, Preprint

We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and sub-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets in different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint segmentation and POS tagging.
