Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.375

Do Neural Language Models Show Preferences for Syntactic Formalisms?

Abstract: Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are…

Cited by 37 publications (46 citation statements). References 20 publications (27 reference statements).
“…Probing results are somewhat dependent on the choice of linguistic formalism used to annotate the data, as Kulmizev et al. (2020) found for syntax and Kuznetsov and Gurevych (2020) found for semantic roles. Miaschi et al. (2020) examined the layerwise performance of BERT for a suite of linguistic features, before and after fine-tuning.…”
Section: Probing LMs for Linguistic Knowledge
confidence: 94%
“…For example, they could be interfaced with large-scale pretrained models such as BERT (Devlin et al. 2019) and GPT (Radford et al. 2019), thereby fusing together deep contextual representations and rich semantic symbols (see Figure 3). Aside from enriching pretrained models (Wu and He 2019; Hardalov, Koychev, and Nakov 2020; Kuncoro et al. 2020), such representations could further motivate future research on their interpretability (Wu and He 2019; Hardalov, Koychev, and Nakov 2020; Kuncoro et al. 2020; Hewitt and Manning 2019; Kulmizev et al. 2020).…”
Section: Universal DRSs
confidence: 99%
“…Little direct work exists on extensive empirical comparison of UD and SUD with parsers. Recent work by Kulmizev et al. (2020) performed probing experiments across a set of languages to extract dependency graphs from the BERT (Devlin et al., 2019) and ELMo (Peters et al., 2018) language models, finding that both models prefer UD, with tree shape directly correlated with preference strength.…”
Section: Related Work
confidence: 99%
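To make the kind of probing described in these citation statements concrete, the sketch below shows one deliberately simplified way to pull an unlabeled dependency tree out of a pretrained LM's contextual embeddings and score it against a gold UD- or SUD-style analysis with unlabeled attachment score (UAS). It is not the probing setup used by Kulmizev et al. (2020): the choice of multilingual BERT, layer 7, raw L2 distances, an undirected spanning-tree decode, and the toy sentence with its gold heads are all assumptions made purely for illustration.

```python
# A minimal, illustrative sketch of representation-based dependency probing:
# pool a pretrained LM's hidden states into word vectors, decode an unlabeled
# tree from pairwise distances, and score it against a gold analysis (UD- or
# SUD-style). This is NOT the probing setup of Kulmizev et al. (2020); the
# model, layer, distance metric, and toy gold heads are assumptions.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()


def word_vectors(words, layer=7):
    """Mean-pool subword embeddings from one hidden layer into word vectors."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]  # (n_subwords, dim)
    word_ids = enc.word_ids()
    return np.stack([
        hidden[[j for j, w in enumerate(word_ids) if w == i]].mean(0).numpy()
        for i in range(len(words))
    ])


def spanning_tree_heads(vectors):
    """Prim's algorithm over pairwise L2 distances; returns each word's parent
    in the undirected spanning tree (word 0 is arbitrarily taken as root)."""
    n, heads, in_tree = len(vectors), [-1] * len(vectors), {0}
    dist = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=-1)
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist[e])
        heads[j] = i
        in_tree.add(j)
    return heads


def uas(pred, gold):
    """Unlabeled attachment score, ignoring the gold root."""
    scored = [(p, g) for p, g in zip(pred, gold) if g != -1]
    return sum(p == g for p, g in scored) / len(scored)


words = ["The", "cat", "chased", "a", "mouse"]
gold_heads = [1, 2, -1, 4, 2]  # hypothetical content-word-headed (UD-like) analysis
pred_heads = spanning_tree_heads(word_vectors(words))
print("predicted heads:", pred_heads, "UAS:", uas(pred_heads, gold_heads))
```

In practice, the same decoded trees would be scored against both UD and SUD gold annotations of the same sentences to see which formalism the representations align with more closely.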