2017
DOI: 10.48550/arxiv.1707.05589
Preprint

On the State of the Art of Evaluation in Neural Language Models

Abstract: Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing codebases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
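
The evaluation method named in the abstract is large-scale automatic black-box hyperparameter tuning. As a minimal sketch only (the paper itself used Google's batched GP-bandit tuner, not plain random search), the following illustrates the black-box pattern: sample configurations, score each with a validation metric, keep the best. The search space and the synthetic `train_and_evaluate` surface are assumptions for demonstration.

```python
import math
import random

def train_and_evaluate(config):
    """Toy stand-in for a full training run: a synthetic 'perplexity'
    surface with an optimum near lr=1e-2, dropout=0.5 (purely illustrative)."""
    lr_term = (math.log10(config["learning_rate"]) + 2.0) ** 2
    drop_term = (config["dropout"] - 0.5) ** 2
    return 60.0 + 40.0 * lr_term + 80.0 * drop_term

# Hypothetical search space; the paper tunes LSTM hyperparameters like these.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "dropout": lambda: random.uniform(0.0, 0.8),
}

def random_search(budget=100):
    """Black-box tuning: the tuner sees only config -> score, no gradients."""
    best_config, best_ppl = None, float("inf")
    for _ in range(budget):
        config = {name: sample() for name, sample in SEARCH_SPACE.items()}
        ppl = train_and_evaluate(config)
        if ppl < best_ppl:
            best_config, best_ppl = config, ppl
    return best_config, best_ppl

if __name__ == "__main__":
    config, ppl = random_search()
    print(f"best config: {config}, synthetic perplexity: {ppl:.2f}")
```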

Cited by 86 publications (108 citation statements)
References 10 publications
“…While this finding might seem trivial at first, it has far-reaching consequences, as it raises questions about the reproducibility of inferences about the mapping between brain activity and cognitive states that are drawn from interpreting the cognitive decoding decisions of DL models. Recent empirical work in DL research has demonstrated that the convergence of DL models, and thereby the specifics of their learned mapping between input data and target signal, depends on many non-deterministic aspects of the training process, such as random seeds and random weight initializations (Dodge et al, 2019; Henderson et al, 2018; Lucic et al, 2018; Reimers and Gurevych, 2017), as well as on the specific choices for other hyper-parameters, such as individual layer specifications and optimization algorithms (Lucic et al, 2018; Melis et al, 2017; Zoph and Le, 2017). It is thus possible that the mapping between cognitive states and brain activity that a DL model learns can change with these factors of training.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
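
The seed-dependence this quote describes can be shown with a minimal sketch (not taken from any of the cited papers): train the same small non-convex model on the same data under different random seeds and observe the spread in held-out accuracy. The data generator, architecture, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def make_data(rng, n=400, d=10):
    """Synthetic nonlinear two-class problem (illustrative only)."""
    X = rng.normal(size=(n, d))
    y = (np.sin(X[:, 0]) + X[:, 1] * X[:, 2] > 0).astype(float)
    return X[:300], y[:300], X[300:], y[300:]

def train_mlp(X, y, seed, hidden=16, epochs=30, lr=0.5):
    """One-hidden-layer MLP, full-batch gradient descent. Only the weight
    initialization depends on the seed; because the loss surface is
    non-convex, different seeds can converge to different solutions."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    w2 = rng.normal(scale=0.5, size=hidden)
    for _ in range(epochs):
        H = np.tanh(X @ W1)                     # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2)))     # predicted probabilities
        err = (p - y) / len(y)                  # dLoss/dlogits (cross-entropy)
        dH = np.outer(err, w2) * (1.0 - H ** 2) # backprop through tanh
        w2 -= lr * H.T @ err
        W1 -= lr * X.T @ dH
    return W1, w2

def accuracy(W1, w2, X, y):
    return ((np.tanh(X @ W1) @ w2 > 0).astype(float) == y).mean()

# Dataset is fixed; only the training seed varies across runs.
X_tr, y_tr, X_te, y_te = make_data(np.random.default_rng(0))
accs = [accuracy(*train_mlp(X_tr, y_tr, seed), X_te, y_te) for seed in range(10)]
print(f"test accuracy over 10 seeds: mean={np.mean(accs):.3f}, std={np.std(accs):.3f}")
```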
“…LSTMs [15] are a popular form of recurrent neural networks and serve as a well-known baseline for deep neural network models. Variants using LSTM remain competitive in various NLP tasks [22,25,29]. BiLSTM (Bi-directional LSTM) improves on the original LSTM by reading inputs in both forward and backward directions.…”
Section: Bi-directional Long Short-Term Memory
Citation type: mentioning (confidence: 99%)
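
As a generic PyTorch sketch of the bidirectional reading described above (not code from the citing paper; the sizes are arbitrary), `nn.LSTM` with `bidirectional=True` runs one pass left-to-right and one right-to-left, concatenating the per-step hidden states:

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes.
batch, seq_len, input_size, hidden_size = 4, 12, 32, 64

# bidirectional=True adds a second LSTM that reads the sequence in reverse;
# per-step outputs of both directions are concatenated on the feature axis.
bilstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)
output, (h_n, c_n) = bilstm(x)

print(output.shape)  # torch.Size([4, 12, 128]) -- 2 * hidden_size per step
print(h_n.shape)     # torch.Size([2, 4, 64])   -- one final state per direction
```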
“…For example, in [14], the authors question claimed advances in reinforcement learning research due to the lack of significance metrics and the variability of results. In [24], the authors argue that many years of claimed superiority in empirical performance in the field of language modeling are in fact unfounded, and show that the well-known stacked LSTM architecture, with appropriate hyperparameter tuning, outperforms other more recent and more sophisticated architectures. In [26], the authors highlight a flaw in many previous research works in the context of Bayesian deep learning: a well-established baseline (Monte Carlo dropout), when run to completion (i.e., when learning is not cut off preemptively after a set number of iterations), achieves similar or superior results compared to the very models that showcased superior results when they were introduced.…”
Section: 'Pest' Antipattern
Citation type: mentioning (confidence: 99%)
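
For context on the Monte Carlo dropout baseline mentioned in [26], here is a generic PyTorch sketch (assumed sizes, not the cited paper's code): dropout is kept stochastic at test time and predictions are averaged over several forward passes, with the spread across passes serving as an uncertainty signal.

```python
import torch
import torch.nn as nn

# A small classifier with dropout (sizes are arbitrary for illustration).
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 10)
)

def mc_dropout_predict(model, x, n_samples=50):
    """Monte Carlo dropout: keep dropout active at test time and average
    the softmax outputs of several stochastic forward passes."""
    model.eval()
    # Re-enable only the dropout layers so they stay stochastic at test time.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean and spread

x = torch.randn(8, 32)                 # a dummy batch
mean_probs, spread = mc_dropout_predict(model, x)
print(mean_probs.shape, spread.shape)  # torch.Size([8, 10]) twice
```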