Xiang Lorraine Li scite author profile

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.

show abstract

Looking Beyond Sentence-Level Natural Language Inference for Question Answering and Text Summarization

Mishra¹,

Patel²,

Vijayakumar³

et al. 2021

View full text Add to dashboard Cite

Natural Language Inference (NLI) has garnered significant attention in recent years; however, the promise of applying NLI breakthroughs to other downstream NLP tasks has remained unfulfilled. In this work, we use the multiple-choice reading comprehension (MCRC) and checking factual correctness of textual summarization (CFCS) tasks to investigate potential reasons for this. Our findings show that: (1) the relatively shorter length of premises in traditional NLI datasets is the primary challenge prohibiting usage in downstream applications (which do better with longer contexts); (2) this challenge can be addressed by automatically converting resource-rich reading comprehension datasets into longer-premise NLI datasets; and (3) models trained on the converted, longer-premise datasets outperform those trained using shortpremise traditional NLI datasets on downstream tasks primarily due to the difference in premise lengths.

show abstract

ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning

Boratko¹,

Li²,

Das³

et al. 2020

Preprint

View full text Add to dashboard Cite

Given questions regarding some prototypical situation -such as Name something that people usually do before they leave the house for work? -a human can easily answer them via acquired experiences. There can be multiple right answers for such questions with some more common for a situation than others.This paper introduces a new question answering dataset for training and evaluating common-sense reasoning capabilities of artificial intelligence systems in such prototypical situations. The training set is gathered from an existing set of questions played in a long-running international trivia game show -FAMILY-FEUD. The hidden evaluation set is created by gathering answers for each question from 100 crowd-workers. We also propose an open-domain task where a model has to output a ranked list of answers, ideally covering all prototypical answers for a question. On evaluating our dataset with various competitive state-of-the-art models, we find there is a significant gap between the best model and human performance on a number of evaluation metrics.

show abstract

Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning

Chen¹,

Boratko²,

Chen³

et al. 2021

View full text Add to dashboard Cite

Knowledge bases often consist of facts which are harvested from a variety of sources, many of which are noisy and some of which conflict, resulting in a level of uncertainty for each triple. Knowledge bases are also often incomplete, prompting the use of embedding methods to generalize from known facts, however existing embedding methods only model triple-level uncertainty and reasoning results lack global consistency. To address these shortcomings, we propose BEUrRE , a novel uncertain knowledge graph embedding method with calibrated probabilistic semantics. BEUrRE models each entity as a box (i.e. axis-aligned hyperrectangle), and relations between two entities as affine transforms on the head and tail entity boxes. The geometry of the boxes allows for efficient calculation of intersections and volumes, endowing the model with calibrated probabilistic semantics and facilitating the incorporation of relational constraints. Extensive experiments on two benchmark datasets show that BEUrRE consistently outperforms baselines on confidence prediction and fact ranking due to it's probabilistic calibration and ability to capture high-order dependencies among facts. 1

show abstract

A Systematic Investigation of Commonsense Knowledge in Large Language Models

Li¹,

Kuncoro²,

Hoffmann³

et al. 2022

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.