Aalok Sathe scite author profile

Aalok Sathe

5 Publications

31 Citation Statements Received

90 Citation Statements Given


Affiliations

Microsoft (United States)

Publications


TaxiNLI: Taking a Ride up the NLU Hill

Joshi¹, Aditya², Sathe³ et al. 2020


Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena, it remains unclear which specific concepts are learnt by the trained systems and where they can achieve strong generalization. To investigate this question, we propose a taxonomic hierarchy of categories that are relevant for the NLI task. We introduce TAXINLI, a new dataset that has 10k examples from the MNLI dataset (Williams et al., 2018) with these taxonomic labels. Through various experiments on TAXINLI, we observe that whereas for certain taxonomic categories SOTA neural models have achieved near-perfect accuracies (a large jump over the previous models), some categories still remain difficult. Our work adds to the growing body of literature that shows the gaps in the current NLI systems and datasets through a systematic presentation and analysis of reasoning categories. † denotes equal contribution.


Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning

Sathe¹, Park² 2021


Automatic fact-checking is crucial for recognizing misinformation spreading on the internet. Most existing fact-checkers break down the process into several subtasks, one of which determines candidate evidence sentences that can potentially support or refute the claim to be verified; typically, evidence sentences with gold-standard labels are needed for this. In a more realistic setting, however, such sentence-level annotations are not available. In this paper, we tackle the natural language inference (NLI) subtask, that is, given a document and a (sentence) claim, determine whether the document supports or refutes the claim, using only document-level annotations. Using fine-tuned BERT and multiple instance learning, we achieve 81.9% accuracy, significantly outperforming the existing results on the WikiFactCheck-English dataset.


SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features

Tuckute¹, Sathe², Wang³ et al. 2022


SentSpace is a modular framework for streamlined evaluation of text. SentSpace characterizes textual input using diverse lexical, syntactic, and semantic features derived from corpora and psycholinguistic experiments. Core sentence features fall into three primary feature spaces: 1) Lexical, 2) Contextual, and 3) Embeddings. To aid in the analysis of computed features, SentSpace provides a web interface for interactive visualization and comparison with text from large corpora. The modular design of SentSpace allows researchers to easily integrate their own feature computation into the pipeline while benefiting from a common framework for evaluation and visualization. In this manuscript we will describe the design of SentSpace, its core feature spaces, and demonstrate an example use case by comparing human-written and machine-generated (GPT2-XL) sentences to each other. We find that while GPT2-XL-generated text appears fluent at the surface level, psycholinguistic norms and measures of syntactic processing reveal key differences between text produced by humans and machines. Thus, SentSpace provides a broad set of cognitively motivated linguistic features for evaluation of text within natural language processing, cognitive science, as well as the social sciences.


Analyzing the Effects of Reasoning Types on Cross-Lingual Transfer Performance

Karthikeyan¹, Sathe, Aditya et al. 2021

Preprint


Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems

Saujas¹, Sathe², Choudhury³ et al. 2021

Preprint


Neural models excel at extracting statistical patterns from large amounts of data, but struggle to learn patterns or reason about language from only a few examples. In this paper, we ask: Can we learn explicit rules that generalize well from only a few examples? We explore this question using program synthesis. We develop a synthesis model to learn phonology rules as programs in a domain-specific language. We test the ability of our models to generalize from few training examples using our new dataset of problems from the Linguistics Olympiad, a challenging set of tasks that require strong linguistic reasoning ability. In addition to being highly sample-efficient, our approach generates human-readable programs, and allows control over the generalizability of the learnt programs.


