Katherine Thai scite author profile

Katherine Thai

5 Publications

19 Citation Statements Received

12 Citation Statements Given


Affiliations

University of Massachusetts Amherst

Publications


RELiC: Retrieving Evidence for Literary Claims

Thai, Yapei, Krishna et al., 2022


Humanities scholars commonly provide evidence for claims that they make about a work of literature (e.g., a novel) in the form of quotations from the work. We collect a large-scale dataset (RELiC) of 78K literary quotations and surrounding critical analysis and use it to formulate the novel task of literary evidence retrieval, in which models are given an excerpt of literary analysis surrounding a masked quotation and asked to retrieve the quoted passage from the set of all passages in the work. Solving this retrieval task requires a deep understanding of complex literary and linguistic phenomena, which proves challenging to methods that overwhelmingly rely on lexical and semantic similarity matching. We implement a RoBERTa-based dense passage retriever for this task that outperforms existing pretrained information retrieval baselines; however, experiments and analysis by human domain experts indicate that there is substantial room for improvement over our dense retriever.
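The retrieval step described in this abstract can be pictured as ranking every passage in the work by its similarity to the analysis excerpt whose quotation has been masked. The following is a minimal sketch under stated assumptions, not the RELiC implementation: the three-dimensional vectors are hypothetical stand-ins for encoder outputs (the RoBERTa-based encoder itself is not shown), similarity is assumed to be an inner product, and `rank_passages` and `recall_at_k` are illustrative names.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: List[Sequence[float]]) -> List[int]:
    """Return passage indices ordered from most to least similar to the query."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking: List[int], gold_index: int, k: int) -> float:
    """1.0 if the quoted (gold) passage is among the top-k results, else 0.0."""
    return 1.0 if gold_index in ranking[:k] else 0.0


# Hypothetical toy embeddings; a real system would produce these with an encoder.
query = [0.9, 0.1, 0.0]   # analysis excerpt with the quotation masked
passages = [
    [0.1, 0.8, 0.1],      # passage 0
    [0.8, 0.2, 0.1],      # passage 1 (the passage that was actually quoted)
    [0.3, 0.3, 0.3],      # passage 2
]

ranking = rank_passages(query, passages)
print(ranking)                                   # [1, 2, 0]
print(recall_at_k(ranking, gold_index=1, k=1))   # 1.0
```

With these toy vectors the quoted passage scores 0.74 against 0.30 and 0.17 for the others, so it is ranked first. The abstract's point is that real literary evidence is rarely this easy to separate by surface similarity.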


Maximum Covering Subtrees for Phylogenetic Networks

Davidov, Hernandez, Jian et al., 2021

IEEE/ACM Trans. Comput. Biol. and Bioinf.


ChapterBreak: A Challenge Dataset for Long-Range Language Models

Sun, Thai, Iyyer, 2022


While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed. To this end, we introduce ChapterBreak, a challenge dataset that provides an LRLM with a long segment from a narrative that ends at a chapter boundary and asks it to distinguish the beginning of the ground-truth next chapter from a set of negative segments sampled from the same narrative. A fine-grained human annotation reveals that our dataset contains many complex types of chapter transitions (e.g., parallel narratives, cliffhanger endings) that require processing global context to comprehend. Experiments on ChapterBreak show that existing LRLMs fail to effectively leverage long-range context, substantially underperforming a segment-level model trained directly for this task. We publicly release our ChapterBreak dataset to spur more principled future research into LRLMs.
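The evaluation described in this abstract reduces to a multiple-choice decision: the model scores each candidate next-chapter beginning, and an example counts as correct when the ground-truth candidate receives the highest score. The sketch below assumes each candidate has already been scored by some model (for instance, the log-probability of the candidate given the preceding context); the scores, `pick_candidate`, and `suffix_accuracy` are hypothetical and are not taken from the ChapterBreak release.

```python
from typing import List, Sequence


def pick_candidate(scores: Sequence[float]) -> int:
    """Index of the highest-scoring candidate (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def suffix_accuracy(examples: List[Sequence[float]], gold: List[int]) -> float:
    """Fraction of examples where the ground-truth candidate scores highest."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(pick_candidate(s) == g for s, g in zip(examples, gold))
    return correct / len(examples)


# Hypothetical model scores: one row per example, one entry per candidate
# (the ground-truth next-chapter beginning plus negatives from the same narrative).
examples = [
    [-120.4, -118.9, -125.0],  # gold at index 1: model picks index 1 (correct)
    [-98.2, -97.5, -101.3],    # gold at index 0: model picks index 1 (wrong)
    [-110.0, -115.2, -112.7],  # gold at index 0: model picks index 0 (correct)
]
gold = [1, 0, 0]

print(suffix_accuracy(examples, gold))  # 0.6666666666666666
```

Because negatives are drawn from the same narrative, a model cannot win by detecting a change of book or style; it has to judge which segment plausibly continues the story.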


Combining Genetic Algorithms and Machine Learning for Exploring the Navigation Satellite Constellation Design Tradespace

Chang, Duquette, Thai et al., 2020


Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

Thai, Karpinska, Krishna et al., 2022


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

