What is Your Article Based On? Inferring Fine-grained Provenance

Zhang, Yi; Roth, Dan

doi:10.18653/v1/2021.acl-long.458

Cited by 2 publications

(1 citation statement)

References 15 publications

“…When study units are organized textual data, we find it meaningful to further divide observed covariates into two broad categories: "explicit observed covariates" that could be derived from the organized textual data at face value, e.g., the number of theorems/equations/figures in a conference paper, and "implicit observed covariates" that capture deeper aspects intrinsic to the textual data. Some concrete examples of implicit covariates include: bag-of-words embeddings such as Word2Vec (Mikolov et al, 2013) and GloVe (Pennington et al, 2014), and contextual embeddings such as BERT (Devlin et al, 2019) and SentenceBERT (Reimers and Gurevych, 2019); perceived sentiments, tones, and emotions from the text (Barbieri et al, 2020; Pérez et al, 2021); topic modeling and keyword summarizing (Xie et al, 2015; Blei and Lafferty, 2007; Ramage et al, 2009; Wang et al, 2020; Santosh et al, 2020); evaluated trustworthiness of the claims made (Nadeem et al, 2019; Zhang et al, 2021b); temporal relationships and semantic relationships of events mentioned (Zhou et al, 2021; Han et al, 2021); commonsense knowledge reasoning (such as complex relations between events, consequences, and predictions) based on the text (Chaturvedi et al, 2017; Speer et al, 2017; Hwang et al, 2021; Jiang et al, 2021). These are by no means exhaustive; nor are they necessary for each and every causal query.…”

Section: A Dichotomy Of Covariates (mentioning)

confidence: 99%

Some Reflections on Drawing Causal Inference using Textual Data: Parallels Between Human Subjects and Organized Texts

Zhang¹,

Zhang²

2022

Preprint

We examine the role of textual data as study units when conducting causal inference by drawing parallels between human subjects and organized texts. We elaborate on key causal concepts and principles, and expose some ambiguity and sometimes fallacies. To facilitate better framing a causal query, we discuss two strategies: (i) shifting from immutable traits to perceptions of them, and (ii) shifting from some abstract concept/property to its constituent parts, i.e., adopting a constructivist perspective of an abstract concept. We hope this article would raise the awareness of the importance of articulating and clarifying fundamental concepts before delving into developing methodologies when drawing causal inference using textual data.

DoubleCheck: Designing Community-based Assessability for Historical Person Identification

Mohanty,

Luther

2023

J. Comput. Cult. Herit.

Historical photos are valuable for their cultural and economic significance, but can be difficult to identify accurately due to various challenges such as low-quality images, lack of corroborating evidence, and limited research resources. Misidentified photos can have significant negative consequences, including lost economic value, incorrect historical records, and the spread of misinformation that can lead to perpetuating conspiracy theories. To accurately assess the credibility of a photo identification (ID), it may be necessary to conduct investigative research, use domain knowledge, and consult experts. In this paper, we introduce DoubleCheck, a quality assessment framework for verifying historical photo IDs on Civil War Photo Sleuth (CWPS), a popular online platform for identifying American Civil War-era photos using facial recognition and crowdsourcing. DoubleCheck focuses on improving CWPS’s user experience and system architecture to display information useful for assessing the quality of historical photo IDs on CWPS. In a mixed-methods evaluation of DoubleCheck, we found that users contributed a wide diversity of sources for photo IDs, which helped facilitate the community’s assessment of these IDs through DoubleCheck’s provenance visualizations. Further, DoubleCheck’s quality assessment badges and visualizations supported users in making accurate assessments of photo IDs, even in cases involving ID conflicts.
