“…When study units are organized textual data, we find it meaningful to further divide observed covariates into two broad categories: "explicit observed covariates" that can be derived from the organized textual data at face value, e.g., the number of theorems/equations/figures in a conference paper, and "implicit observed covariates" that capture deeper aspects intrinsic to the textual data. Concrete examples of implicit covariates include: static (non-contextual) word embeddings such as Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014), and contextual embeddings such as BERT (Devlin et al., 2019) and SentenceBERT (Reimers and Gurevych, 2019); the sentiment, tone, and emotion perceived in the text (Barbieri et al., 2020; Pérez et al., 2021); topics and keywords obtained through topic modeling and keyword summarization (Xie et al., 2015; Blei and Lafferty, 2007; Ramage et al., 2009; Wang et al., 2020; Santosh et al., 2020); the assessed trustworthiness of the claims made (Nadeem et al., 2019; Zhang et al., 2021b); temporal and semantic relationships among the events mentioned (Zhou et al., 2021; Han et al., 2021); and commonsense knowledge and reasoning over the text, such as complex relations between events, their consequences, and predictions (Chaturvedi et al., 2017; Speer et al., 2017; Hwang et al., 2021; Jiang et al., 2021). These examples are by no means exhaustive, nor is every one of them necessary for each causal query.…”
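The distinction between explicit and implicit covariates can be made concrete with a small sketch. The excerpt does not prescribe any implementation, so everything below is an illustrative assumption. It assumes each paper is available as LaTeX source. Explicit covariates are read off at face value by counting `theorem`, `equation`/`align`, and `figure` environments (these environment names are hypothetical choices; real papers may use custom names such as `thm` or `lemma`). For the implicit side, a deliberately crude lexicon-based sentiment score stands in for the learned models cited above. The helpers `explicit_covariates`, `toy_sentiment`, and `covariate_row` and the word lists are all hypothetical.

```python
import re
from dataclasses import dataclass, asdict


@dataclass
class ExplicitCovariates:
    """Face-value counts read directly from a paper's LaTeX source."""
    n_theorems: int
    n_equations: int
    n_figures: int


# Hypothetical environment names; starred (unnumbered) variants are also counted.
_ENV_PATTERNS = {
    "n_theorems": r"\\begin\{theorem\*?\}",
    "n_equations": r"\\begin\{(?:equation|align)\*?\}",
    "n_figures": r"\\begin\{figure\*?\}",
}

# Hypothetical toy lexicon: a stand-in for the learned sentiment models cited
# in the text, NOT a substitute for them.
_POSITIVE = {"novel", "significant", "robust", "efficient", "state-of-the-art"}
_NEGATIVE = {"limited", "fails", "weak", "unclear", "costly"}


def explicit_covariates(latex_source: str) -> ExplicitCovariates:
    """Count occurrences of each environment's \\begin{...} marker."""
    counts = {name: len(re.findall(pattern, latex_source))
              for name, pattern in _ENV_PATTERNS.items()}
    return ExplicitCovariates(**counts)


def toy_sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0.0 if no hits."""
    tokens = re.findall(r"[a-z-]+", text.lower())
    pos = sum(token in _POSITIVE for token in tokens)
    neg = sum(token in _NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def covariate_row(latex_source: str) -> dict:
    """One row of a covariate table: explicit counts plus one implicit score."""
    row = asdict(explicit_covariates(latex_source))
    row["toy_sentiment"] = toy_sentiment(latex_source)
    return row


if __name__ == "__main__":
    paper = r"""
\begin{abstract}We propose a novel and efficient method.\end{abstract}
\begin{theorem}...\end{theorem}
\begin{equation}x = 1\end{equation}
\begin{equation*}y = 2\end{equation*}
\begin{figure}...\end{figure}
"""
    print(covariate_row(paper))
```

Running the script prints one row with 1 theorem, 2 equations, 1 figure, and a toy sentiment of 1.0. In practice, the implicit column would more plausibly be filled by an embedding vector or a classifier output from one of the cited models. Keeping explicit and implicit covariates as separate columns makes it easy to include either group, or both, when adjusting for confounding in a given causal query.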