Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1527
Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop

Abstract: Conventional word embedding models do not leverage information from document metadata, and they do not model uncertainty. We address these concerns with a model that incorporates document covariates to estimate conditional word embedding distributions. Our model allows for (a) hypothesis tests about the meanings of terms, (b) assessments as to whether a word is near or far from another conditioned on different covariate values, and (c) assessments as to whether estimated differences are statistically significant.
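To make the abstract concrete, here is a minimal, illustrative sketch (not the authors' released code) of Bayes-by-Backprop applied to skip-gram embeddings: each target word's embedding is a diagonal Gaussian whose mean is shifted by a document covariate, and samples are drawn with the reparameterization trick. The names (BayesianSkipGram, cov_shift) are hypothetical stand-ins for the paper's conditional-embedding component.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianSkipGram(nn.Module):
    def __init__(self, vocab_size, dim, n_covariates):
        super().__init__()
        # Variational posterior over each word vector: mean and (pre-softplus) scale.
        self.mu = nn.Embedding(vocab_size, dim)
        self.rho = nn.Embedding(vocab_size, dim)      # std = softplus(rho)
        # Hypothetical conditional component: covariate-specific shift of the mean.
        self.cov_shift = nn.Embedding(n_covariates, dim)
        self.ctx = nn.Embedding(vocab_size, dim)      # context vectors (point estimates)

    def forward(self, target, context, covariate):
        mu = self.mu(target) + self.cov_shift(covariate)
        sigma = F.softplus(self.rho(target))
        w = mu + sigma * torch.randn_like(sigma)      # reparameterized sample
        logits = (w * self.ctx(context)).sum(-1)      # skip-gram (negative-sampling) score
        # Closed-form KL(q || N(0, I)) for a diagonal Gaussian posterior.
        kl = 0.5 * (sigma**2 + mu**2 - 2 * torch.log(sigma) - 1).sum()
        return logits, kl
```

Training would minimize the negative log-likelihood of observed (target, context) pairs plus the KL term scaled per minibatch; the learned posterior variances are what enable the uncertainty assessments the abstract describes.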

Cited by 9 publications (8 citation statements); cites 15 publications (13 reference statements). Citing publications span 2019–2023.

Citation statements (ordered by relevance):
“…(There are other ways of approaching the problem of statistical significance for model outputs (see Han et al. 2018) but bootstrapping provides programmatic simplicity and reproducibility with small data sets.) Second, results shift with variations in the user-specified hyperparameters of the selected algorithm, like the dimensionality of the vectors, smoothing, context windows, and sample sizes, suggesting that analysts should select and tune their algorithms by testing for successful task performance (Levy, Goldberg, and Dagan 2015).…”
mentioning
confidence: 99%
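The bootstrapping approach this statement contrasts with the paper can be sketched as follows. This is a hedged illustration, not code from the cited work: train_embeddings (a callable mapping a document sample to a {word: vector} dictionary) is a hypothetical stand-in for whatever embedding trainer the analyst uses.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def bootstrap_similarity(corpus, w1, w2, train_embeddings, n_boot=200, seed=0):
    """Resample documents with replacement, retrain, and collect cos(w1, w2)."""
    rng = np.random.default_rng(seed)
    sims = []
    for _ in range(n_boot):
        sample = [corpus[i] for i in rng.integers(0, len(corpus), size=len(corpus))]
        emb = train_embeddings(sample)            # hypothetical: docs -> {word: vector}
        if w1 in emb and w2 in emb:
            sims.append(cosine(emb[w1], emb[w2]))
    lo, hi = np.percentile(sims, [2.5, 97.5])     # 95% bootstrap interval
    return np.mean(sims), (lo, hi)
```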
“…Future methodological work will follow three tracks. The first will build on Rudolph et al. (2016) and Han et al. (2018); one goal is incorporating document-level metadata into embedding estimation, allowing embeddings to vary according to document-specific attributes, and then identifying the resulting embeddings. The second will take advantage of stochastic variational inference (Hoffman et al., 2013) to enable Bayesian Word Embeddings to scale to massive corpora.…”
Section: Discussion
mentioning
confidence: 99%
“…There have been multiple efforts at developing Bayesian word embeddings (Rudolph et al., 2016; Barkan, 2017; Ji et al., 2017; Havrylov and Titov, 2018); however, none of these have exploited the key advantage of Bayesian inference: the ability to quantify the uncertainty in parameter estimates and to use prior information to inform them. The one approach that has incorporated both uncertainty and hypothesis testing is Han et al. (2018), who offer both measures of uncertainty and a way to test the effect of metadata on the similarity of embeddings; however, this approach does not account for identification problems in the learned embeddings.…”
Section: Social Science and Embedding Models of Language
mentioning
confidence: 99%
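As a complement to that statement, here is a hedged sketch of the kind of covariate-conditional similarity test the abstract describes: given a routine that draws posterior samples of a word's embedding under a covariate value, compare a word pair's similarity across two conditions. sample_embedding is a hypothetical callable, not an API from the paper.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def similarity_shift_test(sample_embedding, w1, w2, cov_a, cov_b, n_samples=1000):
    """Posterior difference in sim(w1, w2) between covariate values cov_a and cov_b."""
    diffs = np.array([
        cosine(sample_embedding(w1, cov_a), sample_embedding(w2, cov_a))
        - cosine(sample_embedding(w1, cov_b), sample_embedding(w2, cov_b))
        for _ in range(n_samples)
    ])
    # Two-sided Monte Carlo tail probability: posterior mass crossing zero.
    p = 2 * min((diffs > 0).mean(), (diffs < 0).mean())
    return diffs.mean(), p
```

Because the embeddings are distributions rather than point estimates, the returned tail probability plays the role of the significance assessment in item (c) of the abstract.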
“…(2013a), and Han et al. (2018). For accessible, applied introductions, see Ruizendaal (2017) and TensorFlow (2018).…”
mentioning
confidence: 96%
“…2015; Han et al. 2018), it has yet to make use of models that account for more complex time dependencies among words.…”
mentioning
confidence: 99%