Recent advances in machine learning, combined with the increased availability of large natural language datasets, have made it possible to uncover semantic representations that characterize what people know about and associate with a wide range of objects and concepts. In this paper, we examine the power of word embeddings, a popular approach for uncovering semantic representations, for studying high-level human judgment. Word embeddings are typically applied to linguistic and semantic tasks; here we show that they can also be used to predict complex, theoretically and practically relevant human perceptions and evaluations in domains as diverse as social cognition, health behavior, risk perception, organizational behavior, and marketing. By learning mappings from word embeddings directly onto judgment ratings, we outperform a similarity-based baseline and perform favorably compared to common metrics of human inter-rater reliability. Word embeddings can also identify the concepts most associated with observed perceptions and evaluations, and can thus shed light on the psychological substrates of judgment. Overall, we provide new methods and insights for predicting and understanding high-level human judgment, with important applications across the social and behavioral sciences.
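A minimal sketch of the general approach described above, learning a direct mapping from embedding vectors onto judgment ratings. The data here are random stand-ins, and the choice of ridge regression with a held-out split is illustrative, not the paper's exact pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 300-d word embeddings and mean human ratings
# (e.g., perceived riskiness on a 1-7 scale) for the same items; in
# practice both would come from a pretrained model and a rating study.
items = [f"item_{i}" for i in range(100)]
embeddings = {w: rng.normal(size=300) for w in items}
ratings = {w: rng.uniform(1, 7) for w in items}

X = np.stack([embeddings[w] for w in items])   # one row per item
y = np.array([ratings[w] for w in items])

# Learn a linear map from embedding space onto the judgment scale;
# ridge regularization guards against overfitting the 300 dimensions.
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X[:80], y[:80])
predicted = model.predict(X[80:])              # held-out items
print(np.corrcoef(predicted, y[80:])[0, 1])    # predictive accuracy
```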
It is widely acknowledged that natural languages emerge not just from human brains, but also from rich communities of interacting human brains (Senghas, 2005). Yet the precise role of such communities and such interaction in the emergence of core properties of language has largely gone uninvestigated in naturally emerging systems, leaving the few existing computational investigations of this issue confined to artificial settings. Here we take a step towards investigating the precise role of community structure in the emergence of linguistic conventions, with both naturalistic empirical data and computational modeling. We first show conventionalization of lexicons in two different classes of naturally emerging signed systems: (1) protolinguistic “homesigns” invented by linguistically isolated Deaf individuals, and (2) a natural sign language emerging in a recently formed, rich Deaf community. We find that the latter conventionalized faster than the former. Second, we model conventionalization as a population of interacting individuals who adjust their probability of sign use in response to other individuals' actual sign use, following an independently motivated model of language learning (Yang, 2002, 2004). Simulations suggest that a richer social network, like that of natural (signed) languages, conventionalizes faster than a sparser social network, like that of homesign systems. We discuss our behavioral and computational results in light of other work on language emergence and on the behavior of complex networks.
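A minimal sketch of this kind of simulation, assuming a linear reward-style probability update in the spirit of Yang's learning model. The network choices, learning rate, and convergence criterion below are all illustrative assumptions, not the paper's exact parameters:

```python
import random
import networkx as nx

GAMMA = 0.05        # size of each probability adjustment
THRESHOLD = 0.95    # an agent "has a convention" once it uses one of
                    # two competing sign variants this consistently

def steps_to_convention(graph, max_steps=500_000, seed=0):
    """Simulate pairwise interactions until every agent in the network
    strongly prefers one of two competing sign variants."""
    rng = random.Random(seed)
    nodes = list(graph)
    prob = {n: rng.random() for n in nodes}    # P(agent uses variant A)
    for step in range(1, max_steps + 1):
        speaker = rng.choice(nodes)
        hearer = rng.choice(list(graph[speaker]))
        # The hearer nudges its own usage probability toward the sign it
        # just observed: a linear reward-style update (cf. Yang's model).
        if rng.random() < prob[speaker]:       # speaker produced variant A
            prob[hearer] += GAMMA * (1 - prob[hearer])
        else:                                  # speaker produced variant B
            prob[hearer] -= GAMMA * prob[hearer]
        if all(p > THRESHOLD or p < 1 - THRESHOLD for p in prob.values()):
            return step
    return max_steps

# A richer network (all pairs interact) vs. a sparser, homesign-like one.
print(steps_to_convention(nx.complete_graph(20)))
print(steps_to_convention(nx.cycle_graph(20)))
```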
Similarity is one of the most important relations humans perceive, arguably subserving category learning and categorization, generalization and discrimination, judgment and decision making, and other cognitive functions. Researchers have proposed a wide range of representations and processes that could be at play in similarity judgment, yet have not comprehensively compared the power of these representations and processes for predicting similarity within and across different semantic categories. We performed such a comparison by pairing eight prominent vector semantic representations with seven established similarity metrics that could operate on these representations, as well as supervised methods for dimensional weighting in the similarity function. This approach yields a factorial model structure with 56 distinct representation-process pairs, which we tested on a novel dataset of similarity judgments between pairs of co-hyponymic words in eight categories. We found that cosine similarity and Pearson correlation were the overall best performing unweighted similarity functions, and that word vectors derived from free association norms often outperformed word vectors derived from text (including those specialized for similarity). Importantly, models that used human similarity judgments to learn category-specific weights on dimensions yielded substantially better predictions than all unweighted approaches across all types of similarity functions and representations, although dimension weights did not generalize well across semantic categories, suggesting strong category context effects in similarity judgment. We discuss implications of these results for cognitive modeling and natural language processing, as well as for theories of representations and processes involved in similarity.
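As an illustrative sketch of pairing vector representations with similarity functions, the snippet below implements cosine and Pearson similarity plus one simple supervised weighting scheme: fitting nonnegative per-dimension weights to human ratings via nonnegative least squares. The vectors and ratings are random stand-ins, and this particular weighting method is an assumption, not necessarily the paper's:

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import pearsonr

def cosine(u, v):
    """Cosine similarity: angle between the vectors, ignoring magnitude."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson(u, v):
    """Pearson correlation of the two vectors' dimension values."""
    return pearsonr(u, v)[0]

def weighted_cosine(u, v, w):
    """Cosine similarity with a nonnegative weight on each dimension."""
    num = np.sum(w * u * v)
    return num / np.sqrt(np.sum(w * u * u) * np.sum(w * v * v))

rng = np.random.default_rng(1)

# Hypothetical data: 50-d vectors for words in one category, plus human
# similarity ratings for every word pair from that category.
words = [f"word_{i}" for i in range(30)]
vecs = {w: rng.normal(size=50) for w in words}
pairs = [(words[i], words[j]) for i in range(30) for j in range(i + 1, 30)]
human = rng.uniform(0, 1, size=len(pairs))     # stand-in ratings

# Supervised weighting: choose nonnegative per-dimension weights so the
# weighted elementwise product of each pair best fits the human ratings.
F = np.stack([vecs[a] * vecs[b] for a, b in pairs])
weights, _ = nnls(F, human)

u, v = vecs[pairs[0][0]], vecs[pairs[0][1]]
print(cosine(u, v), pearson(u, v), weighted_cosine(u, v, weights))
```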