“…To test whether Sim(people, men) > Sim(people, women) at the level of collective concepts, we used word embeddings (13) extracted from the May 2017 Common Crawl corpus [CC-MAIN-2017-22; (41)], which contains a large cross section of the internet: over 630 billion words from 2.96 billion web pages and 250 uncompressed TiB of content. Although the Common Crawl is not accompanied by documentation about its contents, it likely includes informal text (e.g., blogs and discussion forums) written by many individuals, as well as more formal text written by the media, corporations, and governments, mostly in English (42,43). Using word embeddings extracted from this massive corpus, we computed the similarity in linguistic context between words (a proxy for the similarity between the concepts denoted) as the cosine of the angle between corresponding embeddings in vector space, or cosine similarity.…”
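
The similarity measure named in the passage, the cosine of the angle between two embedding vectors, can be illustrated with a minimal sketch. This is not the authors' code: the function name `cosine_similarity` and the toy three-dimensional vectors are assumptions made for illustration, and real embeddings would have far more dimensions and be loaded from a pretrained file.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, invented for illustration only; they are not
# taken from the Common Crawl embeddings and imply nothing about results.
toy_embeddings = {
    "word_a": [1.0, 2.0, 0.0],
    "word_b": [2.0, 4.0, 0.0],  # same direction as word_a
    "word_c": [0.0, 0.0, 3.0],  # orthogonal to word_a
}

sim_ab = cosine_similarity(toy_embeddings["word_a"], toy_embeddings["word_b"])
sim_ac = cosine_similarity(toy_embeddings["word_a"], toy_embeddings["word_c"])
print(sim_ab > sim_ac)
```

Because the measure depends only on direction and not on vector length, two words whose vectors point the same way score near 1 regardless of magnitude, while orthogonal vectors score 0. A comparison such as Sim(people, men) > Sim(people, women) then reduces to comparing two such values.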