Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large‐Scale Text Corpora

Iordan, Marius Cătălin; Giallanza, Tyler; Ellis, Cameron T.; Beckage, Nicole; Cohen, Jonathan D.

doi:10.1111/cogs.13085

Cited by 10 publications

(10 citation statements)

References 81 publications

(168 reference statements)

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…T. Merten отмечает положительную корреляцию между выраженностью психотизма и числом уникальных ассоциативных реакций в различных условиях проведения ассоциативного эксперимента (Merten, 1993). В названных работах, а также в исследовании (Innes, 1972) (Черкасова, 2008), содержащего 253 стимула, отобранных по материалам трех ассоциативных опросов, проведенных с интервалами 10-20 лет с носителями русского языка. В данный словарь вошли русские слова-стимулы, которые повторялись в трех или двух массовых ассоциативных опросах, по материалам которых опубли-кованы важнейшие ассоциативные словари русского языка.…”

Section: Code and Dataset Are Available On Githubunclassified

Индивидуальные Различия В Ассоциативном Значении Слова Сквозь Призму Языковой Модели И Семантического Дифференциала

Литвинова,

Паничева

2024

RR. TAL

View full text Add to dashboard Cite

Существованиеиндивидуальных различийв семантике слова признается многими исследователями. Однако установление и описание подобных различий представляет собой сложную научную задачу, связанную с необходимостью проведения трудоемкой семантической разметки и неизбежным субъективизмом исследователя. В настоящей работе нами предлагается метод выявления различий в индивидуальной семантике слов на основе автоматически рассчитанных оценок ассоциативного значения слова по шкалам семантического дифференциала. Используя дистрибутивную семантическую модель word2vec, обученную на многомиллионном корпусе текстов, и метод Concept Mover's Distance, мы получили для каждого ассоциативного ряда оценки по 18 шкалам семантического дифференциала. В нашей работе впервые данный метод, активно использующийся в новейших работах, рассматривающих текст как данные (преимущественно выполненных в русле computational social science), применяется по отношению к такому объекту анализа, как ассоциативный ряд, с целью описания индивидуальных различий в семантике слов. В качестве материала для исследования мы использовали специально созданный датасет, содержащий ассоциативные реакции к словам-стимулам, важным для русского языкового сознания, данные о психологических характеристиках респондентов (черты «Большой пятерки») и их эмоциональном состоянии в момент прохождения тестирования. Применяя комплекс методов анализа многомерных данных (метод главных компонент, факторный анализ, иерархическая кластеризация на главных компонентах), мы разделили слова-стимулы на группы в зависимости от выраженности индивидуальных различий в их семантике. Нами также была установлена связь эмоционально-психологических характеристик респондентов и автоматически рассчитанных оценок ассоциативного значения слов-стимулов по шкалам семантического дифференциала. Описанная методика анализа может применяться для получения оценок ассоциативных рядов (а также контекстов употребления слов в текстах) по любым семантическим оппозициям и предлагается как дополнение к традиционным методам выявления психологически реального значения слова. Используемый в работе датасет и код для воспроизведения полученных результатов на языке R доступны для исследователей.

show abstract

Section: Code and Dataset Are Available On Githubunclassified

Индивидуальные Различия В Ассоциативном Значении Слова Сквозь Призму Языковой Модели И Семантического Дифференциала

Литвинова,

Паничева

2024

RR. TAL

View full text Add to dashboard Cite

show abstract

“…To identify the degree to which the updated embedding yielded improved prediction of fine-grained similarity, we used 8 existing datasets from three studies [49][50][51] that had examined within category similarity. Note that predicted similarities are likely underestimated, given that the original similarity datasets were collected using different image examples and/or tasks.…”

Section: Fine-grained Prediction Of Perceived Similaritymentioning

confidence: 99%

“…Third, while increases in dataset size did not lead to notable improvements in overall performance, did increasing the dataset size improve more fine-grained predictions of similarity? To address this question, we used several existing datasets of within-category similarity ratings [49][50][51] and computed similarity predictions. Rather than computing similarity across all possible triplets, these predictions were constrained to triplet contexts within superordinate categories (e.g.…”

Section: Data Quality and Data Reliability In The Behavioral Odd-one ...mentioning

confidence: 99%

THINGS-data: A multimodal collection of large-scale datasets for investigating object representations in human brain and behavior

Hebart

Contier

Teichmann

et al. 2022

Preprint

View full text Add to dashboard Cite

Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here we present THINGS-data, a multimodal collection of large-scale datasets comprising functional MRI, magnetoencephalographic recordings, and 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly-annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and the advancement of cognitive neuroscience.

show abstract

“…Word embeddings are largely founded on the notion of semantic similarity, and ensuring that word vector similarities match human judgments has been an important goal (e.g., Baroni et al, 2014 ; Pereira et al, 2016 ; An et al, 2018 ; Grand et al, 2018 ; Iordan et al, 2022 ). Less attention has been paid to whether the actual structure of a DSM's similarity space matches what is known about the human lexicon.…”

Section: Inspiration From Human Lexical Abilitiesmentioning

confidence: 99%

“…asteroid, belt , and buckle , Griffiths et al, 2007 )—are not consistently captured in embedding spaces (Griffiths et al, 2007 ; Nematzadeh et al, 2017 ; Rodriguez and Merlo, 2020 ). Building on the insight from Griffiths et al ( 2007 ) that interpretation of a word within the context of a topic can resolve some of these mismatches with human judgments by appropriately disambiguating the words, one avenue for the future may be to consider word embeddings that are topically-constrained (such as in Iordan et al, 2022 ).…”

Section: Inspiration From Human Lexical Abilitiesmentioning

confidence: 99%

Beyond the Benchmarks: Toward Human-Like Lexical Representations

Stevenson¹,

Merlo²

2022

Front. Artif. Intell.

View full text Add to dashboard Cite

To process language in a way that is compatible with human expectations in a communicative interaction, we need computational representations of lexical properties that form the basis of human knowledge of words. In this article, we concentrate on word-level semantics. We discuss key concepts and issues that underlie the scientific understanding of the human lexicon: its richly structured semantic representations, their ready and continual adaptability, and their grounding in crosslinguistically valid conceptualization. We assess the state of the art in natural language processing (NLP) in achieving these identified properties, and suggest ways in which the language sciences can inspire new approaches to their computational instantiation.

show abstract

Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large‐Scale Text Corpora

Cited by 10 publications

References 81 publications

Индивидуальные Различия В Ассоциативном Значении Слова Сквозь Призму Языковой Модели И Семантического Дифференциала

Индивидуальные Различия В Ассоциативном Значении Слова Сквозь Призму Языковой Модели И Семантического Дифференциала

THINGS-data: A multimodal collection of large-scale datasets for investigating object representations in human brain and behavior

Beyond the Benchmarks: Toward Human-Like Lexical Representations

Contact Info

Product

Resources

About