Objective In recent years numerous studies have achieved promising results in Alzheimer’s Disease (AD) detection using automatic language processing. We systematically review these articles to understand the effectiveness of this approach, identify any issues and report the main findings that can guide further research. Materials and Methods We searched PubMed, Ovid, and Web of Science for articles published in English between 2013 and 2019. We performed a systematic literature review to answer 5 key questions: (1) What were the characteristics of participant groups? (2) What language data were collected? (3) What features of speech and language were the most informative? (4) What methods were used to classify between groups? (5) What classification performance was achieved? Results and Discussion We identified 33 eligible studies and 5 main findings: participants’ demographic variables (especially age ) were often unbalanced between AD and control group; spontaneous speech data were collected most often; informative language features were related to word retrieval and semantic, syntactic, and acoustic impairment; neural nets, support vector machines, and decision trees performed well in AD detection, and support vector machines and decision trees performed well in decline detection; and average classification accuracy was 89% in AD and 82% in mild cognitive impairment detection versus healthy control groups. Conclusion The systematic literature review supported the argument that language and speech could successfully be used to detect dementia automatically. Future studies should aim for larger and more balanced datasets, combine data collection methods and the type of information analyzed, focus on the early stages of the disease, and report performance using standardized metrics.
We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity data sets. Due to its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and cross-lingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and cross-lingual representation models, including static and contextualized word embeddings (such as fastText, monolingual and multilingual BERT, XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised cross-lingual word embeddings. We also present a step-by-step data set creation protocol for creating consistent, Multi-Simlex -style resources for additional languages. We make these contributions - the public release of Multi-SimLex data sets, their creation protocol, strong baseline results, and in-depth analyses which can be be helpful in guiding future developments in multilingual lexical semantics and representation learning - available via a website which will encourage community effort in further expansion of Multi-Simlex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.
Background: Language impairment in Alzheimer’s disease (AD) has been widely studied but due to limited data availability, relatively few studies have focused on the longitudinal change in language in the individuals who later develop AD. Significant differences in speech have previously been found by comparing the press conference transcripts of President Bush and President Reagan, who was later diagnosed with AD. Objective: In the current study, we explored whether the patterns previously established in the single AD-healthy control (HC) participant pair apply to a larger group of individuals who later receive AD diagnosis. Methods: We replicated previous methods on two larger corpora of longitudinal spontaneous speech samples of public figures, consisting of 10 and 9 AD-HC participant pairs. As we failed to find generalizable patterns of language change using previous methodology, we proposed alternative methods for data analysis, investigating the benefits of using different language features and their change with age, and compiling the single features into aggregate scores. Results: The single features that showed the strongest results were moving average type:token ratio (MATTR) and pronoun-related features. The aggregate scores performed better than the single features, with lexical diversity capturing a similar change in two-thirds of the participants. Conclusion: Capturing universal patterns of language change prior to AD can be challenging, but the decline in lexical diversity and changes in MATTR and pronoun-related features act as promising measures that reflect the cognitive changes in many participants.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.