Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can ‘learn’ from data: for instance, an ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low-frequency content words, generic terms and components of metanarrative statements. For the right versus left task, the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA), may shed further light on this little-understood distinction.
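As an illustration of how information gain can rank vocabulary features, the following is a minimal sketch in Python. It assumes a binary "word occurs in the transcript" feature and uses invented toy data; the function names, the feature encoding, and the example words are assumptions for illustration, not the procedure or data reported in the abstract above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, word):
    """IG of the binary feature 'word occurs in the document'.

    docs is a list of word sets, labels the matching class labels.
    IG = H(class) - H(class | word present or absent).
    """
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Toy, invented transcripts (as word sets) and group labels.
docs = [
    {"thing", "something", "boy"},
    {"thing", "stuff"},
    {"stool", "cookie", "boy"},
    {"cookie", "sink"},
]
labels = ["patient", "patient", "control", "control"]

# Rank the vocabulary from most to least informative.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)
```

In this toy set, a word that occurs in every transcript of one group and in none of the other receives the maximum gain of 1 bit, while a word spread evenly across both groups receives 0. Restricting a classifier's training features to the top of such a ranking is one way to apply an IG threshold.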
Mixed vascular and Alzheimer-type dementia and pure Alzheimer's disease are both associated with changes in spoken language. These changes have, however, seldom been subjected to systematic comparison. In the present study, we analyzed language samples obtained during the course of a longitudinal clinical study from patients in whom one or other pathology was verified at post mortem. The aims of the study were twofold: first, to confirm the presence of differences in language produced by members of the two groups using quantitative methods of evaluation; and secondly to ascertain the most informative sources of variation between the groups. We adopted a computational approach to evaluate digitized transcripts of connected speech along a range of language-related dimensions. We then used machine learning text classification to assign the samples to one of the two pathological groups on the basis of these features. The classifiers' accuracies were tested using simple lexical features, syntactic features, and more complex statistical and information theory characteristics. Maximum accuracy was achieved when word occurrences and frequencies alone were used. Features based on syntactic and lexical complexity yielded lower discrimination scores, but all combinations of features showed significantly better performance than a baseline condition in which every transcript was assigned randomly to one of the two classes. The classification results illustrate the word content specific differences in the spoken language of the two groups. In addition, those with mixed pathology were found to exhibit a marked reduction in lexical variation and complexity compared to their pure AD counterparts.
Intuition dictates that figurative language, and especially metaphorical expressions, should convey sentiment. The aim of this work is to validate this intuition by showing that figurative language (metaphors) appearing in a sentence drives the polarity of that sentence. To this end, the current article proposes an approach for sentiment analysis of sentences where figurative language plays a dominant role. This approach applies Word Sense Disambiguation, aiming to assign polarity to word senses rather than tokens. Sentence polarity is determined using the individual polarities for metaphorical senses as well as other contextual information. Experimental evaluation shows that the proposed method achieves high scores in comparison with other state-of-the-art approaches tested on the same corpora. Finally, experimental results provide supportive evidence that this method is also well suited for corpora consisting of literal and figurative language sentences. ACM Reference Format: Rentoumi, V., Vouros, G. A., Karkaletsis, V., and Moser, A. 2012. Investigating metaphorical language in sentiment analysis: A sense-to-sentiment perspective.
We used a computational linguistic approach, exploiting machine learning techniques, to examine the letters written by King George III during mentally healthy and apparently mentally ill periods of his life. The aims of the study were: first, to establish the existence of alterations in the King’s written language at the onset of his first manic episode; and secondly to identify salient sources of variation contributing to the changes. Effects on language were sought in two control conditions (politically stressful vs. politically tranquil periods and seasonal variation). We found clear differences in the letter corpus, across a range of different features, in association with the onset of mental derangement, which were driven by a combination of linguistic and information theory features that appeared to be specific to the contrast between acute mania and mental stability. The paucity of existing data relevant to changes in written language in the presence of acute mania suggests that lexical, syntactic and stylometric descriptions of written discourse produced by a cohort of patients with a diagnosis of acute mania will be necessary to support the diagnosis independently and to look for other periods of mental illness over the course of the King’s life, and in other historically significant figures with similarly large archives of handwritten documents.
in dementia, key evidence to establish its stability. The aim of the present study is to assess the temporal stability of WAT-R in patients with dementia. Methods: Thirty-two patients with a diagnosis of dementia according to the NIA-AA criteria (McKhann et al., 2011), CDR 1 and 2, and 34 subjects classified as normal according to a standard neuropsychological evaluation and neurological consultation participated. The participants underwent a neuropsychological evaluation at two time points (interval from 1 to 6 years). We analyzed the results of the cognitive (ACE, MMSE) and executive (Ineco Frontal Screening) screening tests, and the WAT-R, at time points 1 and 2. Results: In the dementia group, significant differences were found between evaluations 1 and 2 in the cognitive screening measures (MMSE t=3.7, p<.001; ACE-R t=3.2, p<.001) but not in the WAT-R (t=0.3, p=.78) nor in the executive screening test (IFS t=1.9, p=.06). In the healthy adults group, no significant differences were found in the cognitive (MMSE t=0.7, p=.49; ACE t=1.5, p=.15) and executive screening tests (t=0.2, p=.87) nor in the WAT-R (t=1.6, p=.12). Conclusions: Our data suggest that the WAT-R is stable in patients with dementia as well as in subjects without cognitive impairment, suggesting that it is an efficient estimator of premorbid intelligence in aging.
Abstract. In the past we have witnessed our machine learning method for sentiment analysis coping well with figurative language, but determining with uncertainty the polarity of mildly figurative cases. We have shown that for these uncertain cases, a rule-based system should be consulted. We evaluate this collaborative approach on the "Rotten Tomatoes" movie reviews dataset and compare it with other state-of-the-art methods, providing further evidence in favor of this approach.