Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche.
We introduce a new approach for recognizing and reconstructing 3D objects in images. Our approach is based on an analysis-by-synthesis strategy. A forward synthesis model constructs possible geometric interpretations of the world, and then selects the interpretation that best agrees with the measured visual evidence. The forward model synthesizes visual templates defined on invariant (HOG) features. These visual templates are discriminatively trained to be accurate for inverse estimation. We introduce an efficient "brute-force" approach to inference that searches through a large number of candidate reconstructions, returning the optimal one. One benefit of such an approach is that recognition is inherently (re)constructive. We show state-of-the-art performance for detection and reconstruction on two challenging 3D object recognition datasets of cars and cuboids.
In this article we introduce the cylindrical construction, as an edge-replacement procedure admitting twists on both ends of the hyperedges, generalizing the concepts of lifts and Pultr templates at the same time. We prove a tensor-hom duality for this construction and we show that not only a large number of well-known graph constructions are cylindrical but also the construction and its dual give rise to some new graph constructions, applications and results. To show the applicability of the main duality we introduce generalized Grötzsch, generalized Petersen-like and Coxeter-like graphs and we prove some coloring properties of these graphs.
This paper introduces and analyzes the novel task of categorical classification of cuboidal objects (e.g., distinguishing washing machines from filing cabinets). To do so, it makes use of recent methods for automatic alignment of cuboidal objects in images. Given such geometric alignments, the natural approach for recognition might extract pose-normalized appearance features from a canonically-aligned coordinate frame. Though such approaches are extraordinarily common, we demonstrate that they are not optimal, both theoretically and empirically. One reason is that such approaches require accurate shape alignment. However, even with ground-truth alignment, pose-normalized representations may still be sub-optimal. Instead, we introduce methods based on pose-synthesis, a simple approach that augments training data with geometrically perturbed training samples. We demonstrate, both theoretically and empirically, that synthesis is a surprisingly simple but effective strategy that allows for state-of-the-art categorization and automatic 3D alignment. To aid our empirical analysis, we introduce a novel dataset for cuboidal object categorization.
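The pose-synthesis idea above can be illustrated with a minimal sketch. The paper does not publish code, so the perturbation model here (small random rotations and shears of an affine alignment, resampled with nearest-neighbor lookup) is an assumption chosen purely for illustration; the function name and parameters are hypothetical.

```python
import numpy as np

def pose_synthesis(image, n_samples=4, max_rot=0.1, max_shear=0.1, rng=None):
    """Sketch of pose-synthesis augmentation: jitter the 2x2 linear part of
    an affine alignment and resample the image with nearest-neighbor lookup.
    (Illustrative only; the actual perturbation model may differ.)"""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for _ in range(n_samples):
        theta = rng.uniform(-max_rot, max_rot)    # small rotation (radians)
        shear = rng.uniform(-max_shear, max_shear)
        A = np.array([[np.cos(theta), -np.sin(theta) + shear],
                      [np.sin(theta),  np.cos(theta)]])
        ys, xs = np.mgrid[0:h, 0:w]
        # map output pixel coordinates back through the perturbed alignment
        coords = A @ np.stack([ys.ravel() - cy, xs.ravel() - cx])
        yi = np.clip(np.round(coords[0] + cy), 0, h - 1).astype(int)
        xi = np.clip(np.round(coords[1] + cx), 0, w - 1).astype(int)
        out.append(image[yi, xi].reshape(image.shape))
    return out
```

Each training image then contributes several geometrically perturbed copies, so the classifier sees appearance under imperfect alignments rather than relying on a single canonical frame.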
Automated segmentation of anatomical sub-regions with high precision has become a necessity for quantifying and characterizing cells/tissues in histology images. Currently, no machine learning model is available for analyzing sub-anatomical regions of the brain in 2D histological images. Scientists rely on manually segmenting anatomical sub-regions of the brain, which is extremely time-consuming and prone to labeler-dependent bias. One of the major challenges in accomplishing such a task is the lack of high-quality annotated images that can be used to train a generic artificial intelligence model. In this study, we employed a UNet-based architecture and compared model performance across various combinations of encoders, image sizes, and sample selection techniques. Additionally, to increase the sample set we resorted to data augmentation, which provided data diversity and robust learning. We trained our best-fit model on approximately one thousand annotated 2D brain images stained with Nissl/Haematoxylin and Tyrosine Hydroxylase enzyme (TH, an indicator of dopaminergic neuron viability). The dataset comprises different animal studies, enabling the model to be trained on diverse data. The model effectively detects two sub-regions, the pars compacta (SNCD) and pars reticulata (SNr), in all the images. In spite of limited training data, our best model achieves a mean intersection over union (IoU) of 79% and a mean Dice coefficient of 87%. In conclusion, the UNet-based model with EfficientNet as an encoder outperforms all other encoders, resulting in a first-of-its-kind robust model for multiclass segmentation of brain sub-regions in 2D images.
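The two evaluation metrics reported above (mean IoU and mean Dice) have standard definitions for multiclass label masks; a minimal sketch follows. The function name and the macro-averaging over classes are illustrative choices, not taken from the paper.

```python
import numpy as np

def iou_and_dice(pred, target, num_classes):
    """Mean intersection-over-union and mean Dice coefficient for
    integer label masks, macro-averaged over classes."""
    ious, dices = [], []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        denom = p.sum() + t.sum()
        # empty class in both masks counts as a perfect match
        ious.append(inter / union if union else 1.0)
        dices.append(2 * inter / denom if denom else 1.0)
    return np.mean(ious), np.mean(dices)
```

Note that Dice is always at least as large as IoU for the same masks (Dice = 2·IoU / (1 + IoU)), consistent with the 87% versus 79% figures reported.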
Speech and language changes occur in Alzheimer's disease (AD), but few studies have characterized their longitudinal course. We analyzed open-ended speech samples from a prodromal-to-mild AD cohort to develop a novel composite score to characterize progressive speech changes. Participant speech from the Clinical Dementia Rating (CDR) interview was analyzed to compute metrics reflecting speech and language characteristics. We determined the aspects of speech and language that exhibited significant longitudinal change over 18 months. Nine acoustic and linguistic measures were combined to create a novel composite score. The speech composite exhibited significant correlations with primary and secondary clinical endpoints and a similar effect size for detecting longitudinal change. Our results demonstrate the feasibility of using automated speech processing to characterize longitudinal change in early AD. Speech-based composite scores could be used to monitor change and detect response to treatment in future research.
Background: Novel automated tools for analyzing speech and language may provide new insights into Alzheimer's disease (AD). Although speech and language changes occur in AD and other neurodegenerative diseases, current clinical assessments to monitor these symptoms can be burdensome and may have limited sensitivity. Through analyses of open-ended naturalistic speech collected from a standardized clinical interview, we developed a novel measure to characterize progressive speech changes in AD.
Methods: We analyzed Clinical Dementia Rating (CDR) recordings from a subset of 101 participants (58F, 43M; mean age = 69 years, SD = 7) from the Tauriel trial of semorinemab in prodromal-to-mild AD. CDR recordings were collected at the baseline, 6-month, 12-month and 18-month timepoints. Recordings were processed using the Winterlight speech analysis platform, which generates >500 acoustic and linguistic features. After controlling for age, sex and level of education, we identified multiple features that had significant linear effects of time (indicating progressive longitudinal change). These speech features were combined into an unweighted composite speech score, which was compared with other clinical endpoints.
Results: The novel speech composite score included six linguistic features (related to word duration, word frequency, syntactic depth, and use of nouns, pronouns and particles) and three acoustic features (related to the power spectrum of the vocal recordings). When compared with clinical endpoints, the speech composite had a similar effect size for detecting longitudinal change (β=0.29) compared to the CDR-Sum of Boxes (CDR-SB; β=0.30) and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog; β=0.22).
Notably, it had a significantly greater longitudinal effect size compared to the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS; β=-0.15, p<0.01) and the language subscales of the ADAS-Cog measuring word-finding difficulty (β=0.12, p<0.01) and spoken language ability (β=0.09, p<0.01).
Conclusions: Progressive speech changes are detectable in early AD and measurable via automated language processing tools. Speech composite scores have the potential to be more sensitive measures of disease progression and/or treatment response for speech-related symptoms in AD, without adding to patient burden. Further validation is needed to replicate these findings and confirm the clinical and neuropathological relevance of this novel measure.
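The "unweighted composite" construction described above is a common pattern: z-score each feature and average. The abstract does not specify the standardization or sign conventions, so this sketch assumes baseline-referenced z-scores and an optional per-feature sign vector to align all features in the same direction; both are assumptions, and all names are hypothetical.

```python
import numpy as np

def composite_score(features, baseline_mean, baseline_sd, signs=None):
    """Unweighted composite: average of z-scored features.

    features: (n_visits, n_features) array of speech measures.
    baseline_mean, baseline_sd: per-feature statistics from the
    cohort's baseline visit (assumed reference; not specified
    in the abstract).
    signs: optional +1/-1 per feature, aligning direction so that
    higher score = more change (also an assumption).
    """
    z = (np.asarray(features, float) - baseline_mean) / baseline_sd
    if signs is not None:
        z = z * np.asarray(signs, float)
    return z.mean(axis=1)
```

An unweighted average keeps the score simple and avoids fitting weights to the same cohort used for evaluation, at the cost of treating every feature as equally informative.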
The ability to predict the future trajectory of a patient is a key step toward the development of therapeutics for complex diseases such as Alzheimer's disease (AD). However, most machine learning approaches developed for prediction of disease progression are either single-task or single-modality models, which cannot be directly adopted to our setting involving multi-task learning with high-dimensional images. Moreover, most of those approaches are trained on a single dataset (i.e., cohort) and cannot be generalized to other cohorts. We propose a novel multimodal multi-task deep learning model to predict AD progression by analyzing longitudinal clinical and neuroimaging data from multiple cohorts. Our proposed model integrates high-dimensional MRI features from a 3D convolutional neural network with other data modalities, including clinical and demographic information, to predict the future trajectory of patients. Our model employs an adversarial loss to alleviate the study-specific imaging bias, in particular the inter-study domain shifts. In addition, a Sharpness-Aware Minimization (SAM) optimization technique is applied to further improve model generalization. The proposed model is trained and tested on various datasets in order to evaluate and validate the results. Our results show that 1) our model yields significant improvement over the baseline models, and 2) models using neuroimaging features extracted by a 3D convolutional neural network outperform the same models when applied to MRI-derived volumetric features.
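The SAM technique referenced above has a simple two-step structure: ascend to the worst-case weights within a small L2 ball, then descend using the gradient measured there. A minimal numpy sketch on a toy quadratic loss follows; it illustrates the update rule only, not the paper's full training pipeline, and the step sizes are arbitrary illustrative values.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step (Foret et al., 2021):
    perturb the weights toward the local worst case within an L2 ball
    of radius rho, then apply gradient descent using the gradient
    evaluated at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
    g_sharp = grad_fn(w + eps)                   # gradient at perturbed weights
    return w - lr * g_sharp

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda v: v)
```

Because each step needs two gradient evaluations, SAM roughly doubles the per-iteration cost, which is the usual trade-off accepted for its flatter, better-generalizing minima.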