Deep learning techniques have been applied successfully to many standard computer vision problems, such as image classification, object detection, and segmentation. Despite their widespread success, these approaches have not yet been widely exploited for the standard perception problems encountered in autonomous navigation, such as Visual Odometry (VO), Structure from Motion (SfM), and Simultaneous Localization and Mapping (SLAM). This paper analyzes the problem of monocular Visual Odometry using a deep learning-based framework instead of the conventional 'feature detection and tracking' pipeline. Several experiments were performed to understand how a known/unknown environment, conventional trackable features, and pre-trained activations tuned for object classification influence the network's ability to accurately estimate the motion trajectory of the camera (or the vehicle). Based on these observations, we propose a Convolutional Neural Network architecture that is best suited for estimating the object's pose under known environment conditions, and that shows promising results in inferring the actual scale in real time using just a single camera.
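As a minimal sketch of the kind of learning-based VO framework the abstract describes, the following PyTorch snippet regresses a 6-DoF relative pose from a stacked pair of consecutive monocular frames. The architecture, layer sizes, and input resolution are assumptions for illustration, not the authors' exact network.

```python
# Hypothetical sketch: a small CNN regressing relative camera pose
# (3 translation + 3 rotation parameters) from two stacked RGB frames.
import torch
import torch.nn as nn

class MonoVONet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),   # 2 RGB frames stacked channel-wise
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, 6)      # 6-DoF relative pose

    def forward(self, frame_pair):         # (B, 6, H, W)
        return self.head(self.encoder(frame_pair))

model = MonoVONet()
pose = model(torch.randn(2, 6, 128, 416))  # dummy batch of frame pairs
print(pose.shape)                          # torch.Size([2, 6])
```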
Neuroimaging data analysis often involves a priori selection of data features to study the underlying neural activity. Since this can lead to sub-optimal feature selection and thereby prevent the detection of subtle patterns in neural activity, data-driven methods have recently gained popularity for optimizing neuroimaging data analysis pipelines and thereby improving our understanding of neural mechanisms. In this context, we developed a deep convolutional architecture that can identify discriminating patterns in neuroimaging data and applied it to electroencephalography (EEG) recordings collected from 25 subjects performing a hand motor task before and after a rest period or a bout of exercise. The deep network was trained to classify subjects into exercise and control groups based on differences in their EEG signals. Subsequently, we developed a novel method, termed cue-combination for Class Activation Map (ccCAM), which enabled us to identify discriminating spatio-temporal features within a specific frequency band (23-33 Hz) and assess the effects of exercise on the brain. Additionally, the proposed architecture allowed us to visualize, for the first time to our knowledge, the differences in the propagation of underlying neural activity across the cortex between the two groups. Our results demonstrate the feasibility of using deep network architectures for neuroimaging analysis in different contexts, such as the identification of robust brain biomarkers to better characterize and potentially treat neurological disorders.
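For context, the following NumPy sketch computes a standard Class Activation Map (Zhou et al.), the building block that ccCAM extends; it is not the ccCAM method itself, and the array shapes and names are assumptions for illustration.

```python
# Hedged sketch of a standard Class Activation Map: a class-weighted sum
# of the last convolutional layer's feature maps.
import numpy as np

def class_activation_map(feature_maps: np.ndarray,
                         fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """
    feature_maps: (channels, H, W) activations of the last conv layer.
    fc_weights:   (n_classes, channels) weights of the final linear layer
                  that follows global average pooling.
    """
    w = fc_weights[class_idx]                         # (channels,)
    cam = np.tensordot(w, feature_maps, axes=(0, 0))  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                          # keep positive evidence
    return cam / (cam.max() + 1e-8)                   # normalize to [0, 1]

# Toy usage with random activations (EEG-derived inputs would replace these).
cam = class_activation_map(np.random.rand(64, 8, 8),
                           np.random.rand(2, 64), class_idx=1)
print(cam.shape)   # (8, 8)
```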
Representation learning that leverages large-scale labelled datasets is central to recent progress in machine learning. However, access to task-relevant labels at scale is often scarce or expensive, motivating the need to learn from unlabelled datasets with self-supervised learning (SSL). Such large unlabelled datasets (with data augmentations) often provide good coverage of the underlying input distribution. However, evaluating the representations learned by SSL algorithms still requires task-specific labelled samples in the training pipeline. Additionally, the generalization of task-specific encodings is often sensitive to potential distribution shift. Inspired by recent advances in theoretical machine learning and vision neuroscience, we observe that the eigenspectrum of the empirical feature covariance matrix often follows a power law. For visual representations, we estimate the coefficient of the power law, α, across three key attributes that influence representation learning: learning objective (supervised, SimCLR, Barlow Twins, and BYOL), network architecture (VGG, ResNet, and Vision Transformer), and task (object and scene recognition). We observe that, under mild conditions, the proximity of α to 1 is strongly correlated with downstream generalization performance. Furthermore, α ≈ 1 is a strong indicator of robustness to label noise during fine-tuning. Notably, α is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations learned from unlabelled datasets.
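Since α is computable from the representations alone, a label-free estimate is straightforward: compute the empirical feature covariance, take its eigenspectrum, and fit the slope of eigenvalue versus rank in log-log space. The sketch below shows one such estimate; the fitting range and the linear log-log fit are assumed choices, not necessarily the paper's exact procedure.

```python
# Hypothetical sketch: estimate the power-law coefficient alpha of the
# eigenspectrum of an empirical feature covariance matrix (no labels needed).
import numpy as np

def estimate_alpha(features: np.ndarray, fit_range=(10, 100)) -> float:
    """features: (n_samples, n_dims) matrix of learned representations."""
    # Empirical covariance of the centered features.
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / centered.shape[0]
    # Eigenvalues sorted in decreasing order, numerical zeros dropped.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = eigvals[eigvals > 1e-12]
    # Power law lambda_i ~ i^(-alpha): fit a line in log-log space over an
    # intermediate range of ranks (assumed choice of fit window).
    lo, hi = fit_range
    hi = min(hi, len(eigvals))
    idx = np.arange(lo, hi)
    slope, _ = np.polyfit(np.log(idx + 1), np.log(eigvals[idx]), 1)
    return -slope   # alpha is the negative log-log slope

# Illustrative usage on random features (real inputs would be encoder outputs).
rng = np.random.default_rng(0)
print(f"alpha ≈ {estimate_alpha(rng.normal(size=(2048, 512))):.2f}")
```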
Brain age prediction studies measure the difference between the chronological age of an individual and their predicted age based on neuroimaging data; this difference has been proposed as an informative measure of disease and cognitive decline. As most previous studies have relied exclusively on magnetic resonance imaging (MRI) data, we investigate here whether combining structural MRI with functional magnetoencephalography (MEG) information improves age prediction, using a large cohort of healthy subjects (N=613, age 18-88) from the Cam-CAN dataset. To this end, we examined the performance of dimensionality reduction and multivariate associative techniques, namely Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA), to tackle the high dimensionality of neuroimaging data. Using MEG features alone yielded worse performance than using MRI features, but combining both feature sets slightly improved age prediction (mean absolute error of 5.28 yrs). Furthermore, we found that PCA resulted in worse performance, whereas CCA in conjunction with Gaussian process regression models yielded the best prediction performance. Notably, CCA allowed us to visualize the features that significantly contributed to age prediction. We found that MRI features from subcortical structures were more reliable age predictors than cortical features, and that spectral MEG measures were more reliable than connectivity metrics. Our results provide insight into the underlying processes that are indicative of brain aging, thereby advancing the discovery of valuable biomarkers of neurological syndromes that emerge later in the lifespan.
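An illustrative scikit-learn sketch of the abstract's best-performing pipeline follows: CCA to fuse the two modalities into a shared low-dimensional space, then Gaussian process regression on the combined projections. The feature dimensions, component count, kernel, and random placeholder data are all assumptions, not the study's actual setup.

```python
# Assumed pipeline sketch: CCA fusion of MRI and MEG features, then
# Gaussian process regression for age prediction.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 613                                    # cohort size from the abstract
X_mri = rng.normal(size=(n, 200))          # placeholder MRI features
X_meg = rng.normal(size=(n, 300))          # placeholder MEG features
age = rng.uniform(18, 88, size=n)          # age range 18-88 as in Cam-CAN

# Project both modalities into a shared low-dimensional space.
cca = CCA(n_components=20)
Z_mri, Z_meg = cca.fit_transform(X_mri, X_meg)
Z = np.hstack([Z_mri, Z_meg])

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
mae = -cross_val_score(gpr, Z, age, cv=5,
                       scoring="neg_mean_absolute_error").mean()
print(f"CV mean absolute error: {mae:.2f} yrs")
```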
Autonomous driving is a recent topic of great interest, aimed at replicating human driving behavior while keeping safety in mind. We approach the problem of learning synthetic driving using generative neural networks. The main idea is to build a controller-trainer network that uses images plus key-press data to mimic human learning. We used a stable GAN architecture to make predictions between driving scenes given key presses. We trained our model on one video game (Road Rash) and evaluated its accuracy by running the model on other maps in Road Rash to determine the extent of learning.
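To make the key-press conditioning concrete, the sketch below shows a generator that predicts the next driving frame from the current frame and a one-hot key press. This is loosely in the spirit of the abstract; the number of keys, architecture, and resolution are assumptions, and the discriminator and training loop are omitted.

```python
# Illustrative sketch (assumed formulation): a generator conditioned on the
# current frame and a one-hot key press; not the authors' exact GAN.
import torch
import torch.nn as nn

N_KEYS = 4   # e.g. up/down/left/right (assumption)

class NextFrameGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + N_KEYS, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, frame, keys):
        # Broadcast the key-press one-hot over the spatial grid and
        # concatenate it with the image channels.
        b, _, h, w = frame.shape
        key_planes = keys.view(b, N_KEYS, 1, 1).expand(b, N_KEYS, h, w)
        return self.net(torch.cat([frame, key_planes], dim=1))

gen = NextFrameGenerator()
next_frame = gen(torch.randn(1, 3, 64, 64),
                 torch.eye(N_KEYS)[[2]])   # one-hot key press
print(next_frame.shape)                    # torch.Size([1, 3, 64, 64])
```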