Abstract. Over the past decade, functional Magnetic Resonance Imaging (fMRI) has emerged as a powerful new instrument to collect vast quantities of data about activity in the human brain. A typical fMRI experiment can produce a three-dimensional image related to the human subject's brain activity every half second, at a spatial resolution of a few millimeters. As in other modern empirical sciences, this new instrumentation has led to a flood of new data, and a corresponding need for new data analysis methods. We describe recent research applying machine learning methods to the problem of classifying the cognitive state of a human subject based on fMRI data observed over a single time interval. In particular, we present case studies in which we have successfully trained classifiers to distinguish cognitive states such as (1) whether the human subject is looking at a picture or a sentence, (2) whether the subject is reading an ambiguous or non-ambiguous sentence, and (3) whether the word the subject is viewing is a word describing food, people, buildings, etc. This learning problem provides an interesting case study of classifier learning from extremely high dimensional (10^5 features), extremely sparse (tens of training examples), noisy data. This paper summarizes the results obtained in these three case studies, as well as lessons learned about how to successfully apply machine learning methods to train classifiers in such settings.
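The high-dimensional, sparse-sample setting described above can be illustrated with a minimal sketch: synthetic data stands in for real fMRI scans, the feature counts and classifier choice are assumptions, and feature selection is nested inside cross-validation so the accuracy estimate is not inflated by leakage.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 10_000           # tens of examples, very many features
y = rng.integers(0, 2, n_trials)          # e.g. picture vs. sentence trials
X = rng.normal(size=(n_trials, n_voxels))
X[:, :50] += y[:, None] * 1.5             # a few condition-responsive voxels

# Select a small subset of voxels, then classify; selection happens inside
# each cross-validation fold because it sits in the pipeline.
clf = make_pipeline(SelectKBest(f_classif, k=100), GaussianNB())
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

With far fewer training examples than features, aggressive feature selection of this kind is what makes learning feasible at all.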
What qualifies a neural representation for a role in subjective experience? Previous evidence suggests that the duration and intensity of the neural response to a sensory stimulus are factors. We introduce another attribute, the reproducibility of a pattern of neural activity across different episodes, that predicts specific and measurable differences between conscious and nonconscious neural representations independently of duration and intensity. We found that conscious neural activation patterns are relatively reproducible when compared with nonconscious neural activation patterns corresponding to the same perceptual content. This is not adequately explained by a difference in signal-to-noise ratio.
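One simple way to operationalize pattern reproducibility, illustrated below with synthetic data (this is a toy sketch, not the study's actual analysis), is the correlation between the activation patterns evoked on two independent episodes of the same content:

```python
import numpy as np

rng = np.random.default_rng(1)
template = rng.normal(size=500)        # the "true" content-specific pattern

def episode(gain, noise_sd):
    """One noisy measurement of the pattern at a given signal gain."""
    return gain * template + rng.normal(scale=noise_sd, size=template.size)

# Hypothetical conscious trials: the pattern reproduces well across episodes.
r_conscious = np.corrcoef(episode(1.0, 0.5), episode(1.0, 0.5))[0, 1]
# Hypothetical nonconscious trials: same noise level, weaker reproducibility.
r_nonconscious = np.corrcoef(episode(0.3, 0.5), episode(0.3, 0.5))[0, 1]
print(r_conscious, r_nonconscious)
```

The correlation-based measure is attractive because it is insensitive to overall response magnitude, which is why reproducibility can dissociate from intensity.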
In this paper we show that a corpus of a few thousand Wikipedia articles about concrete or visualizable concepts can be used to produce a low-dimensional semantic feature representation of those concepts. The purpose of such a representation is to serve as a model of the mental context of a subject during functional magnetic resonance imaging (fMRI) experiments. A recent study [19] showed that it was possible to predict fMRI data acquired while subjects thought about a concrete concept, given a representation of those concepts in terms of semantic features obtained with human supervision. We use topic models on our corpus to learn semantic features from text in an unsupervised manner, and show that those features can outperform those in [19] in demanding 12-way and 60-way classification tasks. We also show that these features can be used to uncover similarity relations in brain activation for different concepts which parallel those relations in behavioral data from human subjects.
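The core pipeline, learning low-dimensional semantic features for concepts as topic proportions, can be sketched as follows. The tiny corpus and topic count here are placeholders for the few thousand Wikipedia articles and the model sizes used in practice:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for Wikipedia articles about concrete concepts
docs = [
    "hammer tool metal handle nail wood",
    "screwdriver tool handle metal screw",
    "apple fruit sweet tree red eat",
    "banana fruit yellow sweet eat peel",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
features = lda.transform(counts)   # one semantic feature vector per concept
print(features.shape)
```

Each row of `features` is a probability distribution over topics; these vectors play the role of the hand-built semantic features of [19], but are learned without human supervision.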
We describe and evaluate a new statistical generative model of functional magnetic resonance imaging (fMRI) data. The model, topographic latent source analysis (TLSA), assumes that fMRI images are generated by a covariate-dependent superposition of latent sources. These sources are defined in terms of basis functions over space. The number of parameters in the model does not depend on the number of voxels, enabling a parsimonious description of activity patterns that avoids many of the pitfalls of traditional voxel-based approaches. We develop a multi-subject extension where latent sources at the subject-level are perturbations of a group-level template. We evaluate TLSA according to prediction, reconstruction and reproducibility. We show that it compares favorably to a Naive Bayes model while using fewer parameters. We also describe a hypothesis-testing framework that can be used to identify significant latent sources.
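The generative idea behind TLSA can be sketched in a few lines (the radial basis function choice and parameter values below are illustrative, not the paper's exact specification): an image is a covariate-dependent weighted superposition of spatial latent sources, and the number of parameters depends on the number of sources, not the number of voxels.

```python
import numpy as np

# A 20x20 voxel grid, flattened to a list of 2-D coordinates
coords = np.stack(np.meshgrid(np.arange(20), np.arange(20)), -1).reshape(-1, 2)

def rbf_source(center, width):
    """Spatial basis function: activation of one latent source at each voxel."""
    d2 = ((coords - np.array(center)) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * width ** 2))

# Two sources, each parameterized only by a center and a width
sources = np.stack([rbf_source((5, 5), 2.0), rbf_source((14, 12), 3.0)])

def generate_image(covariate, rng):
    """Covariate-dependent superposition of the latent sources, plus noise."""
    weights = np.array([1.0 + covariate, 0.5 - covariate])
    return weights @ sources + rng.normal(scale=0.1, size=coords.shape[0])

img = generate_image(covariate=0.5, rng=np.random.default_rng(0))
print(img.shape)
```

Because each source is described by a handful of basis-function parameters, a 400-voxel (or 100,000-voxel) image is summarized far more parsimoniously than in a voxel-wise model.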
In machine learning problems with tens of thousands of features and only dozens or hundreds of independent training examples, dimensionality reduction is essential for good learning performance. In previous work, many researchers have treated the learning problem in two separate phases: first use an algorithm such as singular value decomposition to reduce the dimensionality of the data set, and then use a classification algorithm such as naïve Bayes or support vector machines to learn a classifier. We demonstrate that it is possible to combine the two goals of dimensionality reduction and classification into a single learning objective, and present a novel and efficient algorithm which optimizes this objective directly. We present experimental results in fMRI analysis which show that we can achieve better learning performance and lower-dimensional representations than two-phase approaches can.
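The single-objective idea can be illustrated with a toy sketch (this is a generic construction, not the paper's algorithm): parameterize a logistic classifier's weight vector as a rank-k factorization w = U v, so that the projection U and the reduced-space classifier v are trained jointly against one loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 60, 200, 3                     # few examples, many features, low rank
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:5] = 2.0
y = (X @ true_w + rng.normal(size=n) > 0).astype(float)

U = rng.normal(scale=0.5, size=(d, k))   # projection to k dimensions
v = rng.normal(scale=0.5, size=k)        # linear classifier in reduced space

def loss(U, v):
    p = 1 / (1 + np.exp(-(X @ U @ v)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

lr, start = 0.1, loss(U, v)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ U @ v)))
    g = X.T @ (p - y) / n                # gradient with respect to w = U @ v
    U -= lr * np.outer(g, v)             # chain rule through the factorization
    v -= lr * (U.T @ g)

end = loss(U, v)
print(start, end)
```

Unlike an SVD-then-classify pipeline, the low-dimensional representation here is shaped by the classification loss itself, so the retained dimensions are the ones useful for prediction.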
Predictive modeling of functional magnetic resonance imaging (fMRI) has the potential to expand the amount of information extracted and to enhance our understanding of brain systems by predicting brain states, rather than emphasizing the standard spatial mapping. Based on the block datasets of Functional Imaging Analysis Contest (FIAC) Subject 3, we demonstrate the potential and pitfalls of predictive modeling in fMRI analysis by investigating the performance of five models (linear discriminant analysis, logistic regression, linear support vector machine, Gaussian naive Bayes, and a variant of Gaussian naive Bayes) as a function of preprocessing steps and feature selection methods. We found that: (1) independent of the model, temporal detrending and feature selection assisted in building a more accurate predictive model; (2) the linear support vector machine and logistic regression often performed better than either of the Gaussian naive Bayes models in terms of the optimal prediction accuracy; and (3) the optimal prediction accuracy obtained in a feature space using principal components was typically lower than that obtained in a voxel space, given the same model and same preprocessing. We show that due to the existence of artifacts from different sources, high prediction accuracy alone does not guarantee that a classifier is learning a pattern of brain activity that might be usefully visualized, although cross-validation methods do provide fairly unbiased estimates of true prediction accuracy. The trade-off between the prediction accuracy and the reproducibility of the spatial pattern should be carefully considered in predictive modeling of fMRI. We suggest that unless the experimental goal is brain-state classification of new scans on well-defined spatial features, prediction alone should not be used as an optimization procedure in fMRI data analysis.
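The kind of comparison described, temporal detrending, feature selection, then cross-validated evaluation of competing classifiers, can be sketched as follows; synthetic data with a simulated scanner drift stands in for the FIAC scans, and only two of the five models are shown:

```python
import numpy as np
from scipy.signal import detrend
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_scans, n_voxels = 80, 2000
y = np.tile([0, 1], n_scans // 2)                # alternating block conditions
X = rng.normal(size=(n_scans, n_voxels))
X += np.linspace(0, 3, n_scans)[:, None]         # simulated scanner drift
X[:, :30] += y[:, None]                          # condition-responsive voxels

X = detrend(X, axis=0)                           # temporal detrending per voxel
accs = {}
for name, model in [("svm", LinearSVC(max_iter=5000)), ("gnb", GaussianNB())]:
    pipe = make_pipeline(SelectKBest(f_classif, k=50), model)
    accs[name] = cross_val_score(pipe, X, y, cv=5).mean()
print(accs)
```

Note that a high mean accuracy here says nothing about whether the selected voxels form a reproducible, interpretable spatial pattern, which is exactly the caveat the abstract raises.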
Dry forests are the most widely distributed vegetation type in the tropics, and studies that quantify the carbon stock in these forests are important for assessing their contribution to mitigating the effects of climate change. With that in mind, the aim of this research was to quantify the carbon stocks in the woody, herbaceous, litter and root components of a patch of dry tropical forest with 30 years of regeneration in Iguatu-CE, Brazil. The vegetation was first inventoried by means of a floristic and phytosociological survey of the woody component in a 1 ha area that had been under conservation. Biomass was then estimated using allometric equations, and the stored carbon was quantified. Carbon stocks in the litter and the herbaceous plants were determined by monitoring their biomass over 24 months, with subsequent conversion to carbon. Carbon stocks in the roots were estimated as the product of root biomass and carbon concentration; for this, 20 samples were collected at depths of up to 30 cm in each of the dry and rainy seasons, totaling 40 samples per year. The carbon content was found to vary with the compartment evaluated, with the woody component alone storing 19.27 t ha⁻¹, evidencing the effective contribution of dry tropical forest to reducing atmospheric CO₂. Key words: Semi-arid. Caatinga. Biomass.
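The stock computation described for the root compartment reduces to a simple product; the sketch below uses illustrative values, not the study's measurements:

```python
def carbon_stock(biomass_t_ha, carbon_fraction):
    """Carbon stock (t C ha^-1) from biomass (t ha^-1) and carbon fraction."""
    return biomass_t_ha * carbon_fraction

# Hypothetical example: root biomass of 10 t ha^-1 at a 45% carbon concentration
print(carbon_stock(10.0, 0.45))
```

Summing this quantity over the woody, herbaceous, litter and root compartments yields the total stock for the forest patch.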
We propose a family of supervised dimensionality reduction (SDR) algorithms that combine feature extraction (dimensionality reduction) with learning a predictive model in a unified optimization framework, using data- and class-appropriate generalized linear models (GLMs), and handling both classification and regression problems. Our approach uses simple closed-form update rules and is provably convergent. Promising empirical results are demonstrated on a variety of high-dimensional datasets.
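For the Gaussian (regression) member of such a family, the core idea can be sketched as alternating closed-form least-squares updates for a projection U and a reduced-space predictor V sharing the single objective ‖Y − XUV‖²; the updates below are generic reduced-rank alternating least squares, not the paper's exact rules:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q, k = 100, 30, 5, 2               # samples, features, outputs, rank
X = rng.normal(size=(n, d))
B = rng.normal(size=(d, k)) @ rng.normal(size=(k, q))   # true low-rank map
Y = X @ B + 0.1 * rng.normal(size=(n, q))

U = rng.normal(size=(d, k))              # projection (dimensionality reduction)
V = rng.normal(size=(k, q))              # predictor in the reduced space
for _ in range(30):
    # Fix U, solve V in closed form: least squares in the reduced space
    V = np.linalg.lstsq(X @ U, Y, rcond=None)[0]
    # Fix V, solve U in closed form: U* = pinv(X) @ Y @ pinv(V)
    U = np.linalg.lstsq(X, Y @ np.linalg.pinv(V), rcond=None)[0]

mse = np.mean((X @ U @ V - Y) ** 2)
print(mse)
```

Each update minimizes the shared objective exactly over one block of parameters, so the objective decreases monotonically, the same structural property that makes such schemes provably convergent.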