Matlab and R code for the prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.
Background: Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods use genomic data alone, without considering established clinical covariates that are often available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models.
Results: We propose a way to combine classical clinical covariates with genomic data in a clinico-genomic prediction model based on the Cox regression model. The prediction model is obtained by using both types of covariates simultaneously, but applying dimension reduction only to the high-dimensional genomic variables. We describe how this can be done for seven well-known prediction methods: variable selection, unsupervised and supervised principal components regression and partial least squares regression, ridge regression, and the lasso. We further perform a systematic comparison of the performance of prediction models using clinical covariates only, genomic data only, or a combination of the two. The comparison is done using three survival data sets containing both clinical information and microarray gene expression data. Matlab code for the clinico-genomic prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/clinico-genomic/.
Conclusions: Based on our three data sets, the comparison shows that established clinical covariates will often lead to better predictions than can be obtained from genomic data alone. In the cases where the genomic models are better than the clinical ones, ridge regression is used for dimension reduction. We also find that the clinico-genomic models tend to outperform the models based on genomic data only. Further, clinico-genomic models fitted with ridge regression give, for all three data sets, better predictions than models based on the clinical covariates alone.
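For the ridge case, one way to read "dimension reduction only on the genomic variables" is to penalise the genomic coefficients in the Cox partial likelihood while leaving the clinical coefficients unpenalised. The following is a minimal pure-Python sketch of that idea, not the authors' Matlab code: the toy data, the function names, the Breslow-type likelihood, and the plain gradient descent optimiser are all assumptions made for illustration.

```python
import math

def risk_set_stats(eta, X, time, t_i):
    """Log-sum-exp of eta over the risk set at t_i, and the weighted covariate means."""
    idx = [j for j, t_j in enumerate(time) if t_j >= t_i]
    m = max(eta[j] for j in idx)              # subtract the max for numerical stability
    w = [math.exp(eta[j] - m) for j in idx]
    total = sum(w)
    means = [sum(wj * X[j][k] for wj, j in zip(w, idx)) / total
             for k in range(len(X[0]))]
    return m + math.log(total), means

def objective_and_gradient(beta, X, time, event, penalised, lam):
    """Negative Cox log partial likelihood plus a ridge penalty on flagged coefficients."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    value = 0.5 * lam * sum(b * b for b, pen in zip(beta, penalised) if pen)
    grad = [lam * b if pen else 0.0 for b, pen in zip(beta, penalised)]
    for i, d_i in enumerate(event):
        if not d_i:
            continue                          # censored subjects only enter risk sets
        lse, means = risk_set_stats(eta, X, time, time[i])
        value += lse - eta[i]
        for k in range(len(beta)):
            grad[k] += means[k] - X[i][k]
    return value, grad

def fit(X, time, event, penalised, lam, iterations=300):
    """Gradient descent with step halving; every accepted step lowers the objective."""
    beta = [0.0] * len(X[0])
    value, grad = objective_and_gradient(beta, X, time, event, penalised, lam)
    for _ in range(iterations):
        step = 1.0
        while step > 1e-12:
            cand = [b - step * g for b, g in zip(beta, grad)]
            cand_value, cand_grad = objective_and_gradient(
                cand, X, time, event, penalised, lam)
            if cand_value < value:
                beta, value, grad = cand, cand_value, cand_grad
                break
            step /= 2
        else:
            break                             # no improving step found: stop
    return beta, value

# Hypothetical toy data: column 0 is a clinical covariate, columns 1-2 are "genes".
X = [[1.0, 0.8, -0.3], [0.5, 1.2, 0.4], [1.2, -0.2, 0.9], [-0.3, 0.5, -1.0],
     [0.1, -0.7, 0.2], [-0.8, 0.3, 0.6], [-1.0, -1.1, -0.5], [-0.7, -0.8, -0.3]]
time = [2, 3, 5, 6, 8, 9, 12, 14]
event = [1, 1, 1, 0, 1, 1, 0, 1]              # 1 = event observed, 0 = censored
penalised = [False, True, True]               # shrink the genomic coefficients only

beta, value = fit(X, time, event, penalised, lam=1000.0)
print("coefficients (clinical, gene 1, gene 2):", beta)
```

With a very large penalty the genomic coefficients are forced close to zero while the clinical coefficient remains free, so the fit approaches a clinical-only model. In practice the penalty parameter would be chosen by cross-validation rather than fixed by hand.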
In this population-based prospective cohort study, ever users of the levonorgestrel-releasing intrauterine system (LNG-IUS) had a strongly reduced risk of ovarian and endometrial cancer compared to never users, with no increased risk of breast cancer.
Maternal infections during pregnancy are associated with risk of neurodevelopmental disorders, including autism spectrum disorders (ASDs). Proposed pathogenetic mechanisms include fetal infection, placental inflammation, and maternal cytokines or antibodies that cross the placenta. The Autism Birth Cohort comprises mothers, fathers, and offspring recruited in Norway in 1999 to 2008. Through questionnaire screening, referrals, and linkages to a national patient registry, 442 mothers of children with ASD were identified, and 464 frequency-matched controls were selected. Immunoglobulin G (IgG) antibodies to Toxoplasma gondii, rubella virus, cytomegalovirus (CMV), herpes simplex virus 1 (HSV-1), and HSV-2 in plasma collected at midpregnancy and after delivery were measured by multiplexed immunoassays. High levels of HSV-2 IgG antibodies in maternal midpregnancy plasma were associated with increased risk of ASD in male offspring (an increase in HSV-2 IgG levels from 240 to 640 arbitrary units/ml was associated with an odds ratio of 2.07; 95% confidence interval, 1.06 to 4.06; P = 0.03) when adjusted for parity and child's birth year. No association was found between ASD and the presence of IgG antibodies to Toxoplasma gondii, rubella virus, CMV, or HSV-1. Additional studies are needed to test for replicability of risk and specificity of the sex effect and to examine risk associated with other infections.
IMPORTANCE: The cause (or causes) of most cases of autism spectrum disorder is unknown. Evidence from epidemiological studies and work in animal models of neurodevelopmental disorders suggests that both genetic and environmental factors may be implicated. The latter include gestational infection and immune activation. In our cohort, high levels of antibodies to herpes simplex virus 2 at midpregnancy were associated with an elevated risk of autism spectrum disorder in male offspring. These findings provide support for the hypothesis that gestational infection may contribute to the pathogenesis of autism spectrum disorder and have the potential to drive new efforts to monitor women more closely for cryptic gestational infection and to implement suppressive therapy during pregnancy.
KEYWORDS: autism, birth cohort, herpes simplex virus, infection, prenatal, serology
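The reported odds ratio, confidence interval, and P value can be checked against each other with a few lines of arithmetic. The sketch below assumes the interval is a 95% Wald interval, symmetric on the log-odds scale; the abstract does not state how the interval was computed, so this is a plausibility check rather than a reconstruction of the authors' analysis.

```python
import math

# Values reported in the abstract.
odds_ratio, ci_low, ci_high = 2.07, 1.06, 4.06

# Assumption: 95% Wald interval, so the limits are exp(log(OR) -/+ 1.96 * SE).
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Under that assumption the point estimate is the geometric mean of the limits.
geometric_midpoint = math.sqrt(ci_low * ci_high)

# Two-sided P value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
z = math.log(odds_ratio) / se_log_or
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"SE of log(OR): {se_log_or:.3f}")
print(f"geometric midpoint of the interval: {geometric_midpoint:.3f}")
print(f"z = {z:.3f}, two-sided P = {p_two_sided:.4f}")
```

If the assumption holds, the midpoint should sit close to the reported odds ratio and the P value should round to the reported 0.03.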
Summary: In a number of practical cases it is important to determine the likely geographical origin of an individual or a biological sample. A dead body, old bones or a sample of semen may be available. Information on where the sample might come from can assist investigation or research. The first part of this paper is independent of specific data structure. We formulate the problem as a classification problem. Bayes' theorem allows different sources of information or data to be reconciled conveniently. The main part of the paper involves high-dimensional data for which simple, standard methods are not likely to work properly. Mitochondrial DNA (mtDNA) data is a typical example of such data. We propose a procedure involving essentially two steps. First, principal component analysis is used to reduce the dimension of the data. Next, quadratic discriminant analysis performs the actual classification. A cross-validation procedure is implemented to select the optimal number of principal components. The importance of using separate data sets for model fitting and testing is emphasized. This method distinguishes well between individuals with a self-reported European (Icelandic or German) origin and SE Africans. In this case the error rate is 2.0%.
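The role of Bayes' theorem in reconciling several sources of information can be illustrated with a small sketch. It assumes the sources are conditionally independent given the origin, so their likelihoods multiply. All numbers below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def posterior(prior, *likelihoods):
    """Posterior over candidate origins from a prior and independent likelihoods."""
    unnormalised = dict(prior)
    for likelihood in likelihoods:
        for origin in unnormalised:
            unnormalised[origin] *= likelihood[origin]
    total = sum(unnormalised.values())
    return {origin: value / total for origin, value in unnormalised.items()}

# Hypothetical prior, e.g. from case circumstances.
prior = {"Iceland": 0.2, "Germany": 0.3, "SE Africa": 0.5}

# Hypothetical likelihood of the observed mtDNA profile under each origin,
# e.g. as produced by a classifier fitted on reference samples.
mtdna_likelihood = {"Iceland": 0.04, "Germany": 0.05, "SE Africa": 0.001}

post = posterior(prior, mtdna_likelihood)
for origin, probability in post.items():
    print(f"{origin}: {probability:.3f}")
```

Further sources of evidence can be added as extra likelihood arguments, as long as the independence assumption is reasonable for them.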
Background: The understanding of changes in temporal processes related to human carcinogenesis is limited. One approach for prospective functional genomic studies is to compile trajectories of differential expression of genes, based on measurements from many case-control pairs. We propose a new statistical method that does not assume any parametric shape for the gene trajectories.
Methods: The trajectory of a gene is defined as the curve representing the changes in gene expression levels in the blood as a function of time to cancer diagnosis. In a nested case–control design it consists of differences in gene expression levels between cases and controls. Genes can be grouped into curve groups, each curve group corresponding to genes with a similar development over time. The proposed new statistical approach is based on a set of hypothesis tests that can determine whether or not there is development in gene expression levels over time, and whether this development varies among different strata. Curve group analysis may reveal significant differences in gene expression levels over time among the different strata considered. This new method was applied as a “proof of concept” to breast cancer in the Norwegian Women and Cancer (NOWAC) postgenome cohort, using blood samples collected prospectively that were specifically preserved for transcriptomic analyses (PAX tube). Cohort members diagnosed with invasive breast cancer through 2009 were identified through linkage to the Cancer Registry of Norway, and for each case a random control from the postgenome cohort was also selected, matched by birth year and time of blood sampling, to create a case-control pair. After exclusions, 441 case-control pairs were available for analyses, in which we considered strata of lymph node status at time of diagnosis and time of diagnosis with respect to breast cancer screening visits.
Results: The development of gene expression levels in the NOWAC postgenome cohort varied in the last years before breast cancer diagnosis, and this development differed by lymph node status and participation in the Norwegian Breast Cancer Screening Program. The differences among the investigated strata appeared larger in the year before breast cancer diagnosis compared to earlier years.
Conclusions: This approach shows good properties in terms of statistical power and type 1 error under minimal assumptions. When applied to a real data set it was able to discriminate between groups of genes with similar non-linear patterns before diagnosis.
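The definition of a gene trajectory as case-minus-control differences over time to diagnosis can be made concrete with a short sketch. It only illustrates the definition and does not implement the hypothesis tests or the curve grouping described in the abstract. The binning by whole years, the function name, and the expression values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def trajectory(pairs, bin_width_years=1.0):
    """Mean case-minus-control expression difference per time-to-diagnosis bin.

    pairs: iterable of (years_to_diagnosis, case_expression, control_expression)
    for one gene, one tuple per matched case-control pair.
    Returns a dict mapping the lower edge of each bin (in years) to the mean difference.
    """
    bins = defaultdict(list)
    for years, case, control in pairs:
        bins[int(years // bin_width_years)].append(case - control)
    return {index * bin_width_years: mean(differences)
            for index, differences in sorted(bins.items())}

# Hypothetical log-expression values for a single gene in five matched pairs.
pairs = [(0.4, 7.9, 7.1), (0.8, 8.1, 7.5), (1.5, 7.6, 7.4),
         (2.2, 7.3, 7.3), (2.9, 7.2, 7.4)]

curve = trajectory(pairs)
for years, difference in curve.items():
    print(f"{years:.0f}-{years + 1:.0f} years before diagnosis: {difference:+.2f}")
```

Computing such a curve separately within each stratum (for example by lymph node status) gives the objects that a stratum comparison would work on.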
Survival prediction from high-dimensional genomic data is dependent on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Unless it is known whether the choice of evaluation criterion affects the conclusions about which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R²-measure based on the Cox partial likelihood, and an R²-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from those of the other three criteria studied. For future studies, where one also might want to include non-likelihood or non-model-based regularization methods, we argue in favor of AUC and the R²-measure based on the Brier score, as these neither suffer from the arbitrary splitting into two groups nor depend on the Cox partial likelihood.
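The "arbitrary splitting into two groups" behind the log-rank criterion can be seen in a small sketch: a continuous prognostic index is cut at its median and the two resulting groups are compared with a standard two-sample log-rank test. This is an illustrative pure-Python version, assuming a median cutoff and using hypothetical data; it is not the code used in the paper.

```python
import math
from statistics import median

def split_at_median(prognostic_index):
    """Code subjects as 1 (index above the median, 'high risk') or 0 ('low risk')."""
    cutoff = median(prognostic_index)
    return [1 if value > cutoff else 0 for value in prognostic_index]

def logrank_two_groups(time, event, group):
    """Two-sample log-rank chi-square statistic (1 df) and its p-value."""
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, di in zip(time, event) if di}):
        at_risk = [g for ti, g in zip(time, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, di in zip(time, event) if di and ti == t)
        d1 = sum(1 for ti, di, g in zip(time, event, group)
                 if di and ti == t and g == 1)
        observed += d1
        expected += d * n1 / n                 # expected events in group 1
        if n > 1:                              # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df
    return chi2, p_value

# Hypothetical prognostic index (e.g. a fitted linear predictor) and survival data.
prognostic_index = [1.4, 0.9, 1.1, -0.2, 0.3, -0.6, -1.2, -0.9]
time = [2, 3, 5, 6, 8, 9, 12, 14]
event = [1, 1, 1, 0, 1, 1, 0, 1]               # 1 = event observed, 0 = censored

group = split_at_median(prognostic_index)
chi2, p_value = logrank_two_groups(time, event, group)
print(f"log-rank chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Moving the cutoff changes the group membership and hence the test statistic, which is the kind of arbitrariness that AUC and the Brier-score-based measure avoid.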
Traditionally, the prospective design has been chosen for risk factor analyses of lifestyle and cancer, with estimation mainly by survival analysis methods. With new technologies, epidemiologists can expand their prospective studies to include functional genomics, given either as transcriptomics (mRNA and microRNA) or as epigenetics, in blood or other biological materials. The novel functional analyses should not be assessed using classical survival analyses, since the main goal is not risk estimation but the analysis of functional genomics as part of the dynamic carcinogenic process over time, i.e., a “processual” approach. In the risk factor model, time to event is analysed as a function of exposure variables known at the start of follow-up (fixed covariates) or changing over the follow-up period (time-dependent covariates). In the processual model, transcriptomics or epigenetics are considered as functions of time and exposures. The success of this novel approach depends on the development of new statistical methods with the capacity to describe and analyse the time-dependent curves or trajectories for tens of thousands of genes simultaneously. This approach also focuses on multilevel or integrative analyses, introducing novel statistical methods in epidemiology. The processual approach, as part of systems epidemiology, might in the near future represent an alternative to human in vitro studies using human biological material for understanding the mechanisms and pathways involved in carcinogenesis.