We propose a fast and accurate algorithm, VIF regression, for feature selection in large regression problems. VIF regression is extremely fast: it uses a one-pass search over the predictors and a computationally efficient method of testing each potential predictor for addition to the model. VIF regression provably avoids model over-fitting, controlling the marginal False Discovery Rate (mFDR). Numerical results show that it is much faster than any other published algorithm for regression with feature selection, and is as accurate as the best of the slower algorithms.
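The one-pass search described above can be illustrated with a minimal sketch. It is not the authors' VIF-corrected test: it substitutes an exact partial t-test (computed by Gram-Schmidt orthogonalization, with a normal approximation for the p-value), which is the kind of slower computation a fast approximation would replace. The spending rule is an alpha-investing style rule chosen for illustration, and the function name `one_pass_select` and the `wealth` and `payout` defaults are assumptions, not values taken from the abstract.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def one_pass_select(columns, y, wealth=0.5, payout=0.05):
    """Visit each candidate column once; keep it if its partial test passes.

    Illustrative only: exact partial t-test plus an alpha-investing style
    rule (wealth/payout values are assumptions), not the VIF correction.
    """
    n = len(y)
    # Orthonormal basis of the current model; starts with the intercept.
    basis = [[1.0 / math.sqrt(n)] * n]
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]
    selected, last_hit = [], 0
    for i, x in enumerate(columns, start=1):
        # Test level for this candidate, drawn from the current wealth.
        alpha = wealth / (1 + i - last_hit)
        # Remove the part of x explained by features already in the model.
        x_perp = list(x)
        for q in basis:
            c = dot(q, x_perp)
            x_perp = [a - c * b for a, b in zip(x_perp, q)]
        norm2 = dot(x_perp, x_perp)
        dof = n - len(basis) - 1
        p_value = 1.0
        if norm2 > 1e-12 and dof > 0:
            proj = dot(x_perp, resid)
            rss_new = dot(resid, resid) - proj * proj / norm2
            sigma = math.sqrt(max(rss_new, 1e-300) / dof)
            t_stat = proj / (math.sqrt(norm2) * sigma)
            # Two-sided p-value under a normal approximation.
            p_value = math.erfc(abs(t_stat) / math.sqrt(2))
        if p_value < alpha:
            # Accept: extend the basis, update residuals, earn the payout.
            q = [a / math.sqrt(norm2) for a in x_perp]
            c = dot(q, resid)
            resid = [r - c * b for r, b in zip(resid, q)]
            basis.append(q)
            selected.append(i - 1)
            wealth += payout
            last_hit = i
        else:
            # Reject: pay for the failed test.
            wealth -= alpha / (1 - alpha)
    return selected


# Simulated data: only columns 0 and 3 carry signal.
random.seed(0)
n, p = 200, 10
cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * cols[0][k] - 2 * cols[3][k] + random.gauss(0, 1) for k in range(n)]
chosen = one_pass_select(cols, y)
print(chosen)
```

Each predictor is examined exactly once and in order, so the cost of the search grows with the number of candidates rather than with the number of candidate subsets.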
In this article, we study the semiparametric proportional odds model with random effects for correlated, right-censored failure time data. We establish that the maximum likelihood estimators for the parameters of this model are consistent and asymptotically Gaussian. Furthermore, the limiting variances achieve the semiparametric efficiency bounds and can be consistently estimated. Simulation studies show that the asymptotic approximations are accurate for practical sample sizes and that the efficiency gains of the proposed estimators over those of Cai, Cheng and Wei (2002, JASA) can be substantial. A real example is provided to illustrate the proposed methods.
Parent-of-origin effects have been proposed as one plausible source of the heritability left unexplained by genome-wide association studies. Here, we consider a case-control mother-child pair design for studying parent-of-origin effects of offspring genes on neonatal/early-life disorders or pregnancy-related conditions. In contrast to the standard case-control design, the case-control mother-child pair design contains valuable parental information and therefore permits powerful assessment of parent-of-origin effects. Suppose the region under study is in Hardy-Weinberg equilibrium, inheritance is Mendelian at the diallelic locus under study, there is random mating in the source population, and the SNP under study is not related to risk for the phenotype under study because of linkage disequilibrium (LD) with other SNPs. Using a maximum likelihood method that simultaneously assesses likely parental sources and estimates effect sizes of the two offspring genotypes, we investigate the extent of power increase for testing parent-of-origin effects through the incorporation of genotype data for adjacent markers that are in LD with the test locus. Our method does not need to assume the outcome is rare because it exploits supplementary information on phenotype prevalence. Analysis with simulated SNP data indicates that incorporating genotype data for adjacent markers greatly helps recover the parent-of-origin information. This recovery can sometimes substantially improve statistical power for detecting parent-of-origin effects. We demonstrate our method by examining parent-of-origin effects of the gene PPARGC1A on low birth weight using data from 636 mother-child pairs in the Jerusalem Perinatal Study.
The case-control mother-child pair design offers a unique advantage for dissecting the genetic susceptibility of complex traits because it allows the assessment of both maternal and offspring genetic compositions. This design has been widely adopted in studies of obstetric complications and neonatal outcomes. In this work, we developed an efficient statistical method for evaluating joint genetic and environmental effects on a binary phenotype. Using a logistic regression model to describe the relationship between the phenotype and maternal and offspring genetic and environmental risk factors, we developed a semiparametric maximum likelihood method for the estimation of odds ratio association parameters. Our method is novel because it exploits two unique features of the study data for the parameter estimation. First, the correlation between maternal and offspring SNP genotypes can be specified under the assumptions of random mating, Hardy-Weinberg equilibrium, and Mendelian inheritance. Second, environmental exposures are often not affected by offspring genes conditional on maternal genes. Our method yields more efficient estimates compared with the standard prospective method for fitting logistic regression models to case-control data. We demonstrated the performance of our method through extensive simulation studies and the analysis of data from the Jerusalem Perinatal Study.
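The first feature, the maternal-offspring genotype correlation, can be made concrete with a small sketch. Assuming a diallelic SNP with minor allele frequency `p`, Hardy-Weinberg equilibrium, random mating, and Mendelian inheritance, the joint distribution of the mother's and child's minor-allele counts follows directly. The function name `mother_child_joint` and the value `p = 0.3` are illustrative assumptions, not taken from the study.

```python
from math import comb, isclose


def mother_child_joint(p):
    """Return joint[g][h] = P(mother carries g minor alleles, child carries h).

    Assumes Hardy-Weinberg equilibrium, random mating, and Mendelian
    inheritance at a diallelic locus with minor allele frequency p.
    """
    joint = [[0.0] * 3 for _ in range(3)]
    for g in range(3):
        # Maternal genotype frequency under Hardy-Weinberg equilibrium.
        p_mother = comb(2, g) * p**g * (1 - p) ** (2 - g)
        # Mendelian transmission: a mother with g minor alleles passes one
        # on with probability g / 2.
        t = g / 2
        for m_allele, pm in ((0, 1 - t), (1, t)):
            # Random mating: the paternal allele is a draw from the
            # population, minor with probability p.
            for f_allele, pf in ((0, 1 - p), (1, p)):
                joint[g][m_allele + f_allele] += p_mother * pm * pf
    return joint


joint = mother_child_joint(0.3)
for row in joint:
    print([round(v, 4) for v in row])
```

Two cells are structurally zero (a mother with no minor alleles cannot have a child with two, and vice versa), and the child's marginal distribution is again in Hardy-Weinberg proportions, which is what lets the genotype distribution be specified with a single frequency parameter.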
Next-generation sequencing technology provides an unprecedented opportunity to identify rare susceptibility variants. It is not yet financially feasible to perform whole-genome sequencing on a large number of subjects, and a two-stage design has been advocated as a practical option. In stage I, variants are discovered by sequencing the whole genomes of a small number of carefully selected individuals. In stage II, the discovered variants are genotyped in a large number of individuals to assess associations. Individuals with extreme phenotypes are typically selected in stage I. Using simulated data for unrelated individuals, we explore two important aspects of this two-stage design: the efficiency of discovering common and rare single-nucleotide polymorphisms (SNPs) in stage I and the impact of incomplete SNP discovery in stage I on the power of testing associations in stage II. We applied a sum test and a sum of squared score test for gene-based association analyses to evaluate the power of the two-stage design. We obtained the following results from extensive simulation studies and analysis of the GAW17 dataset. When individuals with trait values more extreme than the 99.7–99th quantile were included in stage I, the two-stage design could achieve power equal to or even higher than that of the one-stage design if the rare causal variants had large effect sizes. In such a design, fewer than half of the total SNPs, including more than half of the causal SNPs, were discovered; these included nearly all SNPs with minor allele frequencies (MAFs) ≥5%, more than half of the SNPs with MAFs between 1% and 5%, and fewer than half of the SNPs with MAFs <1%. Although a one-stage design may be preferable for identifying multiple rare variants having small to moderate effect sizes, our observations support using the two-stage design as a cost-effective option for next-generation sequencing studies.
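As a rough baseline for why stage I discovery depends so strongly on MAF, consider the simplest case, which is not the extreme-phenotype sampling studied here: individuals are drawn at random, the SNP is in Hardy-Weinberg equilibrium, and sequencing detects every allele present. A SNP is then discovered if at least one of the 2n sampled chromosomes carries the minor allele. The function name and the stage I sample size of 30 below are hypothetical, chosen only to illustrate the calculation.

```python
def discovery_probability(maf, n_sequenced):
    """P(minor allele seen at least once among 2 * n_sequenced chromosomes).

    Baseline assumption: random sampling with error-free sequencing, not
    the extreme-phenotype selection used in the two-stage design.
    """
    return 1.0 - (1.0 - maf) ** (2 * n_sequenced)


n_stage1 = 30  # hypothetical stage I sample size
for maf in (0.05, 0.01, 0.005):
    print(maf, round(discovery_probability(maf, n_stage1), 3))
```

Under this baseline, discovery probability rises steeply with MAF. Selecting individuals with extreme phenotypes changes the picture for causal variants, since carriers of trait-associated alleles are over-represented among the sequenced individuals.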
Summary
Monoclonal antibodies (mAbs) specific for human β2-microglobulin (β2M) have been shown to induce tumour cell apoptosis in haematological and solid tumours by recruiting major histocompatibility complex (MHC) class I molecules into, and excluding cytokine receptors from, the lipid rafts. Based on these findings, we hypothesized that IgM anti-β2M mAbs might have stronger apoptotic effects because of their pentameric structure. Our results showed that, compared with IgG mAbs, IgM anti-β2M mAbs exhibited stronger tumouricidal activity in vitro against different tumour cells, including myeloma, mantle cell lymphoma, and prostate cancer, and in vivo in a human-like xenografted myeloma mouse model without damaging normal tissues. IgM mAb-induced apoptosis is dependent on the pentameric structure of the mAbs. Disrupting pentameric IgM into monomeric IgM significantly reduced their ability to induce cell apoptosis. Monomeric IgM mAbs were less efficient at recruiting MHC class I molecules into, and excluding cytokine receptors from, lipid rafts, and at activating the intrinsic apoptosis cascade. Thus, we developed and validated the efficacy of anti-β2M IgM mAbs that may be utilized in the clinical setting and showed that IgM anti-β2M mAbs may be more potent than IgG mAbs at inducing tumour apoptosis.