Inferring large-scale covariance matrices from sparse genomic data is a ubiquitous problem in bioinformatics. The widely used standard covariance and correlation estimators are ill-suited for this purpose. As a statistically efficient and computationally fast alternative, we propose a novel shrinkage covariance estimator that exploits the Ledoit-Wolf (2003) lemma for analytic calculation of the optimal shrinkage intensity. Subsequently, we apply this improved covariance estimator (which has guaranteed minimum mean squared error, is well-conditioned, and is always positive definite even for small sample sizes) to the problem of inferring large-scale gene association networks. We show that it performs very favorably compared to competing approaches, both in simulations and in application to real expression data.
Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework for estimating GGMs from microarray data. These simulations show that the true network topology can be recovered with high accuracy even for small-sample data sets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
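The shrinkage idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it shrinks the sample correlation matrix toward the identity target, with the shrinkage intensity computed analytically from the estimated variances of the correlation coefficients (a simplified variance formula is used here for illustration).

```python
import numpy as np

def shrink_correlation(X):
    """Shrinkage estimate of the correlation matrix of the columns of X.

    Sketch of the analytic-shrinkage approach: shrink the sample
    correlation matrix R toward the identity, with intensity
    lambda* = sum of Var(r_ij) / sum of r_ij^2 over off-diagonal entries,
    clipped to [0, 1]. The variance estimate below is a simplified
    illustration, not the exact formula from the paper.
    """
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0, ddof=1)   # standardize columns
    r = (Xs.T @ Xs) / (n - 1)                 # sample correlation matrix
    # per-sample products whose empirical variance estimates Var(r_ij)
    w = np.einsum('ki,kj->kij', Xs, Xs)       # shape (n, p, p)
    var_r = (n / ((n - 1) ** 3)) * ((w - w.mean(0)) ** 2).sum(0)
    off = ~np.eye(p, dtype=bool)              # off-diagonal mask
    lam = np.clip(var_r[off].sum() / (r[off] ** 2).sum(), 0.0, 1.0)
    return (1 - lam) * r + lam * np.eye(p)
```

Because the result is a convex combination of a positive semi-definite matrix and the identity with a strictly positive weight, it is positive definite and invertible even when the number of variables exceeds the number of samples, which is exactly the small-sample regime the abstract describes.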
Background: Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappropriate and adequate regularization techniques are needed. Popular approaches include biased estimates of the covariance matrix and high-dimensional regression schemes such as the Lasso and Partial Least Squares. Results: In this article, we investigate a general framework for combining regularized regression methods with the estimation of Graphical Gaussian models. This framework includes various existing methods as well as two new approaches based on ridge regression and the adaptive lasso, respectively. These methods are extensively compared, both qualitatively and quantitatively, within a simulation study and through an application to six diverse real data sets. In addition, all proposed algorithms are implemented in the R package "parcor", available from the R repository CRAN. Conclusion: In our simulation studies, the investigated non-sparse regression methods, i.e., Ridge Regression and Partial Least Squares, exhibit rather conservative behavior when combined with (local) false discovery rate multiple testing to decide whether an edge is present in the network. For networks with higher densities, the difference in performance between the methods decreases. For sparse networks, we confirm the Lasso's well-known tendency to select too many edges, whereas the two-stage adaptive Lasso is an interesting alternative that provides sparser solutions. In our simulations, both sparse and non-sparse methods are able to reconstruct networks with cluster structures.
On the six real data sets, the results obtained with the non-sparse methods also differ clearly from those obtained with the sparse methods, for which the choice of the regularization parameter automatically performs model selection. In five out of six data sets, Partial Least Squares selects very dense networks. Furthermore, for data that violate the assumption of uncorrelated observations (due to replications), the Lasso and the adaptive Lasso yield very complex structures, indicating that they may not be suited to these conditions. Under subsampling, the shrinkage approach is more stable than the regression-based approaches.
In non-inferiority trials, where non-inferiority of a new experimental drug compared to an active control has to be shown, it may be advisable to use an additional placebo group for internal validation if ethically justifiable. The focus of this paper is on such designs. Assuming normality and homogeneity of variances we will derive a statistical test procedure which turns out to be equivalent to the assessment based on Fieller's confidence interval. Based on the power function of this test, sample size calculations are carried out to achieve a given power. Additionally, the optimal allocation of the total sample size is derived. As an alternative to this parametric procedure, the bootstrap percentile interval is discussed and finally compared with Fieller's confidence interval in a study on mildly asthmatic patients.
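Fieller's confidence interval for a ratio of means, which the test above is equivalent to, can be written out in a short sketch. This is the standard textbook construction for two independent normal samples with pooled variance, not the paper's exact procedure for the three-arm design; it is valid only when the denominator mean is significantly different from zero (g < 1).

```python
import numpy as np
from scipy import stats

def fieller_ci(x, y, alpha=0.05):
    """Fieller confidence interval for the ratio of means mean(x)/mean(y).

    Assumes two independent normal samples with homogeneous variance
    (pooled estimate). The roots of the Fieller quadratic give the
    interval limits; requires g = t^2 * Var(ybar) / ybar^2 < 1.
    """
    nx, ny = len(x), len(y)
    mx, my = np.mean(x), np.mean(y)
    s2 = (((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
          / (nx + ny - 2))                    # pooled variance
    vx, vy = s2 / nx, s2 / ny                 # variances of the two means
    t = stats.t.ppf(1 - alpha / 2, nx + ny - 2)
    g = t ** 2 * vy / my ** 2
    if g >= 1:
        raise ValueError("denominator mean not significantly nonzero; interval unbounded")
    disc = np.sqrt(my ** 2 * vx + mx ** 2 * vy - t ** 2 * vx * vy)
    denom = my ** 2 - t ** 2 * vy
    return (mx * my - t * disc) / denom, (mx * my + t * disc) / denom
```

When g < 1 the interval always contains the point estimate mean(x)/mean(y), and a non-inferiority claim would correspond to the interval lying above a prespecified margin for the ratio.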
In many studies, particularly in the field of systems biology, it is essential that identical protein sets are precisely quantified in multiple samples, such as those representing differentially perturbed cell states. The high degree of reproducibility required for such experiments has not been achieved by classical mass spectrometry-based proteomics methods. In this study we describe the implementation of a targeted quantitative approach by which predetermined protein sets are first identified and subsequently quantified reliably and at high sensitivity in multiple samples. This approach consists of three steps. First, the proteome is extensively mapped out by multidimensional fractionation and tandem mass spectrometry, and the data generated are assembled in the PeptideAtlas database. Second, based on this proteome map, peptides uniquely identifying the proteins of interest (proteotypic peptides) are selected, and multiple reaction monitoring (MRM) transitions are established and validated by MS2 spectrum acquisition. This process of peptide selection, transition selection, and validation is supported by a suite of software tools, TIQAM (Targeted Identification for Quantitative Analysis by MRM), described in this study. Third, the selected target protein set is quantified in multiple samples by MRM. Applying this approach, we were able to reliably quantify low-abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma. The resulting quantitative protein patterns enabled us to clearly define the subset of virulence proteins that is regulated upon plasma exposure. Molecular & Cellular Proteomics 7:1489–1500, 2008. A key element of the experimental framework for systems biology is the comprehensive, quantitative measurement of whole biological systems in differentially perturbed states (1).
Among the different types of measurements possible, protein quantification is particularly informative because proteins catalyze or control the majority of cellular functions. Currently, the most widely applied quantitative proteome analysis technologies consist of the labeling of the samples with stable isotopes, the reproducible separation of complex peptide mixtures, usually by capillary LC, and the identification and quantification of selected peptides by tandem mass spectrometry and sequence database searching (2, 3). These methods generate relative quantitative values when two or more samples are compared, and absolute quantification can be achieved if suitable, calibrated reference samples are available (4). With such shotgun methods, in each measurement only a fraction of the analytes present in a complex sample is identified and quantified. Peptide ions are selected by the mass spectrometer automatically, based on precursor ion signal intensities. Due to a multitude of factors, including interference between analytes and variations in precursor ion spectra, the selection of peptides is not reproducible in consecutive runs, in particular for peptides...
DNA-encoded chemical libraries are promising tools for the discovery of ligands against protein targets of pharmaceutical relevance. DNA-encoded small molecules can be enriched in affinity-based selections, and their unique DNA "barcode" allows amplification and identification by high-throughput sequencing. We describe selection experiments using a DNA-encoded 4000-compound library generated by Diels-Alder cycloadditions. High-throughput sequencing enabled the identification and relative quantification of library members before and after selection. Sequence enrichment profiles corresponding to the "bar-coded" library members were validated by affinity measurements of single compounds. We were able to affinity-mature trypsin inhibitors and to identify a series of albumin binders for the conjugation of pharmaceuticals. Furthermore, we discovered a ligand for the antiapoptotic Bcl-xL protein and a class of tumor necrosis factor (TNF) binders that completely inhibited TNF-mediated killing of L-M fibroblasts in vitro.
Background: Antibiotics are overused in children and adolescents with lower respiratory tract infection (LRTI). Serum procalcitonin (PCT) can be used to guide treatment when bacterial infection is suspected, but its role in pediatric LRTI is unclear. Methods: Between 01/2009 and 02/2010, we randomized previously healthy patients aged 1 month to 18 years presenting with LRTI to the emergency departments of two pediatric hospitals in Switzerland to receive antibiotics either according to a PCT guidance algorithm established for adult LRTI or according to standard-care clinical guidelines. In intention-to-treat analyses, the antibiotic prescribing rate, the duration of antibiotic treatment, and the number of days with impairment of daily activities within 14 days of randomization were compared between the two groups. Results: In total, 337 children, mean age 3.8 years (range 0.1–18), were included. Antibiotic prescribing rates were not significantly different in PCT-guided patients compared to controls (OR 1.26; 95% CI 0.81, 1.95). Mean duration of antibiotic exposure was reduced from 6.3 to 4.5 days under PCT guidance (−1.8 days; 95% CI −3.1, −0.5; P = 0.039) for all LRTI and from 9.1 to 5.7 days for pneumonia (−3.4 days; 95% CI −4.9, −1.7; P < 0.001). There was no apparent difference in impairment of daily activities between PCT-guided and control patients. Conclusion: PCT guidance reduced antibiotic exposure by shortening the duration of antibiotic treatment, while not affecting the antibiotic prescribing rate. The latter may be explained by the low baseline prescribing rate for pediatric LRTI in Switzerland and by the choice of an inappropriately low PCT cut-off level for this population. Trial Registration: Controlled-Trials.com ISRCTN17057980
Background: There is insufficient evidence regarding which clinical features are best suited to distinguish between transient ischemic attack (TIA) and disorders mimicking TIA (TIA mimics). Methods: We compared the frequency, clinical characteristics and outcome of patients with TIA and TIA mimics in a prospective, single-center emergency department cohort over 2 years. Results: Of 303 patients, 248 (81.8%) had a TIA and 55 (18.2%) had TIA mimics. Epileptic seizures (26/55; 47.3%) and migraine attacks (13/55; 23.6%) were the most common TIA mimics. TIA mimics were less likely in patients presenting with unilateral paresis than in patients without unilateral paresis [odds ratio (OR) 0.35, 95% confidence interval (CI) 0.17–0.68]. Memory loss (OR 9.17, 95% CI 2.89–32.50), headache (OR 3.71, 95% CI 1.07–12.78) and blurred vision (OR 2.48, 95% CI 0.90–6.59) increased the odds of TIA mimics. Once these clinical features were taken into account, neither aphasia, dysarthria, sensory loss, blood pressure values nor the duration of symptoms further improved discrimination between the two diagnoses. At 3 months, stroke, recurrent TIA and myocardial infarction were absent in patients with TIA mimics but occurred in 13 (5.2%), 20 (8.1%) and 3 (1.2%) TIA patients, respectively. Conclusions: About 1 in every 5 patients with suspected TIA had a TIA mimic. Paresis suggested TIA, while the other clinical variables used in risk assessment scores after TIA did not distinguish between the two entities. Patients with TIA mimics had a better short-term prognosis.
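Odds ratios with Wald confidence intervals, as reported for the clinical features above, come from a 2x2 table of feature-by-diagnosis counts. A minimal sketch (the counts in the test below are hypothetical, not the study's data):

```python
import math

def odds_ratio_ci(a, b, c, d, alpha_z=1.959963984540054):
    """Odds ratio and 95% Wald confidence interval from a 2x2 table.

        a = mimics with feature      b = mimics without feature
        c = TIA with feature         d = TIA without feature

    The standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d);
    alpha_z defaults to the 97.5th percentile of the standard normal.
    All four counts must be positive (no continuity correction here).
    """
    orr = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(orr) - alpha_z * se)
    hi = math.exp(math.log(orr) + alpha_z * se)
    return orr, lo, hi
```

An OR below 1 with an upper confidence limit below 1 (as for unilateral paresis above) indicates that the feature makes a TIA mimic significantly less likely.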