Glycoproteomics is a powerful yet analytically challenging research tool. Software packages aiding the interpretation of complex glycopeptide tandem mass spectra have appeared, but their relative performance remains untested. Conducted through the HUPO Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates solutions for system-wide glycopeptide analysis. The same mass spectrometrybased glycoproteomics datasets from human serum were shared with participants and the relative team performance for N- and O-glycopeptide data analysis was comprehensively established by orthogonal performance tests. Although the results were variable, several high-performance glycoproteomics informatics strategies were identified. Deep analysis of the data revealed key performance-associated search parameters and led to recommendations for improved ‘high-coverage’ and ‘high-accuracy’ glycoproteomics search solutions. This study concludes that diverse software packages for comprehensive glycopeptide data analysis exist, points to several high-performance search strategies and specifies key variables that will guide future software developments and assist informatics decision-making in glycoproteomics.
Array comparative genome hybridization (aCGH) is a widely used methodology to detect copy number variations of a genome in high resolution. Knowing the number of break-points and their corresponding locations in genomic sequences serves different biological needs. Primarily, it helps to identify disease-causing genes that have functional importance in characterizing genome wide diseases. For human autosomes the normal copy number is two, whereas at the sites of oncogenes it increases (gain of DNA) and at the tumour suppressor genes it decreases (loss of DNA). The majority of the current detection methods are deterministic in their set-up and use dynamic programming or different smoothing techniques to obtain the estimates of copy number variations. These approaches limit the search space of the problem due to different assumptions considered in the methods and do not represent the true nature of the uncertainty associated with the unknown break-points in genomic sequences. We propose the Cross-Entropy method, which is a model-based stochastic optimization technique as an exact search method, to estimate both the number and locations of the break-points in aCGH data. We model the continuous scale log-ratio data obtained by the aCGH technique as a multiple break-point problem. The proposed methodology is compared with well established publicly available methods using both artificially generated data and real data. Results show that the proposed procedure is an effective way of estimating number and especially the locations of break-points with high level of precision. Availability: The methods described in this article are implemented in the new R package breakpoint and it is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=breakpoint.
Markov chain Monte Carlo (MCMC) is an important computational technique for generating samples from non-standard probability distributions. A major challenge in the design of practical MCMC samplers is to achieve efficient convergence and mixing properties. One way to accelerate convergence and mixing is to adapt the proposal distribution in light of previously sampled points, thus increasing the probability of acceptance. In this paper, we propose two new adaptive MCMC algorithms based on the Independent Metropolis-Hastings algorithm. In the first, we adjust the proposal to minimize an estimate of the cross-entropy between the target and proposal distributions, using the experience of pre-runs. This approach provides a general technique for deriving natural adaptive formulae. The second approach uses multiple parallel chains, and involves updating chains individually, then updating a proposal density by fitting a Bayesian model to the population. An important feature of this approach is that adapting the proposal does not change the limiting distributions of the chains. Consequently, the adaptive phase of the sampler can be continued indefinitely. We include results of numerical experiments indicating that the new algorithms compete well with traditional Metropolis-Hastings algorithms. We also demonstrate the method for a realistic problem arising in Comparative Genomics.
The genomes of complex organisms, including the human genome, are known to vary in GC content along their length. That is, they vary in the local proportion of the nucleotides G and C, as opposed to the nucleotides A and T. Changes in GC content are often abrupt, producing well-defined regions.We model DNA sequences as a multiple change-point process in which the sequence is separated into segments by an unknown number of change-points, with each segment supposed to have been generated by a different process. Multiple change-point problems are important in many biological applications, particularly in the analysis of DNA sequences. Multiple change-point problems also arise in segmentation of protein sequences according to hydrophobicity.We use the Cross-Entropy method to estimate the positions of the change-points. Parameters of the process for each segment are approximated with maximum likelihood estimates. Numerical experiments illustrate the effectiveness of the approach. We obtain estimates of the locations of change-points in artificially generated sequences and compare the accuracy of these estimates with those obtained via other methods such as IsoFinder [1] and Markov Chain Monte Carlo. Lastly, we provide examples with real data sets to illustrate the usefulness of our method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.