Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
msa is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/msa.html. Further information and the R code of the example presented in this paper are available at http://www.bioinf.jku.at/software/msa/.
Summary: Affinity propagation (AP) clustering has recently gained increasing popularity in bioinformatics. AP clustering has the advantage that it allows for determining typical cluster members, the so-called exemplars. We provide an R implementation of this promising new clustering technique to account for the ubiquity of R in bioinformatics. This article introduces the package and presents an application from structural biology. Availability: The R package apcluster is available via CRAN-The Comprehensive R Archive Network: http://cran.r-project.org/web/packages/apcluster
Motivation: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called ‘FABIA: Factor Analysis for Bicluster Acquisition’. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework allows to utilize well-founded model selection methods and to apply Bayesian techniques.Results: On 100 simulated datasets with known true, artificially implanted biclusters, FABIA clearly outperformed all 11 competitors. On these datasets, FABIA was able to separate spurious biclusters from true biclusters by ranking biclusters according to their information content. FABIA was tested on three microarray datasets with known subclusters, where it was two times the best and once the second best method among the compared biclustering approaches.Availability: FABIA is available as an R package on Bioconductor (http://www.bioconductor.org). All datasets, results and software are available at http://www.bioinf.jku.at/software/fabia/fabia.htmlContact: hochreit@bioinf.jku.atSupplementary information: Supplementary data are available at Bioinformatics online.
Recent studies have revealed that immune repertoires contain a substantial fraction of public clones, which may be defined as Ab or TCR clonal sequences shared across individuals. It has remained unclear whether public clones possess predictable sequence features that differentiate them from private clones, which are believed to be generated largely stochastically. This knowledge gap represents a lack of insight into the shaping of immune repertoire diversity. Leveraging a machine learning approach capable of capturing the high-dimensional compositional information of each clonal sequence (defined by CDR3), we detected predictive public clone and private clone-specific immunogenomic differences concentrated in CDR3's N1-D-N2 region, which allowed the prediction of public and private status with 80% accuracy in humans and mice. Our results unexpectedly demonstrate that public, as well as private, clones possess predictable high-dimensional immunogenomic features. Our support vector machine model could be trained effectively on large published datasets (3 million clonal sequences) and was sufficiently robust for public clone prediction across individuals and studies prepared with different library preparation and high-throughput sequencing protocols. In summary, we have uncovered the existence of high-dimensional immunogenomic rules that shape immune repertoire diversity in a predictable fashion. Our approach may pave the way for the construction of a comprehensive atlas of public mouse and human immune repertoires with potential applications in rational vaccine design and immunotherapeutics.
Equivalence relations and orderings are key concepts of mathematics. For both types of relations, formulations within the framework of fuzzy relations have been proposed already in the early days of fuzzy set theory. While similarity (indistinguishability) relations have turned out to be very useful tools, e.g. for the interpretation of fuzzy partitions and fuzzy controllers, the utilization of fuzzy orderings is still lagging far behind, although there are a lot of possible applications, for instance, in fuzzy preference modeling and fuzzy control. The present paper is devoted to this missing link. After a brief motivation, we will critically analyze the existing approach to fuzzy orderings. In the main part, an alternative approach to fuzzy orderings, which also takes the strong connection to gradual similarity into account, is proposed and studied in detail, including several constructions and representation results.
Aggregation processes are fundamental in any discipline where the fusion of information is of vital interest. For aggregating binary fuzzy relations such as equivalence relations or fuzzy orderings, the question arises which aggregation operators preserve specific properties of the underlying relations, e.g. T-transitivity. It will be shown that preservation of T-transitivity is closely related to the domination of the applied aggregation operator over the corresponding t-norm T. Furthermore, basic properties for dominating aggregation operators, not only in the case of dominating some t-norm T, but dominating some arbitrary aggregation operator, will be presented. Domination of isomorphic t-norms and ordinal sums of t-norms will be treated. Special attention is paid to the four basic t-norms (minimum t-norm, product t-norm, Łukasiewicz t-norm, and the drastic product).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.