We consider a two-component mixture model where one component distribution is known while the mixing proportion and the other component distribution are unknown. These kinds of models were first introduced in biology to study the differences in expression between genes. The various estimation methods proposed till now have all assumed that the unknown distribution belongs to a parametric family. In this paper, we show how this assumption can be relaxed. First, we note that generally the above model is not identifiable, but we show that under moment and symmetry conditions some 'almost everywhere' identifiability results can be obtained. Where such identifiability conditions are fulfilled we propose an estimation method for the unknown parameters which is shown to be strongly consistent under mild conditions. We discuss applications of our method to microarray data analysis and to the training data problem. We compare our method to the parametric approach using simulated data and, finally, we apply our method to real data from microarray experiments. Copyright 2006 Board of the Foundation of the Scandinavian Journal of Statistics..
In Computational Protein Design (CPD), assuming a rigid backbone and amino-acid rotamer library, the problem of finding a sequence with an optimal conformation is NP-hard. In this paper, using Dunbrack's rotamer library and Talaris2014 decomposable energy function, we use an exact deterministic method combining branch and bound, arc consistency, and tree-decomposition to provenly identify the global minimum energy sequence-conformation on full-redesign problems, defining search spaces of size up to 10(234). This is achieved on a single core of a standard computing server, requiring a maximum of 66GB RAM. A variant of the algorithm is able to exhaustively enumerate all sequence-conformations within an energy threshold of the optimum. These proven optimal solutions are then used to evaluate the frequencies and amplitudes, in energy and sequence, at which an existing CPD-dedicated simulated annealing implementation may miss the optimum on these full redesign problems. The probability of finding an optimum drops close to 0 very quickly. In the worst case, despite 1,000 repeats, the annealing algorithm remained more than 1 Rosetta unit away from the optimum, leading to design sequences that could differ from the optimal sequence by more than 30% of their amino acids.
Thermoresponsive statistical copolymers of N-vinylcaprolactam and N-vinylpyrrolidone of controlled average molar masses and composition were synthesized by RAFT / MADIX polymerization. The cloud point temperature was finely tuned in the range of temperature 36-70°C. DSC measurements combined with statistical calculations suggest that the whole polymer chain is involved in the hydration/dehydration process rather than short polymer sequences. At high concentration, these polymers enable the formation of thermoresponsive hydrogels. Cryo-SEM analysis of these solutions s showed the presence of globular substructures, with a looser 3-D network at higher VP content. This decrease of density of entanglements in the gel network could explain the decrease of mechanical properties and faster rehydration kinetics of the thermogelling polymer
We consider the likelihood ratio test (LRT) process related to the test of the absence of QTL (a QTL denotes a quantitative trait locus, i.e. a gene with quantitative effect on a trait) on the interval [0, T ] representing a chromosome. The observation is the trait and the composition of the genome at some locations called "markers". We give the asymptotic distribution of this LRT process under the null hypothesis that there is no QTL on [0, T ] and under local alternatives with a QTL at t on [0, T ]. We show that the LRT is asymptotically the square of some Gaussian process. We give a description of this process as an " non-linear interpolated and normalized process ". We propose a simple method to calculate the maximum of the LRT process using only statistics on markers and their ratio. This gives a new method to calculate thresholds for QTL detection.
Article published by EDP Sciences and available at http://www.gse-journal.org or http://dx.doi.org/10.1051/gse:2007029 F. Jaffrézic et al.Abstract -A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE funded microarray study looking at the gene expression changes following artificial infection with two different mastitis causing bacteria: Escherichia coli and Staphylococcus aureus. It was reassuring to see that most of the teams found the same main biological results. In fact, most of the differentially expressed genes were found for infection by E. coli between uninfected and 24 h challenged udder quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised a biological problem of cross-talk between infected and uninfected quarters which will have to be dealt with for further microarray studies. quality control / differentially expressed genes / mastitis resistance / microarray data / normalisation
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.