During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice. Based on this comparison study, we propose practical recommendations on the appropriate normalization method to be used and its impact on the differential analysis of RNA-seq data.
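As a rough illustration of what a normalization step does, the sketch below applies plain total-count scaling, which rescales every sample to a common library size. The abstract does not name the seven methods compared, so this choice, the function name `total_count_normalize`, and the toy counts are all assumptions made for illustration, not the authors' procedure.

```python
def total_count_normalize(counts):
    """Scale each sample so that all samples share the mean library size.

    counts: dict mapping a sample name to a list of raw read counts per gene
    (hypothetical layout, chosen only for this sketch).
    """
    # Library size = total number of reads counted in a sample.
    lib_sizes = {sample: sum(values) for sample, values in counts.items()}
    mean_size = sum(lib_sizes.values()) / len(lib_sizes)
    # Multiply each count by (mean library size / own library size).
    return {
        sample: [c * mean_size / lib_sizes[sample] for c in values]
        for sample, values in counts.items()
    }


# Toy data: sample s2 was sequenced twice as deeply as s1.
raw = {"s1": [10, 20, 70], "s2": [20, 40, 140]}
normalized = total_count_normalize(raw)
```

After scaling, the two toy samples have identical profiles, because they differed only in sequencing depth. Methods that adjust for other kinds of bias would need more than this single scaling factor.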
Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential trans-specific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species.
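The k-nearest neighbour idea mentioned above can be sketched in a few lines: a query sequence is assigned to the species most common among its k closest reference sequences. This is only a minimal sketch. The distance used here (proportion of differing sites between aligned sequences of equal length), the function names, and the toy sequences are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_assign(query, reference, k=1):
    """Assign a query to the majority species among its k nearest references.

    reference: list of (sequence, species) pairs (hypothetical layout).
    """
    # Rank reference sequences from closest to farthest from the query.
    ranked = sorted(reference, key=lambda item: p_distance(query, item[0]))
    votes = Counter(species for _, species in ranked[:k])
    # On a tied vote, Counter keeps first-seen order, so the nearest wins.
    return votes.most_common(1)[0][0]


# Toy reference library with two made-up species.
library = [
    ("ACGTACGT", "sp_A"),
    ("ACGTACGA", "sp_A"),
    ("TTGTACCC", "sp_B"),
    ("TTGTACCA", "sp_B"),
]
label_1nn = knn_assign("ACGTACGG", library, k=1)
label_3nn = knn_assign("ACGTACGG", library, k=3)
```

With k=1 this reduces to nearest-neighbour assignment. Larger k trades sensitivity to single outlying reference sequences against the risk of outvoting a species that has few representatives.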
In order to improve the identification of avian pathogenic Escherichia coli (APEC) strains, an extensive characterization of 1,491 E. coli isolates was conducted, based on serotyping, virulence genotyping, and experimental pathogenicity for chickens. The isolates originated from lesions of avian colibacillosis (n = 1,307) or from the intestines of healthy animals (n = 184) from France, Spain, and Belgium. A subset (460 isolates) of this collection was defined according to their virulence for chicks. Six serogroups (O1, O2, O5, O8, O18, and O78) accounted for 56.5% of the APEC isolates and 22.5% of the nonpathogenic isolates. Thirteen virulence genes were more frequently present in APEC isolates than in nonpathogenic isolates but, individually, none of them could allow the identification of an isolate as an APEC strain. In order to take into account the diversity of APEC strains, a statistical analysis based on a tree-modeling method was therefore conducted on the sample of 460 pathogenic and nonpathogenic isolates. This resulted in the identification of four different associations of virulence genes that enable the identification of 70.2% of the pathogenic strains. Pathogenic strains were identified with an error margin of 4.3%. The reliability of the link between these four virulence patterns and pathogenicity for chickens was validated on a sample of 395 E. coli isolates from the collection. The genotyping method described here allowed the identification of more APEC isolates with greater reliability than the classical serotyping methods currently used in veterinary laboratories.
Isolates of wheat leaf rust collected from durum and bread wheat cultivars in France during 1999-2002 were analyzed for virulence on 18 Thatcher lines with single genes for leaf rust resistance (Lr genes). Sampling focused on the five most widely grown bread wheat cultivars (two susceptible and three resistant) to allow statistical comparison of diversity indexes between the cultivars. Leaf rust populations from durum and bread wheats were different. The diversity of the bread wheat leaf rust pathotypes, as measured by the Shannon index, ranged from 2.43 to 2.76 over the 4 years. Diversity for wheat leaf rust resistance was limited in the host since we postulated only seven seedling resistance genes in the 35 cultivars most widely grown during 1999-2002. Leaf rust populations were strongly differentiated for virulence within bread wheat cultivars, and diversity was higher on those that were resistant, mainly due to a more even distribution of virulence phenotypes than on susceptible cultivars. The pathogen population on the susceptible cv. Soissons was largely dominated by a single pathotype (073100), whereas all other pathotypes virulent on cv. Soissons either decreased in frequency or remained at a low frequency during the period studied. Several pathotypes including the most complex one were found only on resistant cultivars, even though most of them were virulent on the susceptible cv. Soissons. Specific interactions were necessary, but not always sufficient, to account for pathotype distribution and frequencies on the cultivars, suggesting that selection for virulence to host resistance genes is balanced by other selective forces including selection for aggressiveness.
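For readers unfamiliar with the Shannon index used above, it is computed from the relative frequencies p_i of the pathotypes as H = -sum(p_i * ln p_i). The sketch below assumes natural logarithms (the abstract does not state the base) and uses invented pathotype labels. It shows only why a more even distribution gives a higher index, not the study's actual data.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H = -sum(p * ln p) over the observed categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical samples of 12 isolates each, with three pathotypes.
# One pathotype dominates in the first; the second is evenly spread.
dominated = ["P1"] * 10 + ["P2"] + ["P3"]
even = ["P1"] * 4 + ["P2"] * 4 + ["P3"] * 4

h_dominated = shannon_index(dominated)
h_even = shannon_index(even)
```

Both toy samples contain the same three pathotypes, yet the evenly spread one scores higher, reaching the maximum ln(3) for three categories. This matches the abstract's point that higher diversity on resistant cultivars came mainly from a more even distribution of virulence phenotypes.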
The Praomyini tribe is one of the most diverse and abundant groups of Old World rodents. Several species are known to be involved in crop damage and in the epidemiology of several human and cattle diseases. Due to the existence of sibling species, their identification is often problematic. Thus an easy, fast and accurate species identification tool is needed for non-systematicians to correctly identify Praomyini species. In this study we compare the usefulness of three genes (16S, Cytb, CO1) for identifying species of this tribe. A total of 426 specimens representing 40 species (sampled across their geographical range) were sequenced for the three genes. Nearly all of the species included in our study are monophyletic in the neighbour-joining (NJ) trees. The degree of intra-specific variability tends to be lower than the divergence between species, but no barcoding gap is detected. The success rate of the statistical methods of species identification is excellent (up to 99% or 100% for statistical supervised classification methods such as k-Nearest Neighbour or Random Forest). The 16S gene is 2.5 times less variable than the Cytb and CO1 genes. As a result, its discriminatory power is smaller. To sum up, our results suggest that using DNA markers for identifying species in the Praomyini tribe is a largely valid approach, and that the CO1 and Cytb genes are better DNA markers than the 16S gene. Our results confirm the usefulness of statistical methods such as the Random Forest and the 1-NN methods to assign a sequence to a species, even when the number of species is relatively large. Based on our NJ trees and the distribution of all intraspecific and interspecific pairwise nucleotide distances, we highlight the presence of several potentially new species within the Praomyini tribe that should be subject to corroboration assessments.
Objective: No Crohn’s disease (CD) molecular marker has advanced to clinical use, and independent lines of evidence support a central role of the gut microbial community in CD. Here we explore the feasibility of extracting bacterial protein signals relevant to CD, by interrogating myriads of intestinal bacterial proteomes from a small number of patients and healthy controls. Design: We first developed and validated a workflow, comprising extraction of microbial communities, two-dimensional difference gel electrophoresis (2D-DIGE), and LC-MS/MS, to discover protein signals from CD-associated gut microbial communities. Then we used selected reaction monitoring (SRM) to confirm a set of candidates. In parallel, we used 16S rRNA gene sequencing for an integrated analysis of gut ecosystem structure and functions. Results: Our 2D-DIGE-based discovery approach revealed an imbalance of intestinal bacterial functions in CD. Many proteins, largely derived from Bacteroides species, were over-represented, while under-represented proteins were mostly from Firmicutes and some Prevotella members. Most overabundant proteins could be confirmed using SRM. They correspond to functions allowing opportunistic pathogens to colonise the mucus layers, breach the host barriers and invade the mucosae, which could still be aggravated by decreased host-derived pancreatic zymogen granule membrane protein GP2 in CD patients. Moreover, although the abundance of most protein groups reflected that of related bacterial populations, we found a specific independent regulation of bacteria-derived cell envelope proteins. Conclusions: This study provides the first evidence that quantifiable bacterial protein signals are associated with CD, which can have a profound impact on future molecular diagnosis.