Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that have rested controversial claims upon sequencing data that appear to support the presence of unexpected exogenous species. I used reads that preferentially aligned to alternate genomes to infer the distribution of potential contaminant species in a set of independent sequencing experiments. I confirmed that dilute samples are more exposed to contaminating DNA, and, focusing on four single-cell sequencing experiments, found that these contaminants appear to originate from a wide diversity of clades. Although negative control libraries prepared from ‘blank’ samples recovered the highest-frequency contaminants, low-frequency contaminants, which appeared to make heterogeneous contributions to samples prepared in parallel within a single experiment, were not well controlled for. I used these results to show that, despite heavy replication and plausible controls, contamination can explain all of the observations used to support a recent claim that complete genes pass from food to human blood. Contamination must be considered a potential source of signals of exogenous species in sequencing data, even if these signals are replicated in independent experiments, vary across conditions, or indicate a species which seems a priori unlikely to contaminate. Negative control libraries processed in parallel are essential to control for contaminant DNAs, but their limited ability to recover low-frequency contaminants must be recognized.
Although the evolutionary significance of gene duplication has long been recognized, it remains unclear what determines gene duplicability. We find protein complexity to be an important determinant because the proportion of unduplicated genes (P) increases with the number of subunits in a protein. However, P is high (>65%) for both monomers and multimers in yeast, but <30% in human except for subunits of large multimers, implying that organismal complexity is a stronger determinant of gene duplicability than protein complexity. The same conclusion is reached from a comparison of family sizes in yeast and human.
Despite >30 years of effort (1), it remains unclear what determines gene duplicability. Protein complexity, defined as the number of subunits in a protein (n), might be an important factor because duplication of a protein subunit may cause dosage imbalance among the subunits of the protein (2, 3) and the chance of imbalance might increase with the number of subunits in a protein. By using yeast data, Papp et al. (3) found that 33% of the single-copy genes (singletons) participate in protein complexes (multimers), whereas this frequency drops to ≈21% for genes with three or more paralogues. They therefore concluded that duplication of a subunit of a protein complex is less likely to be successful than duplication of a monomer. However, no monomers were included in their analysis, so the magnitude of difference in survivability between duplication of a monomer and duplication of a protein complex subunit is not known. It is worth emphasizing that duplication of a monomer may also cause dosage imbalance. This may be particularly true for transcription factors, each of which may control many downstream genes. For example, Drosophila embryos produced by mothers with four dosages of bicoid, a maternal morphogen, tend to develop a larger head, and only ≈30% of the embryos produced by mothers with six dosages of bicoid are viable (4).
Thus, it is important to include monomers. Indeed, we study the relationship between the survivability of a gene duplication and n by classifying proteins into monomers (n = 1), dimers (n = 2), midsize complexes (3 ≤ n ≤ 10), and large complexes (n > 10). Another factor that may affect the survivability of duplicate genes is organismal complexity. It was suggested that, for transcription factors, dosage imbalance occurs more frequently in a complex organism than in yeast because of the long regulatory cascades during multicellular development (3). However, a complex organism may actually be more robust against dosage increase than a simple organism (see below). Thus, we also examine this factor by contrasting human with yeast. Here, organismal complexity is loosely defined as the number of different types of cells. Previously we talked about survivability, which may be defined as the probability for a duplicate gene to survive, but adaptive evolution of duplicate genes may also be important. Because, in the end, we see only whether a gene has been duplicated or not, we will use gene...
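The classification by subunit number and the proportion of unduplicated genes (P) described above can be illustrated with a minimal Python sketch. The function names, the input layout (pairs of subunit count and a duplicated flag), and the toy records are assumptions made for illustration only; they are not the authors' code or data.

```python
from collections import defaultdict


def complexity_class(n):
    """Map the number of subunits n to the four classes named in the text."""
    if n < 1:
        raise ValueError("a protein has at least one subunit")
    if n == 1:
        return "monomer"
    if n == 2:
        return "dimer"
    if n <= 10:
        return "midsize complex"
    return "large complex"


def unduplicated_proportion(genes):
    """Return P, the fraction of unduplicated genes, for each class.

    `genes` is an iterable of (n_subunits, is_duplicated) pairs
    (a hypothetical input format chosen for this sketch).
    """
    totals = defaultdict(int)
    singletons = defaultdict(int)
    for n, is_duplicated in genes:
        cls = complexity_class(n)
        totals[cls] += 1
        if not is_duplicated:
            singletons[cls] += 1
    return {cls: singletons[cls] / totals[cls] for cls in totals}


# Toy records, invented purely to exercise the functions.
toy_genes = [(1, False), (1, True), (2, False), (5, True), (12, False), (12, False)]
proportions = unduplicated_proportion(toy_genes)
```

With real annotations, comparing the per-class values of P between two organisms would mirror the yeast versus human contrast described in the text.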
The clustering of transcription factor binding sites in developmental enhancers and the apparent preferential conservation of clustered sites have been widely interpreted as proof that spatially constrained physical interactions between transcription factors are required for regulatory function. However, we show here that selection on the composition of enhancers alone, and not their internal structure, leads to the accumulation of clustered sites with evolutionary dynamics that suggest they are preferentially conserved. We simulated the evolution of idealized enhancers from Drosophila melanogaster constrained to contain only a minimum number of binding sites for one or more factors. Under this constraint, mutations that destroy an existing binding site are tolerated only if a compensating site has emerged elsewhere in the enhancer. Overlapping sites, such as those frequently observed for the activator Bicoid and repressor Krüppel, had significantly longer evolutionary half-lives than isolated sites for the same factors. This leads to a substantially higher density of overlapping sites than expected by chance and the appearance that such sites are preferentially conserved. Because D. melanogaster (like many other species) has a bias for deletions over insertions, sites tended to become closer together over time, leading to an overall clustering of sites in the absence of any selection for clustered sites. Since this effect is strongest for the oldest sites, clustered sites also incorrectly appear to be preferentially conserved. Following speciation, sites tend to be closer together in all descendent species than in their common ancestors, violating the common assumption that shared features of species' genomes reflect their ancestral state. Finally, we show that selection on binding site composition alone recapitulates the observed number of overlapping and closely neighboring sites in real D. melanogaster enhancers. 
Thus, this study calls into question the common practice of inferring “cis-regulatory grammars” from the organization and evolutionary dynamics of developmental enhancers.
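The constraint described above, in which a mutation that destroys a binding site is tolerated only if enough sites remain elsewhere, can be sketched as a simple accept/reject loop. This is a heavily simplified illustration and not the authors' simulation: it assumes exact matches to a single toy motif instead of a binding model, and point substitutions only, so it does not reproduce the deletion bias discussed in the abstract. All names and parameters are hypothetical.

```python
import random


def count_sites(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def evolve(seq, motif, min_sites, steps, rng):
    """Apply random point substitutions, rejecting any that leave
    fewer than min_sites copies of the motif (composition-only selection)."""
    bases = list(seq)
    for _ in range(steps):
        pos = rng.randrange(len(bases))
        old = bases[pos]
        bases[pos] = rng.choice([b for b in "ACGT" if b != old])
        if count_sites("".join(bases), motif) < min_sites:
            bases[pos] = old  # revert: the mutation violates the constraint
    return "".join(bases)


# Toy run: a short sequence that starts with two sites for a toy motif.
rng = random.Random(0)
motif = "TAATCC"
start = motif + "ACGTACGTAC" + motif
end = evolve(start, motif, min_sites=2, steps=500, rng=rng)
```

Note that nothing in the acceptance rule refers to where the sites lie relative to one another; only their number is checked, which is the sense in which selection acts on composition alone.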
The regulation of gene expression controls development, and changes in this regulation often contribute to phenotypic evolution. Drosophila pigmentation is a model system for studying evolutionary changes in gene regulation, with differences in expression of pigmentation genes such as yellow that correlate with divergent pigment patterns among species shown to be caused by changes in cis- and trans-regulation. Currently, much more is known about the cis-regulatory component of divergent yellow expression than the trans-regulatory component, in part because very few trans-acting regulators of yellow expression have been identified. This study aims to improve our understanding of the trans-acting control of yellow expression by combining yeast-one-hybrid and RNAi screens for transcription factors binding to yellow cis-regulatory sequences and affecting abdominal pigmentation in adults, respectively. Of the 670 transcription factors included in the yeast-one-hybrid screen, 45 showed evidence of binding to one or more sequence fragments tested from the 5′ intergenic and intronic yellow sequences from D. melanogaster, D. pseudoobscura, and D. willistoni, suggesting that they might be direct regulators of yellow expression. Of the 670 transcription factors included in the yeast-one-hybrid screen, plus another TF previously shown to be genetically upstream of yellow, 125 were also tested using RNAi, and 32 showed altered abdominal pigmentation. Nine transcription factors were identified in both screens, including four nuclear receptors related to ecdysone signaling (Hr78, Hr38, Hr46, and Eip78C). This finding suggests that yellow expression might be directly controlled by nuclear receptors influenced by ecdysone during early pupal development when adult pigmentation is forming.
SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.
The short length and high degeneracy of sites recognized by DNA-binding transcription factors limit the amount of information they can carry, and individual sites are rarely sufficient to mediate the regulation of specific targets. Computational analysis of microbial genomes has suggested that many factors function optimally when in a particular orientation and position with respect to their target promoters. To investigate this further, we developed and trained spatial models of binding site positioning and applied them to the genome of the yeast Saccharomyces cerevisiae. We found evidence of non-random organization of sites within promoters, differences in binding site density, or both for thirty-eight transcription factors. We show that these signatures allow transcription factors with substantial differences in binding site specificity to share similar promoter specificities. We illustrate how spatial information dictating the positioning and density of binding sites can in principle increase the information available to the organism for differentiating a transcription factor’s true targets, and we indicate how this information could potentially be leveraged for the same purpose in bioinformatic analyses.
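The point that short, degenerate sites carry limited information can be made concrete with the standard information-content measure for a position frequency matrix. The sketch below assumes a uniform background of 0.25 per base and uses an invented two-column matrix; it is a generic calculation and is not claimed to be the spatial model developed in the study.

```python
import math


def information_content(columns, background=0.25):
    """Sum over positions of the relative entropy (in bits) between the
    base frequencies at that position and a uniform background."""
    total = 0.0
    for column in columns:
        for p in column.values():
            if p > 0:  # terms with p == 0 contribute nothing
                total += p * math.log2(p / background)
    return total


# Invented example: one fully conserved position, one fully degenerate one.
toy_matrix = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
bits = information_content(toy_matrix)
```

Under this measure a fully conserved position contributes 2 bits and a fully degenerate one contributes none, so a short motif with several degenerate positions falls well short of the information needed to single out a few targets in a genome.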