Streptococcus pneumoniae is among the most significant causes of bacterial disease in humans. Here we report the 2,038,615-bp genomic sequence of the gram-positive bacterium S. pneumoniae R6. Because the R6 strain is avirulent and, more importantly, because it is readily transformed with DNA from homologous species and many heterologous species, it is the principal platform for investigation of the biology of this important pathogen. It is also used as a primary vehicle for genomics-based development of antibiotics for gram-positive bacteria. In our analysis of the genome, we identified a large number of new uncharacterized genes predicted to encode proteins that either reside on the surface of the cell or are secreted. Among those proteins there may be new targets for vaccine and antibiotic development.
SNPs that are molecularly very close (<10 kb) will generally have extremely low recombination rates, much less than 10⁻⁴. Multiple haplotypes will often exist because of the history of the origins of the variants at the different sites, rare recombinants, and the vagaries of random genetic drift and/or selection. Such multiallelic haplotype loci are potentially important in forensic work for individual identification, for defining ancestry, and for identifying familial relationships. The new DNA sequencing capabilities currently available make possible continuous runs of a few hundred base pairs so that we can now determine the allelic combination of multiple SNPs on each chromosome of an individual, i.e., the phase, for multiple SNPs within a small segment of DNA. Therefore, we have begun to identify regions, encompassing two to four SNPs with an extent of <200 bp, that define multiallelic haplotype loci. We have identified candidate regions and have collected pilot data on many candidate microhaplotype loci. Here we present 31 microhaplotype loci that have at least three alleles, have high heterozygosity, are globally informative, and are statistically independent at the population level. This study of microhaplotype loci (microhaps) provides proof of principle that such markers exist and validates their usefulness for ancestry inference, lineage-clan-family inference, and individual identification. The true value of microhaplotypes will come with sequencing methods that can establish alleles unambiguously, including disentangling of mixtures, because a single sequencing run on a single strand of DNA will encompass all of the SNPs.
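As a rough illustration of the idea that a single read spanning all SNPs of a microhaplotype yields the phased allele directly, the sketch below reads off the bases at the SNP positions of each read and then computes the standard expected heterozygosity, 1 minus the sum of squared allele frequencies. The function names, read sequences, and SNP offsets are hypothetical toy values, not data or methods from the study.

```python
from collections import Counter


def phase_read(read_bases, snp_offsets):
    """Return the haplotype allele carried by one read.

    The allele is simply the bases found at the SNP offsets, in order.
    Assumes the read covers every offset (hypothetical, pre-aligned input).
    """
    return "".join(read_bases[i] for i in snp_offsets)


def expected_heterozygosity(haplotypes):
    """Expected heterozygosity: 1 - sum(p_i^2) over haplotype frequencies."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Toy example: four chromosomes sampled from a population,
# each read covering two SNPs (0-based offsets 1 and 7).
reads = ["ACGTACGT", "ACGTACGT", "ATGTACGA", "ACGTACGA"]
snp_offsets = [1, 7]

haplotypes = [phase_read(r, snp_offsets) for r in reads]
h = expected_heterozygosity(haplotypes)
print(haplotypes, h)
```

With three distinct alleles in the toy sample, the locus is multiallelic in the sense used above; a real analysis would use population-scale haplotype counts rather than four reads.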
MUC-2, the first described intestinal mucin gene, has become important as a prototype for secreted mucins in several organ systems. However, little is known about its protein backbone structure and hence its role in diseases such as colon cancer, ulcerative colitis, and cystic fibrosis, which are known to have mucin abnormalities. Studies in this manuscript show that MUC-2 contains two distinct regions with a high degree of internal homology, but the two regions bear no significant homology to each other. Region 1 consists mostly of 48-bp repeats which are interrupted in places by 21-24-bp segments. Several of these interrupting sequences show similarity to each other, creating larger composite repeat units. Region 1 has no length polymorphisms. Region 2 is composed of 69-bp tandem repeats arranged in an uninterrupted array of up to 115 individual units. Southern analysis of genomic DNA samples using TaqI and HinfI reveals both length and sequence polymorphisms which occur within region 2. The sequence polymorphisms have different ethnic distributions, while the length polymorphisms are due to variable numbers of tandem repeats. (J. Clin. Invest.
Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics, (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition, and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows, and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics.