To understand the impact of gut microbes on human health and well-being, it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, approximately 150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.
Dysregulated expression of microRNAs (miRNAs) in various tissues has been associated with a variety of diseases, including cancers. Here we demonstrate that miRNAs are present in the serum and plasma of humans and other animals such as mice, rats, bovine fetuses, calves, and horses. The levels of miRNAs in serum are stable, reproducible, and consistent among individuals of the same species. Using Solexa sequencing, we sequenced all serum miRNAs of healthy Chinese subjects and found over 100 and 91 serum miRNAs in male and female subjects, respectively. We also identified specific expression patterns of serum miRNAs for lung cancer, colorectal cancer, and diabetes, providing evidence that serum miRNAs contain fingerprints for various diseases. Two non-small cell lung cancer-specific serum miRNAs identified by Solexa sequencing were further validated in an independent trial of 75 healthy donors and 152 cancer patients, using quantitative reverse transcription polymerase chain reaction (qRT-PCR) assays. Through these analyses, we conclude that serum miRNAs can serve as potential biomarkers for the detection of various cancers and other diseases.
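The abstract does not say how the qRT-PCR measurements were quantified. One widely used approach for relative quantification is the comparative Ct (2^-ΔΔCt) method, sketched below as an illustration only. The function name `fold_change_ddct` and the choice of normalizer (a reference RNA such as a spike-in control) are assumptions, and the method itself assumes roughly 100% amplification efficiency for both target and reference.

```python
def fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target miRNA in cases vs. controls via 2^-ddCt.

    Hypothetical helper: assumes ~100% PCR efficiency and a normalizer
    (reference RNA) whose level does not differ between groups.
    """
    # Normalize the target to the reference within each group (delta Ct).
    delta_case = ct_target_case - ct_ref_case
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Difference of the normalized values between groups (delta-delta Ct).
    ddct = delta_case - delta_ctrl
    # Each PCR cycle roughly doubles the product, so a lower Ct means more template.
    return 2 ** (-ddct)
```

For example, `fold_change_ddct(25.0, 20.0, 27.0, 20.0)` returns `4.0`: the target crosses threshold two cycles earlier in cases after normalization, which corresponds to about a fourfold higher level.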
Summary: SOAP2 is a significantly improved version of the short oligonucleotide alignment program (SOAP) that both reduces computer memory usage and substantially increases alignment speed. We replaced the seed strategy with a Burrows-Wheeler Transform (BWT) compression index for indexing the reference sequence in main memory. We tested it on the whole human genome and found that this new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed by 20–30 times. SOAP2 is compatible with both single- and paired-end reads. Additionally, this tool now supports multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from alignment of short reads on a reference genome. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn
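To illustrate how a BWT index supports exact matching, here is a minimal FM-index backward search in Python. This is a naive teaching sketch, not SOAP2's implementation: it builds the suffix array by sorting full suffixes and stores complete occurrence tables, whereas memory-efficient tools sample these structures. The function names `build_fm_index` and `backward_search` and the `$` sentinel are assumptions for illustration.

```python
def build_fm_index(text):
    """Naive FM-index construction: suffix array, BWT, C table and Occ table."""
    text += "$"  # sentinel; assumed absent from the input and smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[k] is the character preceding the k-th smallest suffix (text[-1] is "$").
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text that sort strictly before c.
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, bwt, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of pattern, consuming it right to left."""
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return []
        # LF-mapping: narrow the interval to suffixes that start with ch + (matched suffix).
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Usage: after `sa, bwt, C, occ = build_fm_index("GATTACA")`, the call `backward_search("TA", sa, C, occ)` returns `[3]` and `backward_search("A", sa, C, occ)` returns `[1, 4, 6]`. Each pattern character costs only two table lookups, independent of genome length, which is what makes BWT-based lookup fast once the index fits in memory.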
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower cost per unit of data; however, the reads are very short, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
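The assembly results above are reported as N50 values. As a reminder of what that statistic measures, here is a short Python sketch using the common definition: sort contig (or scaffold) lengths in decreasing order and report the length at which the running total first reaches half of all assembled bases. The helper name `n50` is hypothetical, and conventions differ slightly between tools (for example, "at least half" versus "more than half").

```python
def n50(lengths):
    """N50 of a set of contig or scaffold lengths.

    Returns the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid a fractional threshold.
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns `8`: the total is 54, and 10 + 9 + 8 = 27 is the first running sum that reaches half of it. Because N50 weights long sequences heavily, a large scaffold N50 can coexist with a much smaller contig N50, as in the figures reported above.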
The genome of the mesopolyploid crop species Brassica rapa
The Brassica rapa Genome Sequencing Project Consortium
Abstract: The Brassicaceae family, which includes Arabidopsis thaliana, is a natural priority for reaching beyond botanical models to more deeply sample angiosperm genomic and functional diversity. Here we report the draft genome sequence and its annotation of Brassica rapa, one of the two ancestral species of oilseed rape. We modeled 41,174 protein-coding genes in the B. rapa genome. B. rapa has experienced only the second genome triplication reported to date, with its close relationship to A. thaliana providing a useful outgroup for investigating many consequences of triplication for its structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one copy containing a greater proportion of genes expected to have been present in its ancestor (70%) than the remaining two (46% and 36%). Both a generally rapid evolutionary rate and specific copy number amplifications of particular gene families may contribute to the remarkable propensity of Brassica species for the development of new morphological variants. The B. rapa genome provides a new resource for comparative and evolutionary analysis of the Brassicaceae genomes and also a platform for genetic improvement of Brassica oil and vegetable crops.