Chengran Zhou scite author profile

Phylogenomics resolves the timing and pattern of insect evolution

Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive, reliable scaffold for future comparative analyses of evolutionary innovations among insects.

SOAPBarcode: revealing arthropod biodiversity through assembly of Illumina shotgun sequences of PCR amplicons

Liu et al. 2013. Methods Ecol Evol

Summary

1. Metabarcoding of mixed arthropod samples for biodiversity assessment has mostly been carried out on the 454 GS FLX sequencer (Roche, Branford, Connecticut, USA), due to its ability to produce long reads (≥400 bp) that are believed to allow higher taxonomic resolution. The Illumina sequencing platforms, with their much higher throughputs, could potentially reduce sequencing costs and improve sequence quality, but the associated shorter read length (typically <150 bp) has deterred their usage in next-generation-sequencing (NGS)-based analyses of eukaryotic biodiversity, which often utilize standard barcode markers (e.g. COI, rbcL, matK, ITS) that are hundreds of nucleotides long.

2. We present a new Illumina-based pipeline to recover full-length COI barcodes from mixed arthropod samples. Our new assembly program, SOAPBarcode, a variant of the genome assembly program SOAPdenovo, uses paired-end reads of the standard COI barcode region as anchors to extract the correct pathways (sequences) out of otherwise chaotic 'de Bruijn graphs', which are caused by the presence of large numbers of COI homologs of high sequence similarity.

3. Two bulk insect samples of known species composition have been analysed in a recently published 454 metabarcoding study (Yu et al. 2012) and are re-analysed by our analysis pipeline. Compared to the results of Roche 454 (c. 400-bp reads), our pipeline recovered full-length COI barcodes (658 bp) and 17-31% more species-level operational taxonomic units (OTUs) from bulk insect samples, with fewer untraceable (novel) OTUs. On the other hand, our PCR-based pipeline also revealed higher rates of contamination across samples, due to Illumina's increased sequencing depth. On balance, the assembled full-length barcodes and increased OTU recovery rates resulted in more resolved taxonomic assignments and more accurate beta diversity estimation.

4. The HiSeq 2000 and the SOAPBarcode pipeline together can achieve more accurate biodiversity assessment at a much reduced sequencing cost in metabarcoding analyses. However, greater precaution is needed to prevent cross-sample contamination during field preparation and laboratory operation because of the greater ability to detect non-target DNA amplicons present in low-copy numbers.
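
The anchoring idea in point 2 can be illustrated with a toy sketch. This is not SOAPBarcode's actual algorithm or code; the function names (`build_graph`, `anchored_paths`), the tiny k value, and the example sequences are all assumptions made for illustration. It builds a small de Bruijn graph from two similar sequences and shows that requiring a path to start and end at a pair of anchors selects one sequence and avoids a chimeric one.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def anchored_paths(graph, start, end, max_len):
    """List sequences that begin at node `start` and stop at node `end`.

    `start` and `end` are (k-1)-mers standing in for the two ends of a
    read pair (hypothetical anchors). `max_len` bounds the sequence
    length so that cycles in the graph cannot loop forever.
    """
    results = []
    stack = [(start, start)]  # (current node, sequence built so far)
    while stack:
        node, seq = stack.pop()
        if node == end:
            results.append(seq)
            continue
        if len(seq) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            stack.append((nxt, seq + nxt[-1]))
    return results

# Two made-up "homologs" that share the middle stretch CGTCT.
seq_a = "AACGTCTGGA"
seq_b = "TTCGTCTCCA"
graph = build_graph([seq_a, seq_b], k=4)

# Anchors from the same (hypothetical) read pair recover seq_a only.
paths = anchored_paths(graph, "AAC", "GGA", max_len=20)
```

In this toy graph the shared node `TCT` branches toward both sequences, so walking from `AAC` without an end anchor could also produce the chimera `AACGTCTCCA`. Fixing both ends is what removes that ambiguity here.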

Evolutionary and biomedical insights from a marmoset diploid genome assembly

Yang, Zhou, Marcus et al. 2021. Nature

The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes, and diseases [1]. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), an important primate model system widely used in biomedical research [2,3]. The full heterozygosity spectrum between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard single nucleotide heterozygosity estimation alone. The de novo mutation rate is 0.43 × 10⁻⁸ per site per generation, where the paternally inherited genome acquired twice as many mutations as the maternal. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiated region and unique evolutionary changes in the marmoset Y chromosome. Additionally, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, though several genes experienced lineage-specific copy number variations or diversifying selection, providing important implications for the application of marmosets as a model system.

High-coverage genomes to elucidate the evolution of penguins

Pan, Cole et al. 2019

Background: Penguins (Sphenisciformes) are a remarkable order of flightless wing-propelled diving seabirds distributed widely across the southern hemisphere. They share a volant common ancestor with Procellariiformes close to the Cretaceous-Paleogene boundary (66 million years ago) and subsequently lost the ability to fly but enhanced their diving capabilities. With ∼20 species among 6 genera, penguins range from the tropical Galápagos Islands to the oceanic temperate forests of New Zealand, the rocky coastlines of the sub-Antarctic islands, and the sea ice around Antarctica. To inhabit such diverse and extreme environments, penguins evolved many physiological and morphological adaptations. However, they are also highly sensitive to climate change. Therefore, penguins provide an exciting target system for understanding the evolutionary processes of speciation, adaptation, and demography. Genomic data are an emerging resource for addressing questions about such processes.

Results: Here we present a novel dataset of 19 high-coverage genomes that, together with 2 previously published genomes, encompass all extant penguin species. We also present a well-supported phylogeny to clarify the relationships among penguins. In contrast to recent studies, our results demonstrate that the genus Aptenodytes is basal and sister to all other extant penguin genera, providing intriguing new insights into the adaptation of penguins to Antarctica. As such, our dataset provides a novel resource for understanding the evolutionary history of penguins as a clade, as well as the fine-scale relationships of individual penguin lineages. Against this background, we introduce a major consortium of international scientists dedicated to studying these genomes. Moreover, we highlight emerging issues regarding ensuring legal and respectful indigenous consultation, particularly for genomic data originating from New Zealand Taonga species.

Conclusions: We believe that our dataset and project will be important for understanding evolution, increasing cultural heritage and guiding the conservation of this iconic southern hemisphere species assemblage.

Omics-based interpretation of synergism in a soil-derived cellulose-degrading microbial community

Zhou, Pope et al. 2014. Sci Rep

Reaching a comprehensive understanding of how nature solves the problem of degrading recalcitrant biomass may eventually allow development of more efficient biorefining processes. Here we interpret genomic and proteomic information generated from a cellulolytic microbial consortium (termed F1RT) enriched from soil. Analyses of reconstructed bacterial draft genomes from all seven uncultured phylotypes in F1RT indicate that its constituent microbes cooperate in both cellulose-degrading and other important metabolic processes. Support for cellulolytic inter-species cooperation came from the discovery of F1RT microbes that encode and express complementary enzymatic inventories that include both extracellular cellulosomes and secreted free-enzyme systems. Metabolic reconstruction of the seven F1RT phylotypes predicted a wider genomic rationale as to how this particular community functions as well as possible reasons as to why biomass conversion in nature relies on a structured and cooperative microbial community.

Genomic insights into the secondary aquatic transition of penguins

et al. 2022

Penguins lost the ability to fly more than 60 million years ago, subsequently evolving a hyper-specialized marine body plan. Within the framework of a genome-scale, fossil-inclusive phylogeny, we identify key geological events that shaped penguin diversification and genomic signatures consistent with widespread refugia/recolonization during major climate oscillations. We further identify a suite of genes potentially underpinning adaptations related to thermoregulation, oxygenation, diving, vision, diet, immunity and body size, which might have facilitated their remarkable secondary transition to an aquatic ecology. Our analyses indicate that penguins and their sister group (Procellariiformes) have the lowest evolutionary rates yet detected in birds. Together, these findings help improve our understanding of how penguins have transitioned to the marine environment, successfully colonizing some of the most extreme environments on Earth.

Comprehensive Genomic Characterization of Campylobacter Genus Reveals Some Underlying Mechanisms for its Genomic Diversification

Zhou, Guo et al. 2013. PLoS ONE

Campylobacter species are phenotypically diverse in many aspects including host habitats and pathogenicities, which demands comprehensive characterization of the entire Campylobacter genus to study their underlying genetic diversification. Up to now, 34 Campylobacter strains have been sequenced and published in public databases, providing a good opportunity to systematically analyze their genomic diversities. In this study, we first conducted genomic characterization, which includes genome-wide alignments, pan-genome analysis, and phylogenetic identification, to depict the genetic diversity of the Campylobacter genus. Afterward, we improved the tetranucleotide usage pattern-based naïve Bayesian classifier to identify the abnormal composition fragments (ACFs, fragments with tetranucleotide frequency profiles significantly different from their genomic tetranucleotide frequency profiles) including horizontal gene transfers (HGTs) to explore the mechanisms for the genetic diversity of this organism. Finally, we analyzed the HGTs transferred via bacteriophage transductions. To our knowledge, this study is the first to use single nucleotide polymorphism information to construct a reliable microevolution phylogeny of 21 Campylobacter jejuni strains. Combined with the phylogeny of all the collected Campylobacter species based on genome-wide core gene information, comprehensive phylogenetic inference of all 34 Campylobacter organisms was determined. It was found that C. jejuni harbors a high fraction of ACFs possibly through intraspecies recombination, whereas other Campylobacter members possess numerous ACFs possibly via intragenus recombination. Furthermore, some Campylobacter strains have undergone significant ancient viral integration during their evolution process. The improved method is a powerful tool for bacterial genomic analysis. Moreover, the findings would provide useful information for future research on the Campylobacter genus.
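
As a rough illustration of what a tetranucleotide frequency profile is and how a fragment can be scored against it, here is a minimal sketch. It is not the improved classifier described in the abstract; the function names (`tetra_profile`, `mean_log_likelihood`), the pseudocount, and the toy sequences are assumptions chosen for illustration. A fragment whose 4-mers are rare in the genome profile gets a lower average log-likelihood, which is the kind of signal an ACF detector could threshold.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_profile(seq, pseudocount=1.0):
    """Return smoothed tetranucleotide frequencies for `seq` (sums to 1)."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS) + pseudocount * len(TETRAMERS)
    return {t: (counts[t] + pseudocount) / total for t in TETRAMERS}

def mean_log_likelihood(fragment, profile):
    """Average log-probability of the fragment's 4-mers under `profile`.

    Treats 4-mers as independent draws (the naive Bayes assumption) and
    skips any 4-mer containing characters outside A, C, G, T.
    """
    kmers = [fragment[i:i + 4] for i in range(len(fragment) - 3)]
    kmers = [k for k in kmers if k in profile]
    if not kmers:
        raise ValueError("fragment has no valid tetranucleotides")
    return sum(math.log(profile[k]) for k in kmers) / len(kmers)

# Toy data: a repetitive "genome" and two fragments (all made up).
genome = "ACGT" * 50
profile = tetra_profile(genome)
native_score = mean_log_likelihood("ACGTACGTACGT", profile)
foreign_score = mean_log_likelihood("GGGGCCCCGGGGCCCC", profile)
```

With these toy inputs the native-like fragment scores higher than the compositionally different one. How far below the genome's typical score a fragment must fall to count as abnormal is a design choice the abstract does not specify.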

Filling reference gaps via assembling DNA barcodes using high-throughput sequencing—moving toward barcoding the world

Liu, Yang, Zhou et al. 2017

Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)–based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that did not show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes.
