Xiao Zhu scite author profile

Xiao Zhu

3 Publications

63 Citation Statements Received

129 Citation Statements Given


Affiliations

Guangdong Medical College, Harbin Institute of Technology, Harbin Normal University

Publications


Whole genome sequence of the Treponema pallidum subsp. pallidum strain Amoy: An Asian isolate highly similar to SS14

et al. 2017


Treponema pallidum ssp. pallidum (T. pallidum), the causative agent of the sexually transmitted disease syphilis, is an uncultivatable human pathogen. The geographical differences in T. pallidum genomes leading to differences in pathogenicity are not yet understood. Presently, twelve T. pallidum genomes are available to the public, all from strains that are American in origin and often co-infect patients with human immunodeficiency virus (HIV). In this study, we examined the T. pallidum subsp. pallidum strain Amoy, a syphilis pathogen found in Xiamen, China. We sequenced its genome using Illumina next-generation sequencing technology and obtained a nearly (98.83%) complete genome of approximately 1.12 Mbp. The new genome shows good synteny with its five T. pallidum sibling strains (Nichols, SS14, Mexico A, DAL-1, and Chicago), among which SS14 is the strain closest to the Amoy strain. Compared with strain SS14, the Amoy strain possesses four uncharacterized strain-specific genes and is likely missing six genes, including a gene encoding the TPR domain protein, which may partially account for the comparatively low virulence and toxicity of the Amoy strain in animal infection. Notably, we did not detect the 23S rRNA A2058G/A2059G mutation in the Amoy strain, which likely explains the sensitivity of the Amoy strain to macrolides. The results of this study will lead to a better understanding of the pathogenesis of syphilis and the geographical distribution of T. pallidum genotypes.


PERGA: A Paired-End Read Guided De Novo Assembler for Extending Contigs Using SVM and Look Ahead Approach

et al. 2014


Since the read lengths of high throughput sequencing (HTS) technologies are short, de novo assembly, which plays significant roles in many applications, remains a great challenge. Most state-of-the-art approaches are based on the de Bruijn graph strategy or the overlap-layout strategy. However, these approaches, which depend on k-mers or read overlaps, do not fully utilize the information in paired-end and single-end reads when resolving branches. Because they treat all single-end reads with an overlap longer than a fixed threshold equally, they fail to prioritize the more confident long-overlap reads during assembly and mix them with the relatively short-overlap reads. Moreover, these approaches have not been specially designed to handle tandem repeats (repeats that occur adjacently in the genome), and they usually break contigs near tandem repeats. We present PERGA (Paired-End Reads Guided Assembler), a novel sequence-reads-guided de novo assembly approach, which adopts a greedy-like prediction strategy for assembling reads into contigs and scaffolds, using paired-end reads and different read overlap sizes ranging from O_max to O_min to resolve gaps and branches. By constructing a decision model using a machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. When the correct extension cannot be determined, PERGA tries to extend the contig by all feasible extensions and determines the correct extension using a look-ahead approach. Many hard-to-resolve branches are due to tandem repeats that are close together in the genome. PERGA detects the different copies of such repeats to resolve the branches, making the extension much longer and more accurate. We evaluated PERGA on both real and simulated Illumina datasets ranging from small bacterial genomes to a large human chromosome, and it constructed longer and more accurate contigs and scaffolds than other state-of-the-art assemblers. PERGA can be freely downloaded at https://github.com/hitbio/PERGA.
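
The greedy extension idea in this abstract, trying read overlaps from the largest size down to the smallest, can be illustrated with a toy sketch. This is not PERGA's code: the function names, the majority-vote rule, and the default overlap sizes are assumptions made for illustration, and the sketch leaves out paired-end guidance, the learned decision model, and the look-ahead step.

```python
from collections import Counter

def next_base(contig, reads, o_max, o_min):
    """Pick the next base using the longest overlap that has read support.

    Hypothetical helper: returns None when no read supports an extension
    or when the supporting reads disagree without a clear majority.
    """
    for overlap in range(min(o_max, len(contig)), o_min - 1, -1):
        suffix = contig[-overlap:]
        votes = Counter()
        for read in reads:
            # A read votes if it contains the contig suffix plus one more base.
            pos = read.find(suffix)
            if pos != -1 and pos + overlap < len(read):
                votes[read[pos + overlap]] += 1
        if votes:
            ranked = votes.most_common(2)
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                return ranked[0][0]
            return None  # tied branch: a real assembler would need more evidence
    return None

def extend(contig, reads, o_max=6, o_min=4, max_steps=1000):
    """Greedily extend a seed contig one base at a time."""
    for _ in range(max_steps):
        base = next_base(contig, reads, o_max, o_min)
        if base is None:
            break
        contig += base
    return contig

genome = "ATGCGTACCTGAAGTCA"  # made-up sequence for the example
reads = [genome[i:i + 8] for i in range(len(genome) - 7)]
assembled = extend(genome[:8], reads)
```

Preferring the longest supported overlap reflects the abstract's point that long-overlap reads are more trustworthy than short-overlap ones; the tie case is where a branch would need additional information to resolve.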


misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads

et al. 2015


Background: Because of the short read length of high throughput sequencing data, assembly errors are introduced in genome assembly, which may have an adverse impact on downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with a similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and determining inconsistent features along mis-assembled sequences. However, the former approach cannot distinguish real structural variations between the target genome and the reference genome, while the latter approach can produce many false positive detections (correctly assembled sequences being considered mis-assembled).

Results: We present misFinder, a tool that aims to identify assembly errors with high accuracy in an unbiased way and to correct these errors at their mis-assembled positions to improve the assembly accuracy for downstream analysis. It combines the information of a reference (or closely related reference) genome with paired-end reads aligned to the assembled sequence. Assembly errors and correct assemblies corresponding to structural variations can be detected by comparing the reference genome and the assembled sequence. Different types of assembly errors can then be distinguished in the mis-assembled sequence by analyzing the aligned paired-end reads, using multiple features derived from coverage and consistency of insert distance to obtain high-confidence error calls.

Conclusions: We tested the performance of misFinder on both simulated and real paired-end read data, and misFinder gave accurate error calls with very few miscalls. We further compared misFinder with QUAST and REAPR. misFinder outperformed QUAST and REAPR by 1) identifying more true positive mis-assemblies with very few false positives and false negatives, and 2) distinguishing the correct assemblies corresponding to structural variations from mis-assembled sequences. misFinder can be freely downloaded from https://github.com/hitbio/misFinder.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0818-3) contains supplementary material, which is available to authorized users.
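
One of the paired-end features named in this abstract, consistency of insert distance, can be sketched as a simple window scan. This is an illustrative toy, not misFinder's implementation: the function names, the 3-standard-deviation cutoff, the window size, and the 50% threshold are all assumptions, and the sketch ignores coverage and the reference comparison.

```python
def is_discordant(left, right, mean_insert, sd_insert, n_sd=3.0):
    """True if the pair's insert distance is far from the library mean."""
    insert = right - left
    return abs(insert - mean_insert) > n_sd * sd_insert

def suspicious_windows(pairs, contig_len, window, mean_insert, sd_insert,
                       min_fraction=0.5):
    """Return (start, end) windows where many read pairs look discordant.

    `pairs` holds hypothetical (left, right) alignment positions of each
    read pair on one assembled contig; a pair is assigned to the window
    containing its left position.
    """
    flagged = []
    for start in range(0, contig_len, window):
        end = min(start + window, contig_len)
        in_window = [(l, r) for l, r in pairs if start <= l < end]
        if not in_window:
            continue  # no evidence either way in this window
        bad = sum(is_discordant(l, r, mean_insert, sd_insert)
                  for l, r in in_window)
        if bad / len(in_window) >= min_fraction:
            flagged.append((start, end))
    return flagged

# Made-up alignments: pairs near position 1000 span far more than expected.
pairs = [(0, 300), (50, 350), (100, 400),
         (1000, 2200), (1020, 2250), (1050, 1350)]
flagged = suspicious_windows(pairs, contig_len=2500, window=500,
                             mean_insert=300, sd_insert=20)
```

A window flagged this way is only a candidate: as the abstract notes, read-based signals alone can yield false positives, which is why the tool also compares against a reference.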

