Jessica Zhang scite author profile

ntEdit: scalable genome sequence polishing

Motivation: In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes. Results: We first tested ntEdit and the state-of-the-art assembly improvement tools GATK, Pilon and Racon on controlled Escherichia coli and Caenorhabditis elegans sequence data. Generally, ntEdit performs well at low sequence depths (<20×), fixing the majority (>97%) of base substitutions and indels, and its performance is largely constant with increased coverage. In all experiments conducted using a single CPU, the ntEdit pipeline executed in <14 s and <3 m, on average, on E. coli and C. elegans, respectively. We performed similar benchmarks on a sub-20× coverage human genome sequence dataset, inspecting accuracy and resource usage in editing chromosomes 1 and 21, and the whole genome. ntEdit scaled linearly, executing in 30–40 m on those sequences. We show how ntEdit ran in <2 h 20 m to improve upon long and linked read human genome assemblies of NA12878, using high-coverage (54×) Illumina sequence data from the same individual, fixing frame shifts in coding sequences. We also generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophyte), and used it to edit our pseudo haploid assemblies of the 20 Gb interior and white spruce genomes in <4 and <5 h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024. Availability and implementation: https://github.com/bcgsc/ntedit Supplementary information: Supplementary data are available at Bioinformatics online.
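
The abstract describes ntEdit only as "Bloom filter-based", so the following is a minimal sketch of the general idea rather than ntEdit's implementation: read k-mers are stored in a Bloom filter, and draft k-mers absent from the filter mark positions that the reads do not support. The class `BloomFilter`, the helper names, the filter size, the hash scheme and the toy sequences are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical; not ntEdit's data structure)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unsupported_positions(draft, read_filter, k):
    """Start indices of draft k-mers that are absent from the read filter."""
    return [i for i, km in enumerate(kmers(draft, k)) if km not in read_filter]


# Toy data (assumed): two overlapping reads and a draft with one substitution.
k = 5
reads = ["ACGTACGGTCA", "CGGTCATTGAC"]
read_filter = BloomFilter()
for read in reads:
    for km in kmers(read, k):
        read_filter.add(km)

draft = "ACGTACGCTCATTGAC"  # index 7 is C here, G in the reads
flagged = unsupported_positions(draft, read_filter, k)
print(flagged)
```

Only k-mers that overlap the substituted base can be flagged, because a Bloom filter never reports a stored k-mer as missing. A polishing tool would then have to decide which replacement base restores read support; that step is not shown here.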

Spitzer Matching Survey of the UltraVISTA Ultra-deep Stripes (SMUVS): Full-mission IRAC Mosaics and Catalogs

Ashby, Caputi, Cowley, et al. 2018. ApJS

This paper describes new deep 3.6 and 4.5 µm imaging of three UltraVISTA near-infrared survey stripes within the COSMOS field. The observations were carried out with Spitzer's Infrared Array Camera (IRAC) for the Spitzer Matching Survey of the Ultra-VISTA Deep Stripes (SMUVS). In this work we present our data reduction techniques, and document the resulting mosaics, coverage maps, and catalogs in both IRAC passbands for the three easternmost UltraVISTA survey stripes, covering a combined area of about 0.66 deg², of which 0.45 deg² have at least 20 hr integration time. SMUVS reaches point-source sensitivities of about 25.0 AB mag (0.13 µJy) at both 3.6 and 4.5 µm with a significance of 4σ, accounting for both survey sensitivity and source confusion. To this limit the SMUVS catalogs contain a total of ∼350,000 sources, each of which is detected significantly in at least one IRAC band. Because of its uniform and high sensitivity, relatively large area coverage, and the wide array of ancillary data available in COSMOS, the SMUVS survey will be useful for a large number of cosmological investigations. We make all images and catalogs described herein publicly available via the Spitzer Science Center.

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers

et al. 2018

Background: The long-range sequencing information captured by linked reads, such as those available from 10x Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Results: Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). Conclusions: ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2243-x) contains supplementary material, which is available to authorized users.
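
The abstract mentions Jaccard indices computed from linked read barcodes. As a minimal sketch of that quantity only (not the ARKS distance estimator itself), the Jaccard index of two barcode sets is the size of their intersection divided by the size of their union. The function name `jaccard_index` and the barcode labels below are hypothetical.

```python
def jaccard_index(barcodes_a, barcodes_b):
    """Jaccard index |A & B| / |A | B| of two barcode collections."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets share nothing.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical barcodes seen at the end of one contig and the start of another.
end_of_contig_1 = {"BX01", "BX02", "BX03", "BX04"}
start_of_contig_2 = {"BX03", "BX04", "BX05"}

score = jaccard_index(end_of_contig_1, start_of_contig_2)
print(score)  # 2 shared barcodes out of 5 distinct ones: 0.4
```

Regions that lie close together on the genome are expected to share more barcodes and so score higher. Relating the score to known distances within contigs, as the abstract describes, would need a fitted model that this sketch leaves out.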

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers

Coombe, Zhang, Vandervalk, et al. 2018. Preprint

Background. The long-range sequencing information captured by linked reads, such as those available from 10x Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Results. Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly, which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders. Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). Conclusions. ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, ARKS utilizes barcoding information from linked reads to estimate gap size. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.

Vertebral Body Tethering in 49 Adolescent Patients after Peak Height Velocity for the Treatment of Idiopathic Scoliosis: 2–5 Year Follow-Up

Meyers, Eaker, Zhang, et al. 2022. JCM

Vertebral Body Tethering (VBT) is a non-fusion surgical treatment for Adolescent Idiopathic Scoliosis (AIS) that elicits correction via growth modulation in skeletally immature patients. VBT after peak height velocity is controversial and is the subject of this study. A retrospective review of Risser 3–5 AIS patients treated with VBT and with a minimum 2-year follow-up (FU) was performed. Pre- to post-operative changes in clinical outcomes were compared using Student’s t-test or the Mann-Whitney test. A total of 49 patients met criteria, age 15.0 ± 1.9 years, FU 32.5 ± 9.1 months. For thoracic (T) major curvatures, T curvature improved from 51.1° ± 6.9° to 27.2° ± 8.1° (p < 0.01) and thoracolumbar (TL) curvature from 37.2° ± 10.7° to 19.2° ± 6.8° (p < 0.01). For TL major curvatures, T improved from 37.2° ± 10.7° to 18.8° ± 9.4° (p < 0.01) and TL from 49.0° ± 6.4° to 20.1° ± 8.5° (p < 0.01). Major curve inclinometer measurements and SRS-22 domains, except activity, improved significantly (p ≤ 0.05). At the latest FU, one (2%) patient required fusion of the T curve and revision of the TL tether due to curve progression in the previously uninstrumented T curve and tether breakage (TB) in the TL. Twenty (41%) patients experienced TB. VBT in AIS patients with limited remaining skeletal growth resulted in satisfactory clinical outcomes at the latest FU.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
