The interactions between aphids and their host plants seem to be analogous to those of plant-microbial pathogens. Unlike microbial pathogen effectors, little is known about aphid effectors and their ability to interfere with host immunity. To date, only three functional aphid effectors have been reported. To identify potato aphid (Macrosiphum euphorbiae) effectors, we developed a salivary gland transcriptome using Illumina technology. We generated 85 million Illumina reads from salivary glands and assembled them into 646 contigs. Ab initio sequence analysis predicted secretion signal peptides in 24% of these sequences, suggesting that they might be secreted into the plant during aphid feeding. Eight of these candidate effectors with secretion signal peptides were functionally characterized using Agrobacterium tumefaciens-mediated transient overexpression in Nicotiana benthamiana. Two candidate effectors, Me10 and Me23, increased aphid fecundity, suggesting that they can suppress N. benthamiana defenses. Five of these candidate effectors, including Me10 and Me23, were also analyzed in tomato by delivering them through the Pseudomonas syringae type three secretion system. In tomato, only Me10 increased aphid fecundity. This work identified two additional aphid effectors with the ability to manipulate the host to the aphid's advantage.
Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem for studying the population sizes of DNA/RNA molecules and for reducing the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads.
Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with a linear time and memory performance. When SEED was used as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60–85% and 21–41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12–27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results, with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/seed.
Contact: thomas.girke@ucr.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
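The clustering step described in the SEED abstract (index reads in hash tables, pick a center, collect all neighbors within the mismatch limit) can be illustrated with a minimal Python sketch. This is not the SEED implementation. It swaps the block spaced seeds for a simpler pigeonhole seeding: a read cut into four contiguous blocks must share at least one exact block with any same-length read that differs by at most three mismatches. It also assumes equal-length comparison only, ignores overhanging residues, and takes the first unassigned read as the center rather than computing a virtual center. The names `hamming`, `blocks`, and `cluster_reads` are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def blocks(seq: str, n_blocks: int):
    """Cut seq into n_blocks contiguous pieces; yield (block_index, piece)."""
    size = len(seq) // n_blocks
    for i in range(n_blocks):
        start = i * size
        # The last block absorbs any remainder so the pieces cover the read.
        end = len(seq) if i == n_blocks - 1 else start + size
        yield i, seq[start:end]


def cluster_reads(reads, max_mismatches=3):
    """Greedy clustering: each cluster is a list of read indices whose
    reads lie within max_mismatches of the cluster's first read."""
    # Pigeonhole: with max_mismatches + 1 blocks, at least one block of a
    # true neighbor is free of mismatches and therefore matches exactly.
    n_blocks = max_mismatches + 1

    # Hash table keyed by (read length, block index, block sequence).
    index = defaultdict(list)
    for rid, seq in enumerate(reads):
        for i, piece in blocks(seq, n_blocks):
            index[(len(seq), i, piece)].append(rid)

    assigned = [False] * len(reads)
    clusters = []
    for cid, center in enumerate(reads):
        if assigned[cid]:
            continue
        assigned[cid] = True
        members = [cid]

        # Candidate neighbors share at least one exact block with the center.
        candidates = set()
        for i, piece in blocks(center, n_blocks):
            candidates.update(index.get((len(center), i, piece), ()))

        # Verify each candidate with a full mismatch count.
        for rid in sorted(candidates):
            if not assigned[rid] and hamming(center, reads[rid]) <= max_mismatches:
                assigned[rid] = True
                members.append(rid)
        clusters.append(members)
    return clusters


example_reads = [
    "ACGTACGTACGT",
    "ACGTACGTACGA",
    "ACGTTCGTACGT",
    "TTTTTTTTTTTT",
    "TTTTTTTTTTTA",
    "ACGTACGTACGT",
]
print(cluster_reads(example_reads))  # -> [[0, 1, 2, 5], [3, 4]]
```

The hash lookup keeps the number of full comparisons small, since only reads that share an exact block with the center are checked. The result depends on read order in this sketch, because the first unassigned read always becomes the center.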
Background: The third generation PacBio SMRT long reads can effectively address the read length issue of the second generation sequencing technology, but contain approximately 15% sequencing errors. Several error correction algorithms have been designed to efficiently reduce the error rate to 1%, but they discard large amounts of uncorrected bases and thus lead to low throughput. This loss of bases could limit the completeness of downstream assemblies and the accuracy of analysis.
Results: Here, we introduce HALC, a high throughput algorithm for long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement so that a long read region can be aligned to at least one contig region, including its true genome region's repeats in the contigs sufficiently similar to it (similar repeat based alignment approach). It then constructs a contig graph and, for each long read, references the other long reads' alignments to find the most accurate alignment and correct it with the aligned contig regions (long read support based validation approach). Even though some long read regions without the true genome regions in the contigs are corrected with their repeats, this approach makes it possible to further refine these long read regions with the initial insufficient short reads and correct the uncorrected regions in between. In our performance tests on E. coli, A. thaliana and Maylandia zebra data sets, HALC was able to obtain 6.7–41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC corrected long reads can thus result in 11.4–60.7% longer assembled contigs than the existing algorithms.
Conclusions: The HALC software can be downloaded for free from this site: https://github.com/lanl001/halc.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1610-3) contains supplementary material, which is available to authorized users.