Background: Next-generation sequencing (NGS) technology has transformed metagenomics because the high-throughput data allow an in-depth exploration of a complex microbial community. However, accurate species identification with NGS data is challenging because NGS sequences are relatively short. Assembling 16S rDNA segments into longer sequences has been proposed for improving species identification. Current approaches, however, suffer either from amplification bias due to the use of a single primer or from insufficient 16S rDNA reads in whole-genome sequencing data. Results: Multiple primers were used to amplify different 16S rDNA segments for 454 sequencing, followed by 454 read classification and assembly. This permitted targeted sequencing while reducing primer bias. For test samples containing four known bacteria, accurate and near full-length 16S rDNAs of three known bacteria were obtained. For real soil and sediment samples containing dioxins in various concentrations, 16S rDNA sequences were lengthened by 50% for about half of the non-rare microbes, and 16S rDNAs of several microbes reached more than 1000 bp. In addition, reduced primer bias using multiple primers was illustrated. Conclusions: A new experimental and computational pipeline for obtaining long 16S rDNA sequences was proposed. The capability of the pipeline was validated on test samples and illustrated on real samples. For dioxin-containing samples, the pipeline revealed several microbes suitable for future studies of dioxin chemistry.
EGFR genotyping is required for targeted therapy of lung adenocarcinoma. Because a false-negative result might prevent a patient from receiving appropriate targeted therapies, it is desirable to recheck equivocal results of EGFR genotyping. A cohort of 346 lung cancers was tested with a commercial kit for EGFR mutations; nine of the cases had upward real-time amplification curves at late cycles. They were also investigated using mutant-enriched PCR with peptide nucleic acid-locked nucleic acid (PNA-sequencing). Six of the nine equivocal cases harbored EGFR mutations. These cases likely had a small amount of mutant DNA near the detection limit of the commercial kit. Twenty nonequivocal, wild-type cases were reconfirmed using PNA-sequencing. We noticed a College of American Pathologists proficiency test material that showed a suspicious upward curve and eventually proved to have an H773_V774insPH in exon 20, for which a specific primer was not designed in the commercial kit. Further study using cloned DNA fragments showed that the upward curve most likely resulted from cross-reaction between similar, but nonidentical, sequences. It is desirable to keep the number of false-negative results as low as possible, but rechecking all wild-type cases is impractical. The late upward curves we observed helped identify suspicious cases for rechecking. A second method, such as PNA-sequencing, is recommended to verify wild-type cases.
A total of 242 isolates were recovered from 76 patients with invasive diseases, 89 with scarlet fever, and 77 with pharyngitis. The most frequent emm types were types 12 (43.4%), 4 (18.2%), and 1 (16.9%). emm12 reemerged in 2005 and peaked in 2007. emm11 was recovered only from patients with invasive disease.
Background: T cells and B cells are essential to adaptive immunity, expressing T cell receptors and immunoglobulins, respectively, to recognize antigens. To recognize a wide variety of antigens, a highly diverse repertoire of receptors is generated via complex recombination of the receptor genes. Accordingly, frequencies of the recombination events have been shown to predict immune diseases and provide insights into the development of immunity. The field has been further boosted by high-throughput sequencing, and several computational tools have been released to analyze the recombined sequences. However, all current tools assume regular recombination of the receptor genes, which is not always valid in data prepared using a RACE approach. Compared to the traditional multiplex PCR approach, RACE is free of primer bias and can therefore provide accurate estimation of recombination frequencies. To handle the non-regular recombination events, a new computational program is needed. Results: We propose TRIg to handle non-regular T cell receptor and immunoglobulin sequences. Unlike all current programs, TRIg aligns sequences to the whole receptor gene instead of only to the coding regions. This brings new computational challenges, e.g., ambiguous alignments due to multiple hits to repetitive regions. To reduce ambiguity, TRIg applies a heuristic strategy and incorporates gene annotation to identify authentic alignments. On our own and public RACE datasets, TRIg correctly identified non-regularly recombined sequences, which could not be achieved by current programs. TRIg also works well for regularly recombined sequences. Conclusions: TRIg takes into account non-regular recombination of T cell receptor and immunoglobulin genes and is therefore suitable for analyzing RACE data. Such analysis will provide accurate estimation of recombination events, which will benefit various immune studies directly. In addition, TRIg is suitable for studying aberrant recombination in immune diseases.
TRIg is freely available at https://github.com/TLlab/trig. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1304-2) contains supplementary material, which is available to authorized users.
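The abstract says that aligning to the whole receptor gene can produce ambiguous hits in repetitive regions, and that gene annotation helps identify the authentic alignment. It does not describe the heuristic itself, so the following is only a minimal sketch of one plausible rule, not TRIg's actual method: prefer hits that overlap an annotated segment, then break ties by alignment score. The `Hit` and `Segment` types, the `pick_hit` function, the coordinates, and the segment names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One candidate alignment of a read to the receptor locus (hypothetical type)."""
    ref_start: int  # 0-based, inclusive
    ref_end: int    # exclusive
    score: float


@dataclass(frozen=True)
class Segment:
    """One annotated gene segment on the locus (hypothetical type)."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlap(hit: Hit, seg: Segment) -> int:
    """Number of reference bases shared by a hit and an annotated segment."""
    return max(0, min(hit.ref_end, seg.end) - max(hit.ref_start, seg.start))


def pick_hit(hits: List[Hit], annotation: List[Segment]) -> Optional[Tuple[Hit, Optional[str]]]:
    """Assumed rule: annotated hits first, then higher score, then larger overlap."""
    ranked = []
    for hit in hits:
        best_seg, best_ov = None, 0
        for seg in annotation:
            ov = overlap(hit, seg)
            if ov > best_ov:
                best_seg, best_ov = seg, ov
        key = (best_ov > 0, hit.score, best_ov)
        ranked.append((key, hit, best_seg.name if best_seg else None))
    if not ranked:
        return None
    _, hit, name = max(ranked, key=lambda r: r[0])
    return hit, name


# Made-up locus with two annotated segments and two competing hits for one read.
annotation = [Segment("V_example", 100, 400), Segment("J_example", 900, 960)]
hits = [
    Hit(500, 650, 98.0),  # higher score, but falls in an unannotated repeat
    Hit(120, 270, 95.0),  # slightly lower score, inside an annotated segment
]
chosen = pick_hit(hits, annotation)
print(chosen)
```

Under this assumed rule the annotated hit is chosen even though the unannotated one scores higher. A hit with no annotated overlap is still returned when nothing better exists, which keeps non-regular sequences visible rather than discarding them.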