The spread of next-generation sequencing (NGS) in recent years has greatly increased the availability of mitochondrial (mt) genome sequences. The mt genome is a powerful tool for comparative studies and for resolving the phylogenetic relationships among insect lineages. The mt genomes of phytophagous scarabs of the subfamilies Cetoniinae and Dynastinae were under-represented in GenBank. Previous research found that the subfamily Rutelinae was recovered as a paraphyletic group because the few representatives of the subfamily Dynastinae clustered within Rutelinae, but the position of the subfamily Dynastinae was still unclear. In the present study, we sequenced 18 mt genomes from Dynastinae and Cetoniinae using NGS to re-assess the phylogenetic relationships within Scarabaeidae. All sequenced mt genomes contained 37 genes (13 protein-coding genes, 22 tRNAs, and two ribosomal RNAs) and one long control region, but the gene order differed between Cetoniinae and Dynastinae species. All mt genomes of Dynastinae species showed the same gene rearrangement of trnQ-NCR-trnI-trnM, whereas all mt genomes of Cetoniinae species showed the ancestral insect gene order of trnI-trnQ-trnM. Phylogenetic analyses (IQ-TREE and MrBayes) were conducted using 13 protein-coding genes based on nucleotide and amino acid datasets. In the maximum likelihood (ML) and Bayesian inference (BI) trees, we recovered the monophyly of Rutelinae, Cetoniinae, Dynastinae, and Sericinae, and the non-monophyly of Melolonthinae. Cetoniinae was shown to be a sister clade to (Dynastinae + Rutelinae).
Background: A short tandem repeat (STR), or “microsatellite”, is a tract of DNA in which a specific motif (typically < 10 base pairs) is repeated multiple times. STRs are abundant throughout the human genome, and specific repeat expansions may be associated with human diseases. Long-read sequencing coupled with bioinformatics tools enables the estimation of repeat counts for STRs. However, with the exception of a few well-known disease-relevant STRs, normal ranges of repeat counts for most STRs in human populations are not well known, preventing the prioritization of STRs that may be associated with human diseases. Results: In this study, we extend a computational tool, RepeatHMM, to infer normal ranges of 432,604 STRs using 21 long-read sequencing datasets on human genomes, and build a genomic-scale database called RepeatHMM-DB with normal repeat ranges for these STRs. Evaluation on 13 well-known repeats shows that the inferred repeat ranges provide a good estimate of the repeat ranges reported in the literature from population-scale studies. This database, together with a repeat expansion estimation tool such as RepeatHMM, enables genomic-scale scanning of repeat regions in newly sequenced genomes to identify disease-relevant repeat expansions. As a case study of using RepeatHMM-DB, we evaluate the CAG repeats of ATXN3 for 20 patients with spinocerebellar ataxia type 3 (SCA3) and 5 unaffected individuals, and correctly classify each individual. Conclusions: In summary, RepeatHMM-DB can facilitate prioritization and identification of disease-relevant STRs from whole-genome long-read sequencing data on patients with undiagnosed diseases. RepeatHMM-DB is incorporated into RepeatHMM and is available at https://github.com/WGLab/RepeatHMM.
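The scanning step described in the abstract above, comparing estimated repeat counts against a table of normal ranges, can be sketched roughly as follows. This is a minimal illustration and not the RepeatHMM or RepeatHMM-DB interface: the function name `flag_expansions`, the dictionary layout, the locus names, and all numeric values are hypothetical, and the rule "flag a locus when the longer allele exceeds the upper bound of the normal range" is an assumed simplification.

```python
from typing import Dict, List, Tuple

# (lower, upper) bounds of the normal repeat-count range for one locus.
NormalRange = Tuple[int, int]


def flag_expansions(
    estimates: Dict[str, Tuple[int, int]],
    normal_ranges: Dict[str, NormalRange],
) -> List[str]:
    """Return loci whose longer allele exceeds the normal upper bound.

    `estimates` maps a locus name to the repeat counts of its two alleles.
    Loci without a known normal range are skipped rather than guessed at.
    """
    flagged = []
    for locus, alleles in estimates.items():
        if locus not in normal_ranges:
            continue  # no reference range, so no call is made
        _, upper = normal_ranges[locus]
        if max(alleles) > upper:
            flagged.append(locus)
    return sorted(flagged)


# Hypothetical placeholder values, not taken from any database.
example_ranges = {"LOCUS_A": (10, 30), "LOCUS_B": (5, 12)}
example_estimates = {
    "LOCUS_A": (14, 62),  # longer allele is above the normal range
    "LOCUS_B": (7, 9),    # both alleles are within the normal range
    "LOCUS_C": (3, 3),    # no normal range available
}
print(flag_expansions(example_estimates, example_ranges))
```

A real analysis would also need to account for uncertainty in the repeat-count estimates, which this sketch ignores.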
Open chromatin regions (OCRs) are special regions of the human genome that can be accessed by DNA regulatory elements. Several studies have reported that a series of OCRs are associated with mechanisms involved in human diseases, such as cancers. Identifying OCRs using ATAC-seq or DNase-seq is often expensive. It has become popular to detect OCRs from plasma cell-free DNA (cfDNA) sequencing data, because both the fragmentation modes of cfDNA and the sequencing coverage in OCRs are significantly different from those in other regions. However, it is a challenging computational problem to accurately detect OCRs from plasma cfDNA-seq data, as multiple factors (e.g., sequencing and mapping bias, insufficient read depth) often mislead the computational model. In this paper, we propose a novel bioinformatics pipeline, OCRDetector, for detecting OCRs from whole-genome cfDNA sequencing data. The pipeline calculates the window protection score (WPS) waveform and the cfDNA sequencing coverage. To validate the proposed pipeline, we compared the percentage overlap of our OCRs with those obtained by other methods. The experimental results show that 81% of the TSS regions of housekeeping genes are detected, and that our results show clear tissue specificity. In addition, the overlap percentage between our OCRs and the high-confidence OCRs obtained by ATAC-seq or DNase-seq is greater than 70%.
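The abstract above does not define the window protection score, so the following is a minimal sketch based on the commonly used definition: for a window centred on a position, count the fragments that completely span the window and subtract the fragments that have an endpoint inside it. The function names, the default window size of 120 bp, the inclusive coordinate convention, and the example fragments are assumptions for illustration and do not reproduce OCRDetector's implementation.

```python
from typing import List, Tuple

# A cfDNA fragment as inclusive (start, end) genomic coordinates.
Fragment = Tuple[int, int]


def window_protection_score(
    fragments: List[Fragment], position: int, window: int = 120
) -> int:
    """WPS at one position: spanning fragments minus fragments ending in the window."""
    half = window // 2
    win_start, win_end = position - half, position + half
    score = 0
    for start, end in fragments:
        if start < win_start and end > win_end:
            score += 1  # fragment strictly covers the whole window
        elif win_start <= start <= win_end or win_start <= end <= win_end:
            score -= 1  # fragment has at least one endpoint inside the window
    return score


def wps_waveform(
    fragments: List[Fragment], region_start: int, region_end: int, window: int = 120
) -> List[int]:
    """WPS for every position in [region_start, region_end], inclusive."""
    return [
        window_protection_score(fragments, pos, window)
        for pos in range(region_start, region_end + 1)
    ]


# Hypothetical fragments, used only to exercise the functions.
example_fragments = [(100, 300), (150, 320), (190, 260), (210, 400)]
print(window_protection_score(example_fragments, 200, window=40))
print(window_protection_score(example_fragments, 250, window=40))
```

This per-position loop is quadratic in the worst case; a production pipeline would more likely use prefix sums or interval indexing over sorted alignments.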
Ephemeroptera (Insecta: Pterygota) are widely distributed all over the world with more than 3500 species. During the last decade, the phylogenetic relationships within Ephemeroptera have been a hot topic of research, especially regarding the phylogenetic relationships among Vietnamellidae. In this study, three mitochondrial (mt) genomes from three populations of Vietnamella sinensis collected from Tonglu (V. sinensis TL), Chun’an (V. sinensis CN), and Qingyuan (V. sinensis QY) in Zhejiang Province, China were compared to discuss the potential existence of cryptic species. We also established their phylogenetic relationship by combining the mt genomes of 69 Ephemeroptera downloaded from NCBI. The mt genomes of V. sinensis TL, V. sinensis CN, and V. sinensis QY showed the same gene arrangement with lengths of 15,674 bp, 15,674 bp, and 15,610 bp, respectively. Comprehensive analyses of these three mt genomes revealed significant differences in mt genome organization, genetic distance, and divergence time. Our results showed that the specimens collected from Chun’an and Tonglu in Zhejiang Province, China belonged to V. sinensis, and the specimens collected from Qingyuan, Zhejiang Province, China were a cryptic species of V. sinensis. In maximum likelihood (ML) and Bayesian inference (BI) phylogenetic trees, the monophyly of the family Vietnamellidae was supported and Vietnamellidae has a close relationship with Ephemerellidae.
A DNA microarray was constructed for high-throughput identification of the plant resource of commercial FDSH [Fengdu Shihu (Dendrobium officinale)]. The 5S rDNA (ribosomal DNA) intergenic spacer region in D. officinale, D. nobile, D. moniliforme, D. hercoglossum, D. williamsonii, D. capillipes, D. wilsonii and D. jenkinsii was amplified by a single primer pair and sequenced. The sequences showed polymorphism. They were incorporated on a glass slide and hybridized with fluorescently labelled 5S sequences from commercial Shihu. The DNA microarray enabled the differentiation of D. officinale from the other species tested. FDSH could thus be distinguished from its adulterants. It is evident that DNA microarrays provide a high-throughput and reliable approach for the identification of plant resources, and the method presented here is useful for the authentication of FDSH.