Huge amounts of data in the form of strings are handled in bio-computing applications, and searching algorithms are used in them frequently. Many methods, based on both software and hardware, have been proposed to accelerate the processing of such data. Typical hardware-based acceleration techniques either require special hardware such as general-purpose graphics processing units (GPGPUs) or require building new hardware such as an FPGA-based design. On the other hand, software-based acceleration techniques are easier to adopt since they only require changes to the software code or the software architecture. Typical software-based techniques use computers connected over a network, also known as a network grid, to accelerate the processing. In this paper, we test the hypothesis that multi-core architectures should provide better performance in this kind of computation, although the gain would still depend on the algorithm selected as well as the programming model used. We present the acceleration of a string-searching algorithm on a multi-core CPU via a POSIX thread-based implementation. Our implementation on an 8-core processor (which supports 16 threads) resulted in a 9x throughput improvement compared to a single-threaded implementation.
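The abstract above does not say how the search is divided among threads. One common scheme, shown here only as a minimal sketch, splits the text into contiguous chunks and lets each worker search its chunk plus a small overlap so that matches straddling a boundary are not lost. The sketch uses Python's `concurrent.futures` rather than POSIX threads, so it illustrates the partitioning only and not the reported speedup (CPython threads do not run CPU-bound code in parallel). The names `find_in_chunk` and `parallel_search` are hypothetical and do not come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text, pattern, start, end):
    """Return every match position p with start <= p < end."""
    hits = []
    # Let the search window run len(pattern) - 1 characters past `end`
    # so that a match beginning inside this chunk but finishing in the
    # next one is still found, and no match is reported twice.
    limit = min(len(text), end + len(pattern) - 1)
    pos = text.find(pattern, start, limit)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, limit)
    return hits


def parallel_search(text, pattern, workers=4):
    """Search `text` for `pattern` using one task per contiguous chunk."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n = len(text)
    chunk = max(1, -(-n // workers))  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: find_in_chunk(text, pattern, *b), bounds)
        # Chunks are returned in order, so positions come out ascending.
        return [p for part in parts for p in part]


if __name__ == "__main__":
    print(parallel_search("ACGTACGTACGT", "ACGT", workers=3))
```

Because each worker only reports matches that start inside its own chunk, the per-chunk results can be concatenated without deduplication.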
Pot and field studies were undertaken to assess the substitutability of triple superphosphate (TSP) by a phosphorus (P) fertilizer mixture (PFM) comprising TSP, rock phosphate (RP), and P-solubilizing bacterial inoculants for wetland rice. Six single and two dual inoculants were formulated with Enterobactor gegovie and five Bacillus species and tested in pot and field experiments. Soil-available P and tissue P contents were analyzed, and yield data were recorded. In the pot experiment, the dual inoculant containing E. gegovie + B. mycoides and the single inoculant B. subtilis increased yields by 32% and 25%, respectively, over the TSP control. Under field conditions, E. gegovie + B. subtilis, and E. gegovie + B. pumilus increased grain yield by 22-27% over the TSP control (574 g m⁻²). Results revealed that 50% of TSP could be substituted with RP along with seed inoculants formulated with E. gegovie, B. pumilus, and B. subtilis under tested conditions.
Background: In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequency space in a purely unsupervised manner. In CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default) and MyCC (coverage), using multiple datasets including a sample comprised of multiple strains. Results: Binning methods based on both compositional features and coverages of contigs performed better than the method based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performance of CoMet was higher than that of MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains.
CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. Conclusions: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1967-3) contains supplementary material, which is available to authorized users.
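The two compositional features named in the CoMet abstract, GC content and tetranucleotide frequencies, can be computed per contig roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the handling of ambiguous bases and the normalization are illustrative choices, and none of it is taken from the CoMet implementation. Coverage values and the DBSCAN clustering step are not shown.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, so that every contig
# yields a feature vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_freqs(seq):
    """Relative frequency of each tetramer, in TETRAMERS order.

    Windows containing anything other than A, C, G or T are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


if __name__ == "__main__":
    contig = "ACGTACGTGGCC"
    print(gc_content(contig))
    print(sum(tetranucleotide_freqs(contig)))
```

A contig's GC content paired with its coverage gives a point in the GC-coverage space used for the initial grouping, and the 256-element frequency vector gives its position in the tetranucleotide space used for refinement.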
Background: Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; the classification of species from a pool of species is one example. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely depending on the dataset. For the five datasets we tested, the accuracy of existing methods varied in the range of ~77 to 97% (average accuracy < 89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and improved neural network architecture for regularization. Results: For the five datasets tested, DeepSelectNet's accuracy varied between ~91 and 99% (average accuracy ~95%). At its best performance, DeepSelectNet achieved a nearly 12% accuracy increase compared to its deep learning-based predecessor SquiggleNet. Furthermore, precision and recall evaluated for DeepSelectNet on average were always > 89% (average ~95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by ~13% on average. Thus, DeepSelectNet is a practically viable method to improve the effectiveness of selective sequencing. Conclusions: Compared to base alignment and deep learning predecessors, DeepSelectNet can significantly improve the accuracy to enable real-time species classification using selective sequencing. The source code of DeepSelectNet is available at https://github.com/AnjanaSenanayake/DeepSelectNet.