Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
Motivation: Quality control (QC) and preprocessing of FASTQ files are necessary steps to provide clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as QC, adapter trimming and quality filtering. These tools are usually not fast enough because most are developed in high-level programming languages such as Python and Java and provide limited multi-threading support. In addition, the need to read and load the data multiple times makes preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with the most useful QC and data filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality cutting and many other operations within a single scan of the FASTQ data. It also supports unique molecular identifier (UMI) preprocessing, poly tail trimming, output splitting, and base correction for paired-end data. It can automatically detect the adapters for both single-end and paired-end FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt, even though it performs many more operations than those tools.
Availability and Implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp. Contact: firstname.lastname@example.org
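The single-scan idea described above can be illustrated with a minimal Python sketch. This is not fastp's implementation (fastp is written in C++); the function names `phred`, `cut_tail` and `single_scan`, the thresholds, and the exact-match adapter search are all hypothetical simplifications chosen for illustration. Each read is visited once, and statistics gathering, adapter trimming, quality cutting and length filtering all happen during that one visit.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string); hypothetical layout


def phred(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def cut_tail(seq: str, qual: str, window: int = 4, min_mean_q: float = 20.0) -> Tuple[str, str]:
    """Drop trailing bases while the last `window` bases average below `min_mean_q`."""
    scores = phred(qual)
    end = len(scores)
    while end >= window and sum(scores[end - window:end]) / window < min_mean_q:
        end -= 1
    return seq[:end], qual[:end]


def single_scan(records: Iterable[Record], stats: Counter,
                adapter: Optional[str] = None, min_len: int = 15) -> Iterator[Record]:
    """Collect QC statistics, trim, cut and filter each read in one pass."""
    for name, seq, qual in records:
        # QC statistics on the raw read
        stats["reads_in"] += 1
        stats["bases_in"] += len(seq)
        stats["q20_bases_in"] += sum(q >= 20 for q in phred(qual))
        # Adapter trimming (exact match only; a deliberate simplification)
        if adapter:
            pos = seq.find(adapter)
            if pos != -1:
                seq, qual = seq[:pos], qual[:pos]
        # Per-read quality cutting from the 3' end
        seq, qual = cut_tail(seq, qual)
        # Length filtering
        if len(seq) < min_len:
            stats["reads_filtered"] += 1
            continue
        stats["reads_out"] += 1
        yield name, seq, qual
```

Because `single_scan` is a generator, the caller can stream records from a FASTQ parser and write the survivors out without holding the whole file in memory or reading it a second time.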
Background: Some applications, especially clinical applications that require highly accurate sequencing data, have to cope with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, and that also offers highly automated quality control and data filtering features. Unlike most tools, AfterQC analyses the overlap of paired reads for paired-end sequencing data. Based on this overlap analysis, AfterQC can detect and cut adapters, and it also provides a novel function to correct wrong bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and may cause sequencing errors. Besides the usual per-cycle quality and base content plots, AfterQC also provides features such as polyX filtering (polyX being a long sub-sequence of a single base X), automatic trimming and k-mer based strand bias profiling. Results: For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at the front and tail, detects sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support. It can run on a single FastQ file, a single pair of FastQ files (for paired-end sequencing), or a folder, in which case all the FastQ files it contains are processed automatically. Based on overlap analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. Conclusion: Much more than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate sequencing errors in paired-end sequencing data to provide much cleaner outputs, and consequently help to reduce false-positive variants, especially for low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and requires no arguments in most cases.
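The overlap-based correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not AfterQC's actual algorithm: the function names, the ungapped offset search, the mismatch tolerance and the quality thresholds (`hi_q`, `lo_q`) are all hypothetical, and the sketch only handles fragments at least as long as a read (no adapter read-through). Read 2 is reverse-complemented, aligned against the tail of read 1, and a mismatching base is replaced only when one read reports high quality and the other low quality at that position.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_overlap(r1: str, r2rc: str, min_overlap: int = 10,
                 max_mismatch_frac: float = 0.1) -> Optional[int]:
    """Return the first offset where r1[offset:] matches the start of r2rc, or None."""
    for offset in range(0, len(r1) - min_overlap + 1):
        n = min(len(r1) - offset, len(r2rc))
        mismatches = sum(a != b for a, b in zip(r1[offset:offset + n], r2rc[:n]))
        if mismatches <= max_mismatch_frac * n:
            return offset
    return None


def correct_pair(r1: str, q1: str, r2: str, q2: str,
                 hi_q: int = 30, lo_q: int = 15) -> Tuple[str, str, int]:
    """Correct mismatches in the overlap of a read pair (Phred+33 qualities assumed).

    Returns the (possibly corrected) reads and the number of corrected bases.
    """
    r2rc = reverse_complement(r2)
    q2rev = q2[::-1]  # qualities follow the reversed read orientation
    offset = find_overlap(r1, r2rc)
    if offset is None:
        return r1, r2, 0
    a, b = list(r1), list(r2rc)
    fixed = 0
    n = min(len(r1) - offset, len(r2rc))
    for i in range(n):
        x, y = a[offset + i], b[i]
        if x == y:
            continue
        qa = ord(q1[offset + i]) - 33
        qb = ord(q2rev[i]) - 33
        if qa >= hi_q and qb <= lo_q:
            b[i] = x  # trust read 1 at this position
            fixed += 1
        elif qb >= hi_q and qa <= lo_q:
            a[offset + i] = y  # trust read 2 at this position
            fixed += 1
    return "".join(a), reverse_complement("".join(b)), fixed
```

Mismatches where both reads report similar quality are left untouched in this sketch, since there is no basis for preferring either base; the same mismatch counts could instead feed an error-rate estimate.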
Novel Bi₂S₃/BiOI heterostructures were successfully synthesized through a facile and economical ion exchange method between BiOI and thioacetamide (CH₃CSNH₂), and characterized by a range of techniques, including XRD, Raman, FT-IR, XPS, SEM, TEM, HRTEM, SAED, BET and DRS. The obtained Bi₂S₃/BiOI photocatalysts showed excellent photocatalytic performance for decomposing the organic dye methyl orange (MO) compared with pure BiOI under visible light irradiation (λ > 420 nm). Among the Bi₂S₃/BiOI photocatalysts with different molar percentages of Bi₂S₃ to initial BiOI (from 2 to 8%), 4% Bi₂S₃/BiOI exhibited the highest photocatalytic activity, with an apparent rate constant (k_app) of 0.2968 h⁻¹. In contrast, Bi₂S₃/BiOI displayed low photocatalytic activity for many colorless organic substrates, such as phenol, 2-chlorophenol, dimethyl phthalate and 5-sulfosalicylic acid. Moreover, the mechanistic study suggested that the enhanced photocatalytic activity mainly resulted from the Bi₂S₃-BiOI heterojunctions formed in the Bi₂S₃/BiOI, which could lead to efficient separation of photoinduced carriers.
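To give the reported rate constant some intuition, the short sketch below assumes pseudo-first-order kinetics, ln(C0/C) = k_app · t. The abstract does not state which kinetic model was fitted, so this model and the helper names are assumptions made purely for illustration.

```python
import math

K_APP = 0.2968  # h^-1, value reported for 4% Bi2S3/BiOI


def fraction_remaining(t_hours: float, k_app: float = K_APP) -> float:
    """Fraction C/C0 of dye left after t hours, assuming pseudo-first-order decay."""
    return math.exp(-k_app * t_hours)


def half_life(k_app: float = K_APP) -> float:
    """Time in hours for the concentration to halve under the same assumption."""
    return math.log(2) / k_app
```

Under this assumption, `half_life()` evaluates to roughly 2.3 h, meaning about half of the methyl orange would be decomposed in that time.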