Summary Efficient processing of large-scale genomic datasets has recently become possible due to the application of ‘big data’ technologies in bioinformatics pipelines. We present SeQuiLa, a distributed, ANSI SQL-compliant solution for fast querying and processing of genomic intervals that is available as an Apache Spark package. The proposed range join strategy is significantly (∼22×) faster than the default Apache Spark implementation and outperforms other state-of-the-art tools for genomic interval processing. Availability and implementation The project is available at http://biodatageeks.org/sequila/. Supplementary information Supplementary data are available at Bioinformatics online.
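To make the notion of a range join on genomic intervals concrete, here is a minimal single-machine sketch. It is not SeQuiLa's strategy, which the abstract does not describe. The function name `range_join`, the tuple layout `(chrom, start, end)` and the closed-coordinate convention are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def range_join(reads, targets):
    """Return (read, target) pairs that overlap on the same chromosome.

    Intervals are hypothetical (chrom, start, end) tuples with closed
    coordinates, so two intervals overlap when each one starts at or
    before the point where the other one ends.
    """
    # Group targets by chromosome and sort each group by start position.
    by_chrom = defaultdict(list)
    for target in targets:
        by_chrom[target[0]].append(target)
    starts = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda t: t[1])
        starts[chrom] = [t[1] for t in items]

    pairs = []
    for read in reads:
        chrom, r_start, r_end = read
        items = by_chrom.get(chrom, [])
        # Only targets that start at or before r_end can overlap the read.
        hi = bisect_right(starts.get(chrom, []), r_end)
        for target in items[:hi]:
            # Among those, keep targets that end at or after r_start.
            if target[2] >= r_start:
                pairs.append((read, target))
    return pairs


reads = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
targets = [("chr1", 150, 300), ("chr1", 201, 250),
           ("chr1", 590, 700), ("chr2", 21, 30)]
pairs = range_join(reads, targets)
```

The binary search only prunes targets that start too late; a production implementation would typically use an interval tree or a similar index to prune on both ends.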
Background There are over 25 tools dedicated to the detection of Copy Number Variants (CNVs) using Whole Exome Sequencing (WES) data based on read depth analysis. The reported tools consist of several steps, including: (i) calculation of read depth for each sequencing target, (ii) normalization, (iii) segmentation and (iv) actual CNV calling. The essential aspect of the entire process is the normalization stage, in which systematic errors and biases are removed and the reference sample set is used to increase the signal-to-noise ratio. Although some CNV calling tools use dedicated algorithms to obtain the optimal reference sample set, most of the advanced CNV callers do not include this feature. To our knowledge, this work is the first attempt to assess the impact of reference sample set selection on CNV detection performance. Methods We used WES data from the 1000 Genomes project to evaluate the impact of various methods of reference sample set selection on the CNV calling performance of three chosen state-of-the-art tools: CODEX, CNVkit and exomeCopy. Two naive solutions (all samples as reference set and random selection) as well as two clustering methods (k-means and k nearest neighbours (kNN) with a variable number of clusters or group sizes) have been evaluated to discover the best performing sample selection method. Results and Conclusions The performed experiments have shown that the appropriate selection of the reference sample set may greatly improve the CNV detection rate. In particular, we found that smart reduction of the reference sample set size may significantly increase the algorithms’ precision while having a negligible negative effect on sensitivity. We observed that a complete CNV calling process with the k-means algorithm as the selection method has significantly better time complexity than the kNN-based solution.
Electronic supplementary material The online version of this article (10.1186/s12859-019-2889-z) contains supplementary material, which is available to authorized users.
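As an illustration of the kNN-based reference sample set selection mentioned in the abstract above, the sketch below picks, for one sample, the k other samples with the most similar read depth profiles. The abstract does not state the distance measure or the input representation, so the Euclidean distance, the per-target depth vectors and the name `knn_reference_set` are assumptions.

```python
import math


def knn_reference_set(depths, sample, k):
    """Pick the k samples whose depth profiles are closest to `sample`.

    `depths` maps a sample name to its list of per-target read depths
    (a hypothetical layout); closeness is plain Euclidean distance,
    which is an assumption and not necessarily the paper's choice.
    """
    target = depths[sample]
    distances = [
        (math.dist(target, profile), name)
        for name, profile in depths.items()
        if name != sample
    ]
    # Sort by distance (ties broken by sample name) and keep the k nearest.
    return [name for _, name in sorted(distances)[:k]]


depths = {
    "S1": [10, 20, 30],
    "S2": [11, 19, 31],
    "S3": [50, 60, 70],
    "S4": [9, 21, 30],
}
reference = knn_reference_set(depths, "S1", 2)
```

Here the distant profile `S3` is excluded from the reference set for `S1`, which is the kind of reduction of the reference set that the abstract reports can improve precision.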
Background Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next-generation sequencing pipelines, including the analysis of RNA-sequencing data, detection of copy number variants, or quality control procedures. Results Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform, which provides efficient depth of coverage calculations, reaching >100× speedup over the state-of-the-art tools. The performance and scalability of our solution allow for exome and genome-wide calculations running locally or on a cluster while hiding the complexity of distributed computing behind a Structured Query Language (SQL) Application Programming Interface (API). Conclusions SeQuiLa-cov provides a significant performance gain in depth of coverage calculations, streamlining widely used bioinformatic processing pipelines.
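For readers unfamiliar with depth of coverage, the sketch below computes per-base depth for a single contig with a difference array. It is a generic textbook approach and not necessarily the algorithm used in SeQuiLa-cov. The function name `depth_of_coverage` and the 0-based, half-open `(start, end)` read coordinates are assumptions.

```python
from itertools import accumulate


def depth_of_coverage(reads, contig_length):
    """Per-base depth for one contig.

    `reads` holds hypothetical 0-based, half-open (start, end) alignments.
    Each read adds +1 where it starts and -1 where it ends; a running sum
    over these events gives the number of reads covering each position.
    """
    delta = [0] * (contig_length + 1)
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1
    # The last slot only absorbs reads ending at contig_length.
    return list(accumulate(delta[:-1]))


coverage = depth_of_coverage([(0, 3), (2, 5), (4, 6)], contig_length=8)
```

The cost is linear in the number of reads plus the contig length, independent of how long the reads are.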
Background Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next generation sequencing pipelines, including the analyses of RNA-seq data, detection of copy number variants, or quality control procedures. Results Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform, which provides efficient depth of coverage calculations, reaching more than 100x speedup over the state-of-the-art tools. Performance and scalability of our solution allow for exome and genome-wide calculations running locally or on a cluster while hiding the complexity of distributed computing behind a Structured Query Language (SQL) Application Programming Interface (API). Conclusions SeQuiLa-cov provides a significant performance gain in depth of coverage calculations, streamlining widely used bioinformatic processing pipelines.
• SeQuiLa-cov allows for high-coverage (∼60x) genome-wide depth of coverage calculations in less than one minute.
• SeQuiLa-cov provides an ANSI SQL compliant API for accessing and analyzing aligned sequencing read data.
Kierkuś (2021): Dietary management of infants and young children with feeding difficulties and unsatisfactory weight gain using a nutritionally complete hypercaloric infant formula. practical considerations from clinical cases, Postgraduate Medicine,
Background: There are over 25 tools dedicated to the detection of Copy Number Variants (CNVs) using Whole Exome Sequencing (WES) data based on read depth analysis. The reported tools consist of several steps, including: (i) calculation of read depth for each sequencing target, (ii) normalization, (iii) segmentation and (iv) actual CNV calling. The essential aspect of the entire process is the normalization stage, in which systematic errors and biases are removed and the reference sample set is used to increase the signal-to-noise ratio. Although some CNV calling tools use dedicated algorithms to obtain the optimal reference sample set, most of the advanced CNV callers do not include this feature. To our knowledge, this work is the first attempt to assess the impact of reference sample set selection on CNV detection performance. Methods: We used WES data from the 1000 Genomes project to evaluate the impact of various methods of reference sample set selection on the CNV calling performance of three chosen state-of-the-art tools: CODEX, CNVkit and exomeCopy. Two naive solutions (all samples as reference set and random selection) as well as two clustering methods (k-means and k nearest neighbours with a variable number of clusters or group sizes) have been evaluated to discover the best performing sample selection method.
Background An increasing number of families with children who have spinal muscular atrophy (SMA) are incorporating a special amino acid diet into their child's feeding regimens. Characteristics of the diet include high‐carbohydrate and low‐fat content with added probiotics. However, because of insufficient evidence‐based research, clinicians are unable to prescribe or endorse this diet. Our aim was to assess the tolerability of an adapted version of the traditional amino acid diet in children with SMA type I. Methods Children with SMA type I were recruited if they were enterally fed and experienced at least one gastrointestinal symptom (reflux, vomiting, constipation, and/or diarrhea). Children were transitioned to an amino acid formula (Neocate Syneo‐Nutricia) for 8 weeks. Feeding tolerance was measured weekly by telephone consultation to monitor reflux, vomiting, stool consistency, and frequency. Results Fourteen children were recruited; the mean age was 4.1 years (±1.2 SD), and 64% of participants were female. The mean resting energy expenditure determined by indirect calorimetry was 51.5 kcal/kg (±7 SD). The most common gastrointestinal complaint before switching to the amino acid formula was constipation, which was reported in 12 of 14 (85%) patients, of whom 10 (83%) required daily stool softeners/laxatives to help regulate bowel function. After 8 weeks on the amino acid formula, 10 out of 12 (83%) children stopped or reduced constipation medication. Conclusion Children with SMA type I who display gastrointestinal symptoms such as constipation and reflux may benefit from an amino acid formula that is fortified with probiotics.