2020
DOI: 10.5808/gi.2020.18.1.e8

Bioinformatics services for analyzing massive genomic datasets

Abstract: The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here,…
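As a concrete illustration of the virtual-machine model the abstract describes, here is a minimal sketch (not from the paper) of provisioning cloud compute for one sequencing analysis. It assumes AWS EC2 via the boto3 library; the AMI ID, instance type, region, and volume size are hypothetical placeholders, not values recommended by the authors.

```python
# Minimal sketch of the cloud model in the abstract: CPUs, memory, and
# storage are requested as a virtual machine for the duration of one run.
# Assumes AWS EC2 via boto3; all concrete values below are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="ap-northeast-2")  # Seoul region, an assumption

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical image with analysis tools preinstalled
    InstanceType="r5.4xlarge",        # 16 vCPUs / 128 GiB RAM, chosen for illustration
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 500},   # 500 GiB scratch space for FASTQ/BAM files
    }],
)
print("launched:", response["Instances"][0]["InstanceId"])
```

The design point is that compute is rented per analysis rather than owned: a lab can size the machine to the dataset, run the pipeline, and release the resources afterwards.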

Cited by 7 publications (2 citation statements)
References 39 publications
“…To tackle this predicament, researchers need solutions that conduct genome-wide calculations on expansive repositories in a fast, resource-efficient, and scalable manner (10,17,19). However, popular computational tools, though capable of being executed on HPC (high-performance computing) clusters, have been designed with the goal of processing data that spans a few hundred individuals across thousands of polymorphic sites (20)(21)(22). They use single-threaded algorithms that do not exploit the prevalent parallel processing technologies made available through multi-core CPUs (central processing units) and GPUs (graphics processing units) (10,21,23,24).…”
Section: Introduction (mentioning)
confidence: 99%
“…To tackle this predicament, researchers need solutions that are fast, resource-efficient, and scalable (Pfeifer et al., 2014; Rozas et al., 2017; Szpiech & Hernandez, 2014). However, popular computational tools, though capable of being executed on HPC (high-performance computing) clusters, have been designed with the goal of processing data that spans a few hundred individuals across thousands of polymorphic sites (Cook & Andersen, 2017; Ko et al., 2020; Siepel, 2019). They use single-threaded algorithms that do not exploit the prevalent parallel processing technologies made available through multi-core CPUs (central processing units) and GPUs (graphics processing units) (Cook & Andersen, 2017; Ghorpade, 2012; Pfeifer et al., 2014; Tendler et al., 2002).…”
Section: Introduction (mentioning)
confidence: 99%
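Both citation statements make the same point: most genome-wide statistics are independent across polymorphic sites, so they parallelize naturally over CPU cores. The sketch below (mine, not code from any of the cited tools) illustrates that with a toy per-site allele-frequency scan using Python's multiprocessing; the genotype encoding and cohort sizes are assumptions for illustration.

```python
# Minimal sketch of the multi-core parallelism the statements describe:
# a genome-wide, per-site statistic spread across CPU cores instead of
# running single-threaded. The layout (one int8 array per site, values
# 0/1/2 = copies of the alternate allele) is an assumed toy encoding.
import numpy as np
from multiprocessing import Pool

def alt_allele_freq(genotypes: np.ndarray) -> float:
    """Alternate-allele frequency at one diploid site."""
    return genotypes.sum() / (2 * genotypes.size)

def main() -> None:
    rng = np.random.default_rng(0)
    # Toy cohort: 1,000 individuals x 10,000 polymorphic sites.
    sites = [rng.integers(0, 3, size=1_000, dtype=np.int8) for _ in range(10_000)]
    # Sites are independent, so the scan splits cleanly over all cores.
    with Pool() as pool:
        freqs = pool.map(alt_allele_freq, sites, chunksize=500)
    print(f"computed {len(freqs)} per-site frequencies")

if __name__ == "__main__":
    main()
```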