2020
DOI: 10.1038/s10038-020-00862-1

Practical guide for managing large-scale human genome data in research

Abstract: Studies in human genetics deal with a plethora of human genome sequencing data, generated both from specimens and made available in the public domain. With the development of various bioinformatics applications, maintaining research productivity, managing human genome data, and analyzing downstream data have become essential. This review aims to guide struggling researchers in processing and analyzing these large-scale genomic data so that they can extract relevant information for improved downstream analyses. Here, we discuss w…

Cited by 43 publications (39 citation statements)
References 65 publications

“…More research groups are now making the move to cloud-based storage for their NGS data and minimising the amount of data stored has a direct impact on cost [ 94 ]. It is important to note that sample processing and analyses are available via cloud-based solutions also, and may be an attractive option for research groups lacking the necessary in-house infrastructure to process NGS data [ 95 ].…”
Section: Expanding IRD Diagnosis via Whole-Gene or WGS (mentioning)
“…The second challenge facing simulation methods is that sample sizes in genetic studies have grown very quickly in recent years, enabled by the precipitous fall in genome sequencing costs. Human datasets like the UK Biobank ( Bycroft et al 2018 ) and gnomAD ( Karczewski et al 2020 ) now consist of hundreds of thousands of genomes and many other datasets on a similar scale are becoming available ( Tanjo et al 2021 ). Classical simulators such as and even fast approximate methods such as ( Staab et al 2015 ) simply cannot cope with such a large number of samples.…”
Section: Introduction (mentioning)
“…Therefore, standardised pipelines need to be developed to allow reproducibility between centers. Awareness is increasing regarding this aspect, and efforts are being made in this direction [106].…”
Section: RNA-seq in Clinical Trials in Onco-immunology (mentioning)