2020
DOI: 10.1111/1755-0998.13309

Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets

Abstract: Technological advances in DNA sequencing over the last decade now permit the production and curation of large genomic data sets in an increasing number of nonmodel species. Additionally, these new data provide the opportunity for combining data sets, resulting in larger studies with a broader taxonomic range. Whilst the development of new sequencing platforms has been beneficial, resulting in a higher throughput of data at a lower per‐base cost, shifts in sequencing technology can also pose challenges for thos…

citations
Cited by 19 publications
(26 citation statements)
references
References 39 publications
“…Accordingly, the absence of a signal can result from a true G base in the DNA template, but any low‐intensity fluorescence signal (regardless of the true base) may also lead to a G call, which becomes problematic. Since the intensity of the fluorescence signal tends to decrease with sequencing cycles, false calls of G tend to be enriched at the end of reads, forming poly‐G tails (De‐Kayne et al, 2021). Although one might expect that reads with poly‐G tails would fail to map to the reference genome and therefore would not cause problems downstream (especially with global alignment settings), we found that many of these reads can in fact map to the reference genome with high confidence (i.e., with mapping quality scores >20, see Figure S2, also see Arora et al, 2019).…”
Section: Results
mentioning
confidence: 99%
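The poly‐G tails described in the statement above can be illustrated with a minimal Python sketch that removes a trailing run of G calls from a read. The function name `trim_poly_g_tail` and the 10-base threshold `min_run` are assumptions made for illustration; they do not come from the cited papers, and the sketch is not a substitute for a dedicated read trimmer.

```python
def trim_poly_g_tail(seq: str, min_run: int = 10) -> str:
    """Remove a trailing run of G calls if it is at least `min_run` bases long.

    `min_run` is a hypothetical threshold chosen for illustration only.
    """
    # Strip every G at the 3' end, then measure how many were removed.
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    # Short runs are kept: they are plausibly genuine G bases in the template.
    return stripped if run_length >= min_run else seq


if __name__ == "__main__":
    read_with_tail = "ACGTA" + "G" * 12
    read_without_tail = "ACGTAGG"
    print(trim_poly_g_tail(read_with_tail))
    print(trim_poly_g_tail(read_without_tail))
```

A genuine G that directly precedes an artefactual tail is removed as well, because the sequence alone cannot separate true calls from low‐signal calls. This is one reason the statement above describes such reads as problematic when they still map with high confidence.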
“…However, neither of these options may be available when we combine pre-existing data sets (De-Kayne et al, 2021). When we supplement pre-existing data sets with new data, full randomization of samples is also not possible since we do not have control over which samples are included in pre-existing data sets.…”
mentioning
confidence: 99%
“…calls and introduce erroneous signals of genetic differentiation; as outlined in 72) we mapped these two samples as above (resulting in a mean genome-wide coverage of 9.32x and 16.58x) and called genotypes again for all samples (including the two additional C. macrophthalmus individuals) at each of the original 15,841,979 SNP positions. Following this genotype calling, which resulted in 15,521,925 SNPs, SNP filtering was repeated as before, leaving 14,313,952 SNPs with no missing data across the dataset of 99 individuals.…”
Source: bioRxiv preprint (not certified by peer review), posted February 20, 2022, https://doi.org/10.1101/2022.02.18.481029, CC-BY 4.0 International license
Section: Genotyping and Loci Filtering
mentioning
confidence: 99%
“…macrophthalmus from Lake Constance from one individual to three, we added sequencing data from an additional two individuals (Supplementary File S1). To avoid the downstream impacts of combining sequencing data from different runs (which can result from different biased nucleotide calls and introduce erroneous signals of genetic differentiation; as outlined in 72) we mapped these two samples as above (resulting in a mean genome-wide coverage of 9.32x and 16.58x) and called genotypes again for all samples (including the two additional C. macrophthalmus individuals) at each of the original 15,841,979 SNP positions. Following this genotype calling, which resulted in 15,521,925 SNPs, SNP filtering was repeated as before, leaving 14,313,952 SNPs with no missing data across the dataset of 99 individuals.…”
Section: Genotyping and Loci Filtering
mentioning
confidence: 99%
“…Molecular Ecology Resources continues to publish comments, editorials, opinions, and technical review articles that provide guidance to authors on highly relevant topics. Specifically, articles included achieving high quality reference genome assemblies (Whibley et al, 2021) and best practices for downstream sequencing applications such as combining data across sequencing platforms (De‐Kayne et al, 2021) and assessing reference genomes prior to further analyses (Jauhal and Newcomb, 2021). There were also multiple articles providing guidance for biodiversity monitoring with eDNA (Bensch et al, 2021; Jurburg et al, 2021; Rodríguez‐Ezpeleta et al, 2021).…”
Section: Top Content Published In Molecular Ecology Resources
mentioning
confidence: 99%