Runyang Nicolas Lou scite author profile

Despite massive reductions in the cost of DNA sequencing over the past decades, researchers remain faced with decisions about how to distribute sequencing effort along three dimensions: (a) how much of the genome to sequence (breadth of coverage), (b) how deeply to sequence each sample (depth of coverage), and (c) how many samples to sequence. Until recently, reduced-representation sequencing (e.g., RAD-seq [restriction site-associated DNA sequencing]), through which a small random portion of the genome can be sequenced deeply in many individuals to allow for simultaneous variant discovery and high-confidence genotyping, has been the most popular approach for population genomics of nonmodel organisms

Novel signals of adaptive genetic variation in northwestern Atlantic cod revealed by whole‐genome sequencing

Clucas, Lou, Therkildsen et al. 2019

Evolutionary Applications

Selection can create complex patterns of adaptive differentiation among populations in the wild that may be relevant to management. Atlantic cod in the Northwest Atlantic are at a fraction of their historical abundance and a lack of recovery within the Gulf of Maine has created concern regarding the misalignment of fisheries management structures with biological population structure. To address this and investigate genome‐wide patterns of variation, we used low‐coverage sequencing to perform a region‐wide, whole‐genome analysis of fine‐scale population structure. We sequenced 306 individuals from 20 sampling locations in U.S. and Canadian waters, including the major spawning aggregations in the Gulf of Maine in addition to spawning aggregations from Georges Bank, southern New England, the eastern Scotian Shelf, and St. Pierre Bank. With genotype likelihoods estimated at almost 11 million loci, we found large differences in haplotype frequencies of previously described chromosomal inversions between Canadian and U.S. sampling locations and also among U.S. sampling locations. Our whole‐genome resolution also revealed novel outlier peaks, some of which showed significant genetic differentiation among sampling locations. Comparisons between allochronic winter‐ and spring‐spawning populations revealed highly elevated relative (FST) and absolute (dxy) genetic differentiation near genes involved in reproduction, particularly genes associated with the brain‐pituitary‐gonadal axis, which likely control timing of spawning, contributing to prezygotic isolation. We also found genetic differentiation associated with heat shock proteins and other genes of functional relevance, with complex patterns that may point to multifaceted selection pressures and local adaptation among spawning populations. We provide a high‐resolution picture of U.S. Atlantic cod population structure, revealing greater complexity than is currently recognized in management. 
Our genome‐scan approach likely underestimates the full suite of adaptive differentiation among sampling locations. Nevertheless, it should inform the revision of stock boundaries to preserve adaptive genetic diversity and evolutionary potential of cod populations.
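
The abstract contrasts relative (FST) and absolute (dxy) differentiation. As a minimal sketch of how the two statistics are computed, assuming per-site sample allele frequencies are already known (the study itself worked from genotype likelihoods, which this sketch ignores), the code below computes mean dxy and Hudson's FST estimator combined across sites as a ratio of averages. The choice of Hudson's estimator, the function names, and the example frequencies are illustrative assumptions, not the paper's pipeline.

```python
from typing import Sequence


def dxy(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Mean absolute divergence: probability that two alleles, one drawn from
    each population, differ, averaged over the supplied sites."""
    per_site = [a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2)]
    return sum(per_site) / len(per_site)


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST, combined across sites as a ratio of averages.

    p1, p2: sample allele frequencies per site.
    n1, n2: number of sampled haplotypes per population (assumed constant
    across sites). Assumes at least one site where the denominator is > 0.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # squared frequency difference, corrected for within-sample variance
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # between-population divergence at this site
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Hypothetical frequencies for two spawning groups at three sites
p_winter = [0.1, 0.5, 0.9]
p_spring = [0.1, 0.5, 0.2]
print(dxy(p_winter, p_spring), hudson_fst(p_winter, p_spring, n1=40, n2=40))
```

FST is a relative measure that rises when within-population diversity falls, whereas dxy measures absolute divergence between populations, which is why the two are often reported together.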

A beginner's guide to low-coverage whole genome sequencing for population genomics

Lou, Jacobs, Wilder et al. 2020

Preprint

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate that the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference compared to sequencing fewer samples each at higher depth. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based analysis. With this overview, we hope to make lcWGS more approachable and stimulate broader adoption.
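
Why lcWGS needs tools that explicitly account for genotype uncertainty can be seen with a deliberately simplified error model: at a biallelic site, each read is assumed to show the individual's true allele with probability 1 - error and the other allele otherwise. Real genotype-likelihood models also use per-base and mapping qualities; the function name and the fixed error rate here are hypothetical.

```python
def genotype_likelihoods(n_alt: int, depth: int, error: float = 0.01) -> list[float]:
    """Relative likelihoods P(reads | g) for g = 0, 1, 2 copies of the alt
    allele, normalized to sum to 1. The binomial coefficient is omitted
    because it is identical for all three genotypes."""
    liks = []
    for g in (0, 1, 2):
        # probability that a single read shows the alt allele given genotype g
        q = (g / 2) * (1 - error) + (1 - g / 2) * error
        liks.append(q ** n_alt * (1 - q) ** (depth - n_alt))
    total = sum(liks)
    return [x / total for x in liks]


# Compare a single alt read with ten alt reads at the same site
for depth in (1, 10):
    print(depth, [round(x, 3) for x in genotype_likelihoods(depth, depth)])
```

With a single alt read, the alternative homozygote receives only about two-thirds of the normalized likelihood and the heterozygote about one-third, so a hard genotype call would often be wrong; with ten alt reads the alternative homozygote exceeds 0.99. At zero depth all three genotypes are equally likely, which is how missing data propagates into downstream likelihood-based inference.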

Batch effects in population genomic studies with low‐coverage whole genome sequencing data: Causes, detection and mitigation

Lou and Therkildsen 2021

Molecular Ecology Resources

Full mitochondrial genome sequences reveal new insights about post-glacial expansion and regional phylogeographic structure in the Atlantic silverside (Menidia menidia)

et al. 2018

Batch effects in population genomic studies with low-coverage whole genome sequencing data: causes, detection, and mitigation

Lou and Therkildsen 2021

Preprint

Over the past few decades, the rapid democratization of high-throughput sequencing and the growing emphasis on open science practices have resulted in an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining datasets to achieve unprecedented sample sizes, spatial coverage, or temporal replication in population genomic studies. However, a common concern is that non-biological differences between datasets may generate batch effects that can confound real biological patterns. Despite general awareness about the risk of batch effects, few studies have examined empirically how they manifest in real datasets, and it remains unclear what factors cause batch effects and how to best detect and mitigate their impact bioinformatically. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a “batch-effect-naive” bioinformatic pipeline, batch effects severely biased our genetic diversity estimates, population structure inference, and selection scan. We then demonstrate that these batch effects resulted from multiple technical differences between our datasets, including the sequencing instrument model/chemistry, read type, read length, DNA degradation level, and sequencing depth, but their impact can be detected and substantially mitigated with simple bioinformatic approaches. We conclude that combining datasets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.
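
One simple way to screen for batch-associated sites (a generic check, not necessarily the approach taken in the paper) is to compare alt-read frequencies between two batches sequenced from the same population: without a batch effect, per-site differences should look like sampling noise. The sketch below computes a pooled two-proportion z-score per site from alt-read and total-read counts. The function names and the |z| > 4 cutoff are hypothetical, and treating reads as independent overstates significance, because reads from the same individual are correlated.

```python
import math
from typing import List, Sequence


def batch_z_scores(alt_a: Sequence[int], depth_a: Sequence[int],
                   alt_b: Sequence[int], depth_b: Sequence[int]) -> List[float]:
    """Pooled two-proportion z-score per site comparing alt-read fractions
    between batch A and batch B drawn from the same population."""
    scores = []
    for ka, na, kb, nb in zip(alt_a, depth_a, alt_b, depth_b):
        if na == 0 or nb == 0:
            scores.append(float("nan"))  # no reads in one batch: untestable
            continue
        pa, pb = ka / na, kb / nb
        pooled = (ka + kb) / (na + nb)
        se = math.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
        # se == 0 only when both batches are fixed for the same allele
        scores.append((pa - pb) / se if se > 0 else 0.0)
    return scores


def flag_sites(scores: Sequence[float], cutoff: float = 4.0) -> List[int]:
    """Indices of sites whose |z| exceeds the (hypothetical) cutoff."""
    return [i for i, z in enumerate(scores) if not math.isnan(z) and abs(z) > cutoff]


# Hypothetical counts for four sites
scores = batch_z_scores([50, 10, 0, 0], [100, 100, 0, 50],
                        [50, 90, 0, 0], [100, 100, 5, 50])
print(flag_sites(scores))
```

Flagged sites can then be inspected for shared technical features (for example, read length or depth differences between batches) before deciding whether to filter them.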

A beginner's guide to low-coverage whole genome sequencing for population genomics

Lou, Jacobs, Wilder et al. 2021

Preprint

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency and genetic diversity estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
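
The claim that spreading a fixed sequencing budget across more samples improves accuracy can be checked for allele-frequency estimation with a toy simulation. The sketch assumes Hardy-Weinberg genotypes, Poisson-distributed per-individual depth, no sequencing error, and a simple read-pooled estimator (alt reads divided by total reads); it is not the simulation framework used in the paper, and the function name and parameters are hypothetical.

```python
import numpy as np


def af_rmse(n_ind: int, mean_depth: float, p: float = 0.5,
            n_reps: int = 2000, seed: int = 0) -> float:
    """RMSE of a read-pooled allele-frequency estimate under a toy model:
    diploid individuals in Hardy-Weinberg equilibrium, Poisson read depth,
    each read drawn from one of the individual's two alleles, no errors."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_reps):
        genotypes = rng.binomial(2, p, size=n_ind)        # alt copies per individual
        depths = rng.poisson(mean_depth, size=n_ind)      # reads per individual
        alt_reads = rng.binomial(depths, genotypes / 2)   # alt reads per individual
        total = depths.sum()
        if total == 0:
            continue  # no reads in this replicate; skip it
        errors.append(alt_reads.sum() / total - p)
    return float(np.sqrt(np.mean(np.square(errors))))


budget = 100  # expected total reads per site across all individuals
many_shallow = af_rmse(n_ind=100, mean_depth=budget / 100)
few_deep = af_rmse(n_ind=10, mean_depth=budget / 10)
print(many_shallow, few_deep)
```

Under these assumptions, deeper sequencing of fewer individuals mainly reduces read-sampling noise, while most of the error comes from sampling too few individuals from the population, so the many-shallow design should show noticeably lower RMSE. The toy model ignores per-sample library preparation costs, which in practice limit how far a budget can be spread.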
