While genome sequencing and assembly are now routine, we do not have a full, precise picture of polyploid genomes. No existing polyploid phasing method provides accurate and contiguous haplotype predictions. We developed nPhase, a ploidy-agnostic tool that leverages long reads and accurate short reads to solve alignment-based phasing for samples of unspecified ploidy (https://github.com/OmarOakheart/nPhase). nPhase is validated by tests on simulated and real polyploids. On average, nPhase obtains over 95% accuracy and a contiguous 1.25 haplotigs per haplotype, covering more than 90% of each chromosome (heterozygosity rate ≥ 0.5%). nPhase enables population genomics and hybrid studies of polyploids.
The process of domestication has variable consequences on genome evolution, leading to different phenotypic signatures. Access to the complete genome sequences of a large number of individuals makes it possible to explore the different facets of this domestication process. Here, we sought to explore the genome evolution of Kluyveromyces lactis, a yeast species well known for its involvement in dairy processes but also present in natural environments. Using a combination of short- and long-read sequencing strategies, we investigated the genomic variability of 41 K. lactis isolates and found that the overall genetic diversity of this species is very high (θw = 3.3 × 10⁻²) compared to other species such as Saccharomyces cerevisiae (θw = 1.6 × 10⁻²). However, the domesticated dairy population shows a reduced level of diversity (θw = 1 × 10⁻³), probably due to a domestication bottleneck. In addition, this entire population is characterized by the introgression of the LAC4 and LAC12 genes, which are responsible for lactose fermentation and originate from the closely related species Kluyveromyces marxianus, as previously described. Our results also highlighted that the LAC4/LAC12 gene cluster was acquired through multiple, independent introgression events. Finally, we identified several genes that could play a role in adaptation to dairy environments through copy number variation. These genes are involved in sugar consumption, flocculation and drug resistance. Overall, our study illustrates contrasting genomic evolution and sheds new light on how domestication processes shape it.
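The diversity values quoted above are Watterson's θw, which scales the number of segregating sites S by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n−1) over n sampled sequences, and by the number of sites when reported per site. The following is a minimal sketch of that standard estimator on a toy alignment. It is not the authors' pipeline: the function names and the example sequences are hypothetical and serve only to show how the statistic is computed.

```python
from typing import Sequence


def count_segregating_sites(alignment: Sequence[str]) -> int:
    """Count alignment columns that contain more than one distinct base."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) iterates over columns of the alignment
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Per-site Watterson estimator: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


# Toy alignment (hypothetical data): 4 sequences, 8 sites
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
s = count_segregating_sites(toy_alignment)
theta_w = watterson_theta(s, n_sequences=len(toy_alignment), n_sites=len(toy_alignment[0]))
print(f"S = {s}, theta_w per site = {theta_w:.4f}")
```

Because a_n grows with sample size, θw makes diversity comparable between samples of different sizes, which is what allows a dairy subpopulation to be compared with the whole species.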
While genome sequencing and assembly are now routine, we still do not have a full and precise picture of polyploid genomes. Phasing these genomes, i.e. deducing haplotypes from genomic data, remains a challenge. Despite numerous attempts, no existing polyploid phasing method provides accurate and contiguous haplotype predictions. To address this need, we developed nPhase, a ploidy-agnostic pipeline and algorithm that leverage the accuracy of short reads and the length of long reads to solve reference alignment-based phasing for samples of unspecified ploidy (https://github.com/nPhasePipeline/nPhase). nPhase was validated on virtually constructed polyploid genomes of the model species Saccharomyces cerevisiae, generated by combining sequencing data of homozygous isolates. On average, nPhase obtained >95% accuracy and a contiguous 1.25 haplotigs per haplotype, covering >90% of each chromosome (heterozygosity rate ≥0.5%). This new phasing method opens the door to exploring polyploid genomes through applications such as population genomics and hybrid studies.