Animal breed identification has wide and important application prospects in the field of genetic breeding. It not only provides effective genetic information for the selection and breeding of superior animals (Behl et al., 2006), but also provides new methods for the traceability of animal products (Dalvit et al., 2007). Meanwhile, it plays a vital role in biological science research (Yaro et al., 2017), pedigree identification (Dreger et al., 2016) and breed resource conservation (Weigend et al., 2004). The earliest breed identification relied mainly on morphological characteristics (Ceccobelli et al., 2016).
Background: Compared to medium-density single nucleotide polymorphism (SNP) data, high-density SNP data contain abundant genetic variants and provide more information for the genetic evaluation of livestock, but it has been shown that they do not confer any advantage for genomic prediction and heritability estimation. One possible reason is the uneven distribution of the linkage disequilibrium (LD) along the genome, i.e., LD heterogeneity among regions. The aim of this study was to effectively use genome-wide SNP data for genomic prediction and heritability estimation by using models that control LD heterogeneity among regions. Methods: The LD-adjusted kinship (LDAK) and LD-stratified multicomponent (LDS) models were used to control LD heterogeneity among regions and were compared with the classical model that has no such control. Simulated and real traits of 2000 dairy cattle individuals with imputed high-density (770K) SNP data were used. Five types of phenotypes were simulated, which were controlled by very strongly, strongly, moderately, weakly and very weakly tagged causal variants, respectively. The performances of the models with high- and medium-density (50K) panels were compared to verify that the models that controlled LD heterogeneity among regions were more effective with high-density data. Results: Compared to the medium-density panel, the use of the high-density panel did not improve and even decreased prediction accuracies and heritability estimates from the classical model for both simulated and real traits. Compared to the classical model, LDS effectively improved the accuracy of genomic predictions and unbiasedness of heritability estimates, regardless of the genetic architecture of the trait. LDAK applies only to traits that are mainly controlled by weakly tagged causal variants, but is still less effective than LDS for this type of trait.
Compared with the classical model, LDS improved prediction accuracy by about 13% for simulated phenotypes and by 0.3 to ~10.7% for real traits with the high-density panel, and by ~1% for simulated phenotypes and by −0.1 to ~6.9% for real traits with the medium-density panel. Conclusions: Grouping SNPs based on regional LD to construct the LD-stratified multicomponent model can effectively eliminate the adverse effects of LD heterogeneity among regions, and greatly improve the efficiency of high-density SNP data for genomic prediction and heritability estimation.
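The idea of grouping SNPs by regional LD and building one relationship matrix per group can be illustrated with a minimal sketch. The details here are assumptions rather than the authors' procedure: the function names are hypothetical, regional LD is approximated by the mean squared correlation of genotype dosages within a fixed SNP window, strata are quantile bins of that score, and each stratum's matrix uses the common centred cross-product form scaled by the sum of 2p(1 − p).

```python
import numpy as np

def ld_scores(X, window=50):
    """Mean r^2 between each SNP and its neighbours within +/- `window` SNPs.

    X is an (individuals x SNPs) matrix of 0/1/2 dosages; monomorphic SNPs
    are assumed to have been removed beforehand.
    """
    n, m = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = np.empty(m)
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        r2 = (Z[:, lo:hi].T @ Z[:, j] / n) ** 2
        scores[j] = (r2.sum() - 1.0) / (hi - lo - 1)  # drop the SNP's r^2 with itself
    return scores

def stratify_by_ld(scores, n_strata=4):
    """Assign each SNP to a quantile bin of its LD score (labels 0..n_strata-1)."""
    edges = np.quantile(scores, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.digitize(scores, edges)

def grm(X):
    """Genomic relationship matrix from centred dosages."""
    p = X.mean(axis=0) / 2
    W = X - 2 * p
    return W @ W.T / (2 * p * (1 - p)).sum()

def stratified_grms(X, labels):
    """One relationship matrix per LD stratum, for a multicomponent model."""
    return [grm(X[:, labels == k]) for k in np.unique(labels)]
```

The resulting matrices would then enter a mixed model as separate random effects, each with its own variance component; fitting that model is outside the scope of this sketch.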
The Farm animal Genotype-Tissue Expression (FarmGTEx, https://www.farmgtex.org/) project has been established to develop a comprehensive public resource of genetic regulatory variants in domestic animal species, which is essential for linking genetic polymorphisms to variation in phenotypes, helping fundamental biology discovery and exploitation in animal breeding and human biomedicine. Here we present results from the pilot phase of PigGTEx (http://piggtex.farmgtex.org/), where we processed 9,530 RNA-sequencing and 1,602 whole-genome sequencing samples from pigs. We build a pig genotype imputation panel, characterize the transcriptional landscape across over 100 tissues, and associate millions of genetic variants with five types of transcriptomic phenotypes in 34 tissues. We study interactions between genotype and breed/cell type, evaluate tissue specificity of regulatory effects, and elucidate the molecular mechanisms of their action using multi-omics data. Leveraging this resource, we decipher regulatory mechanisms underlying about 80% of the genetic associations for 207 pig complex phenotypes, and demonstrate the similarity of pigs to humans in gene expression and the genetic regulation behind complex phenotypes, corroborating the importance of pigs as a human biomedical model.
The size of the reference population is an important factor affecting genomic prediction. Thus, combining different populations in genomic prediction is an attractive way to improve prediction ability. However, simply combining multiple reference populations does not increase prediction accuracy as much as expected in pigs. This may be due to differences in linkage disequilibrium (LD) patterns between populations. In this study, we used imputed whole-genome sequencing (WGS) data to construct LD-based haplotypes for genomic prediction in a combined population, in order to explore the impact of single-nucleotide polymorphism (SNP) density, variant representation (SNPs or haplotype alleles), and reference population size on prediction accuracy for reproduction traits. Our results showed that genomic best linear unbiased prediction (GBLUP) using the WGS data can improve prediction accuracy in the multi-population setting but not within a population. The prediction accuracy of both the haplotype method using 80 K chip data and GBLUP was higher in the multi-population setting (3.4–5.9%) than within-population (1.2–4.3%). More importantly, we found that the haplotype method based on the WGS data gives better genomic prediction performance in the multi-population setting, and that building haploblocks in this scenario with a low LD threshold (r² = 0.2–0.3) produced an optimal set of variables for reproduction traits in the Yorkshire pig population. Our results suggested that both the haplotype method based on chip data and GBLUP (the individual-SNP method) based on WGS data were beneficial for genomic prediction in the multi-population setting, while combining the haplotype method with WGS data was a better strategy for multi-population genomic evaluation.
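To make the notion of building haploblocks from an LD threshold concrete, here is a minimal sketch under stated assumptions. It is not the study's block-construction method: the function names are hypothetical, LD is approximated by the squared correlation of unphased 0/1/2 dosages between adjacent SNPs, and a new block simply starts wherever that value falls below the threshold.

```python
import numpy as np

def adjacent_r2(X):
    """Squared correlation between each SNP and the next one.

    X is an (individuals x SNPs) dosage matrix with no monomorphic SNPs.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    r = (Z[:, :-1] * Z[:, 1:]).mean(axis=0)
    return r ** 2

def ld_blocks(X, threshold=0.3):
    """Split SNP indices into runs of adjacent SNPs with r^2 >= threshold."""
    r2 = adjacent_r2(X)
    breaks = np.flatnonzero(r2 < threshold) + 1  # a block starts after each weak link
    return np.split(np.arange(X.shape[1]), breaks)

# Toy data: SNPs 0 and 1 are identical, as are SNPs 2 and 3,
# while SNPs 1 and 2 are only weakly correlated.
a = np.array([0, 1, 2, 0, 1, 2, 0, 1], dtype=float)
b = np.array([0, 0, 0, 1, 1, 1, 2, 2], dtype=float)
X_demo = np.column_stack([a, a, b, b])
blocks = ld_blocks(X_demo, threshold=0.3)
```

A lower threshold merges more SNPs into each block, yielding fewer and longer blocks; haplotype alleles within each block would then replace individual SNPs as model covariates, which requires phased data not modelled here.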
Heritability enrichment analysis is an important means of exploring the genetic architecture of complex traits in human genetics. Heritability enrichment is typically defined as the proportion of heritability explained by an SNP subset divided by the proportion of SNPs that the subset contains. It enables better study of what underlies complex traits, such as functional variant/gene subsets, biological networks and metabolic pathways detected by integrating rapidly growing omics data. This would benefit genomic prediction of disease risk in humans and estimation of genetic values for economically important traits in livestock and plant species. However, in livestock, factors affecting the heritability enrichment estimation of complex traits have not been examined. Previous studies on humans reported that the frequencies, effect sizes, and levels of linkage disequilibrium (LD) of underlying causal variants (CVs) would affect the heritability enrichment estimation. Therefore, the distribution of heritability across the genome should be fully considered to obtain an unbiased estimate of heritability enrichment. To explore the performance of different heritability enrichment models in livestock populations, we used the VanRaden, GCTA and α models, assuming different α values, and the LDAK model, which considers LD weights. We simulated three types of phenotypes, with CVs from various minor allele frequency (MAF) ranges: genome-wide (0.005 ≤ MAF ≤ 0.5), common (0.05 ≤ MAF ≤ 0.5), and uncommon (0.01 ≤ MAF < 0.05). The performances of the models with two different subsets (one of which contained known CVs and the other consisting of randomly selected markers) were compared to verify the accuracy of heritability enrichment estimation of functional variant sets. Our results showed that models with known CV subsets provided more robust enrichment estimation.
Models with different α values tended to provide stable and accurate estimates for common and genome-wide CVs (relative deviation 0.5–2.2%), while tending to underestimate the enrichment of uncommon CVs. As the α value increased, enrichment estimates for uncommon CVs ranged from 15.73% above the true value (i.e., 3.00) to 48.93% below it. In addition, long-range LD windows (e.g., 5000 kb) led to large bias in the enrichment estimates for both common and uncommon CVs. Overall, heritability enrichment estimates were sensitive to the α value assumed and to the LD weighting used by the different models. Accuracy would be greatly improved by using a suitable model. This study should be helpful in understanding the genetic architecture of complex traits and provides a reference for genetic analysis in livestock populations.
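The enrichment definition and the relative deviations quoted above reduce to simple arithmetic, sketched below. The function and argument names are hypothetical, and the example inputs are illustrative values chosen to reproduce the true enrichment of 3.00 mentioned in the abstract, not data from the study.

```python
def heritability_enrichment(h2_subset, h2_total, m_subset, m_total):
    """Share of heritability explained by a SNP subset, divided by its share of SNPs."""
    return (h2_subset / h2_total) / (m_subset / m_total)

def relative_deviation(estimate, true_value):
    """Signed deviation of an estimate from the true value, as a fraction."""
    return (estimate - true_value) / true_value

# Illustrative case: 1% of the SNPs explain 3% of the heritability.
enrichment = heritability_enrichment(h2_subset=0.015, h2_total=0.5,
                                     m_subset=100, m_total=10_000)

# An estimate 15.73% above a true enrichment of 3.00 is about 3.47,
# and one 48.93% below it is about 1.53.
upper = 3.00 * (1 + 0.1573)
lower = 3.00 * (1 - 0.4893)
```

An enrichment of 1 means the subset explains exactly as much heritability as its size predicts; values above 1 indicate that the subset is enriched for heritability.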
The domestic pig (Sus scrofa) and subfamilies have experienced long-term and extensive gene flow, especially in Southeast Asia. Yunnan province, a gateway to southern China, has a unique geographical location and a complex climate system, but genomic studies of genetic introgression in Yunnan indigenous pigs remain insufficient. Here, we analyzed the population structure, differentiation, gene flow, adaptive introgression, selection signatures and gene function of Yunnan indigenous pigs, European commercial pigs and other Southeast Asian pigs using a pig genomics reference panel (PGRP v1) from the pig Genotype-Tissue Expression project (PigGTEx). We clarified that the Diannan small-ear pig carries distinctive genetic information across the whole genome, and we provided evidence of introgression events from the Vietnam pig into the Diannan small-ear pig. We also outlined at least two conceptual routes of gene flow to the Diannan small-ear pig in Southeast Asia. Three introgressed loci with similar chromosome positions and strong selection signatures harbored the NAF1, NPY1R and NPY5R genes, which multiple functional annotation databases relate to fat mass, immunity, and litter weight in pigs. In conclusion, these results lay the foundation for understanding introgression from Southeast Asian pigs into Yunnan indigenous pigs and provide new insight into explaining the biological function of genes through multiple databases.