Background: Genotype imputation can help reduce genotyping costs, particularly for the implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher-density panel is computationally challenging. Popular imputation methods are based on hidden Markov models and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes, which may differ in length and frequency depending on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.
Results: The proposed method gave imputation accuracy higher than or similar to that of Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy than the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals.
Conclusions: The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature, and it can therefore easily be used in large data sets where the use of other methods is impractical.
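The window-shrinking search described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes already-phased haplotypes coded 0/1 with -1 for untyped loci, skips the family-imputation step, and fills a locus only when every matching reference haplotype agrees. The function name and the parameters `window_sizes` and `min_observed` are hypothetical.

```python
MISSING = -1  # assumed code for an untyped locus


def impute_by_window_matching(target, references, window_sizes=(8, 4), min_observed=2):
    """Fill untyped loci in `target` from reference haplotypes.

    Windows are swept along the chromosome with 50% overlap, longest
    window size first, so long shared haplotypes (close relatives) are
    used before short ones (more distant relatives).
    """
    imputed = list(target)
    n = len(target)
    for size in window_sizes:
        step = max(1, size // 2)
        starts = list(range(0, max(1, n - size + 1), step))
        if starts[-1] + size < n:          # make sure the chromosome tail is covered
            starts.append(n - size)
        for start in starts:
            end = min(n, start + size)
            # Matching uses only loci that were genotyped in the original target.
            observed = [i for i in range(start, end) if target[i] != MISSING]
            if len(observed) < min_observed:
                continue
            matches = [ref for ref in references
                       if all(ref[i] == target[i] for i in observed)]
            if not matches:
                continue
            for i in range(start, end):
                if imputed[i] == MISSING:
                    alleles = {ref[i] for ref in matches}
                    if len(alleles) == 1:  # fill only when all matches agree
                        imputed[i] = alleles.pop()
    return imputed


ref0 = [0, 1, 1, 0, 1, 0, 0, 1]
ref1 = [1, 0, 0, 1, 0, 1, 1, 0]

# Target shares a full-length haplotype with ref0: resolved by the long window.
full = impute_by_window_matching([0, -1, 1, -1, 1, -1, 0, -1], [ref0, ref1])

# Recombinant target (ref0 on the left, ref1 on the right): no long match,
# so only the shorter windows can resolve it.
recombinant = impute_by_window_matching([0, -1, 1, -1, 0, -1, 1, -1], [ref0, ref1])
```

In the second call the 8-marker window finds no match, and the loci are filled only once the window is shrunk to 4 markers, which mirrors the idea of capturing more distant relationships with shorter shared segments.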
Stature is affected by many polymorphisms of small effect in humans. In contrast, variation in dogs, even within breeds, has been suggested to be largely due to variants in a small number of genes. Here we use data from cattle to compare the genetic architecture of stature with that in humans and dogs. We conducted a meta-analysis for stature using 58,265 cattle from 17 populations with 25.4 million imputed whole-genome sequence variants. Results showed that the genetic architecture of stature in cattle is similar to that in humans, as the lead variants in 163 significantly associated genomic regions (P < 5 × 10⁻⁸) explained at most 13.8% of the phenotypic variance. Most of these variants were noncoding, including variants that were also expression quantitative trait loci (eQTLs) and in ChIP-seq peaks. There was significant overlap in loci for stature with humans and dogs, suggesting that a set of common genes regulates body size in mammals.
Executable versions of QMSim for Windows and Linux are freely available at http://www.aps.uoguelph.ca/~msargol/qmsim/.
The success of fine-scale mapping and genomic selection depends mainly on the strength of linkage disequilibrium (LD) between markers and causal mutations. With Lewontin's measure of LD (known as D'), high levels of LD that extend over several million base pairs have been reported in livestock. However, this measure of LD can be strongly biased upward by small samples and by low allele frequencies. The aim of this study was to characterize the level and extent of LD in Holstein cattle in North America (Canada and the United States for purposes of this study) by using the squared correlation of the alleles at 2 loci (r²). The Affymetrix MegAllele GeneChip Bovine Mapping 10K single nucleotide polymorphism (SNP) array was used to genotype 821 bulls, from which 497 were used in the analysis of the extent of LD. A total of 5,564 SNP were used after filtering out SNP with more than 5% of Mendelian inconsistencies, with more than 20% missing genotypes, or with a minor allele frequency of less than 10%. Analysis of syntenic pairs revealed that useful LD (measured as r² > 0.3) occurred at distances shorter than 100 kb. Linkage disequilibrium decayed very rapidly, within a few hundred kilobase pairs. In addition, no substantial LD between unlinked loci was found. Using a sliding window analysis, we observed an irregular pattern of LD across the genome. These findings suggest that to capture useful LD, which is required for whole-genome fine mapping and genomic selection, a denser SNP map would be needed.
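As a reminder of what r² measures, the following minimal sketch computes the squared correlation of alleles at two loci. It assumes phased haplotypes coded 0/1 (one entry per haplotype), which the abstract does not specify; the function name `r_squared` is illustrative only.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation of alleles at two biallelic loci.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    sampled haplotype (assumed phased).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty allele vectors of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


complete_ld = r_squared([1, 1, 0, 0], [1, 1, 0, 0])   # alleles always co-occur
no_ld = r_squared([1, 1, 0, 0], [1, 0, 1, 0])         # alleles are independent
```

Under the threshold used in the abstract, a marker pair would count as being in useful LD when this value exceeds 0.3.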
Genomic evaluations for 161,341 Holsteins were computed by using 311,725 of 777,962 markers on the Illumina BovineHD Genotyping BeadChip (HD). Initial edits with 1,741 HD genotypes from 5 breeds revealed that 636,967 markers were usable but that half were redundant. Holstein genotypes were from 1,510 animals with HD markers, 82,358 animals with 45,187 (50K) markers, 1,797 animals with 8,031 (8K) markers, 20,177 animals with 6,836 (6K) markers, 52,270 animals with 2,683 (3K) markers, and 3,229 nongenotyped dams (0K) with >90% of haplotypes imputable because they had 4 or more genotyped progeny. The Holstein HD genotypes were from 1,142 US, Canadian, British, and Italian sires, 196 other sires, 138 cows in a US Department of Agriculture research herd (Beltsville, MD), and 34 other females. Percentages of correctly imputed genotypes were tested by applying the programs findhap and FImpute to a simulated chromosome for an earlier population that had only 1,112 animals with HD genotypes and none with 8K genotypes. For each chip, 1% of the genotypes were missing and 0.02% were incorrect initially. After imputation of missing markers with findhap, percentages of genotypes correct were 99.9% from HD, 99.0% from 50K, 94.6% from 6K, 90.5% from 3K, and 93.5% from 0K. With FImpute, 99.96% were correct from HD, 99.3% from 50K, 94.7% from 6K, 91.1% from 3K, and 95.1% from 0K genotypes. Accuracy for the 3K and 6K genotypes further improved by approximately 2 percentage points if imputed first to 50K and then to HD instead of imputing all genotypes directly to HD. Evaluations were tested by using imputed actual genotypes and August 2008 phenotypes to predict deregressed evaluations of US bulls proven after August 2008. For 28 traits tested, the estimated genomic reliability averaged 61.1% when using 311,725 markers vs. 60.7% when using 45,187 markers vs. 29.6% from the traditional parent average. 
Squared correlations with future data were slightly greater for 16 traits and slightly less for 12 with HD than with 50K evaluations. The observed 0.4 percentage point average increase in reliability was less favorable than the 0.9 expected from simulation but was similar to actual gains from other HD studies. The largest HD and 50K marker effects were often located at very similar positions. The single-breed evaluation tested here and previous single-breed or multibreed evaluations have not produced large gains. Increasing the number of HD genotypes used for imputation above 1,074 did not improve the reliability of Holstein genomic evaluations.
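The "percentage of genotypes correct" figures quoted above are concordance rates between true and imputed genotypes. A minimal sketch of such a calculation follows, assuming genotypes coded as allele counts 0/1/2; the function name and the optional `positions` argument (to restrict the comparison to markers that were actually imputed) are assumptions, not part of findhap or FImpute.

```python
def percent_correct(true_genotypes, imputed_genotypes, positions=None):
    """Percentage of genotypes that match between truth and imputation.

    positions: optional iterable of marker indices to compare, for example
    only the markers that were missing before imputation.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if positions is None:
        positions = range(len(true_genotypes))
    positions = list(positions)
    if not positions:
        raise ValueError("no markers to compare")
    n_correct = sum(1 for i in positions
                    if true_genotypes[i] == imputed_genotypes[i])
    return 100.0 * n_correct / len(positions)


truth = [0, 1, 2, 1, 0]
imputed = [0, 1, 2, 2, 0]
overall = percent_correct(truth, imputed)                       # all five markers
masked_only = percent_correct(truth, imputed, positions=[3, 4])  # imputed markers only
```

Whether observed markers are included in the denominator changes the reported percentage, so the choice of `positions` should be stated alongside any accuracy figure.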
Background: While autozygosity as a consequence of selection is well understood, there is limited information on the ability of different methods to measure true inbreeding. In the present study, a gene dropping simulation was performed and inbreeding estimates based on runs of homozygosity (ROH), pedigree, and the genomic relationship matrix were compared to true inbreeding. Inbreeding based on ROH was estimated using SNP1101, PLINK, and BCFtools software with different threshold parameters. The effects of different selection methods on ROH patterns were also compared. Furthermore, inbreeding coefficients were estimated in a sample of genotyped North American Holstein animals born from 1990 to 2016 using 50 k chip data, and ROH patterns were assessed before and after genomic selection.
Results: Using ROH with a minimum window size of 20 to 50 using SNP1101 provided the closest estimates to true inbreeding in the simulation study. Pedigree inbreeding tended to underestimate true inbreeding, and results for genomic inbreeding varied depending on assumptions about base allele frequencies. Using an ROH approach also made it possible to assess the effect of population structure and selection on the distribution of runs of autozygosity across the genome. In the simulation, the longest individual ROH and the largest average length of ROH were observed when selection was based on best linear unbiased prediction (BLUP), whereas genomic selection showed the largest number of small ROH compared to BLUP estimated breeding values (BLUP-EBV). In North American Holsteins, the average number of ROH segments of 1 Mb or more per individual increased from 57 in 1990 to 82 in 2016. The rate of increase in the last 5 years was almost double that of previous 5-year periods. Genomic selection results in less autozygosity per generation, but more per year given the reduced generation interval.
Conclusions: This study shows that existing software based on the measurement of ROH can accurately identify autozygosity across the genome, provided appropriate threshold parameters are used. Our results show how different selection strategies affect the distribution of ROH, and how the distribution of ROH has changed in the North American dairy cattle population over the last 25 years.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-4453-z) contains supplementary material, which is available to authorized users.
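To make the role of the threshold parameters concrete, here is a deliberately simplified ROH scan. It is not how SNP1101, PLINK, or BCFtools detect runs: it assumes genotypes coded 0/1/2 with 1 as heterozygous, allows no heterozygous or missing calls inside a run, and the names `find_roh`, `f_roh`, `min_snps`, and `min_length_bp` are hypothetical.

```python
def find_roh(genotypes, positions, min_snps=20, min_length_bp=1_000_000):
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs
    that has at least `min_snps` markers and spans at least `min_length_bp`."""
    segments = []
    run_start = None
    # A trailing heterozygous sentinel closes a run that reaches the last SNP.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:                      # homozygous: start or extend a run
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:       # heterozygous: close the open run
            n_snps = i - run_start
            length = positions[i - 1] - positions[run_start]
            if n_snps >= min_snps and length >= min_length_bp:
                segments.append((positions[run_start], positions[i - 1]))
            run_start = None
    return segments


def f_roh(segments, genome_length_bp):
    """ROH-based inbreeding: fraction of the covered genome inside ROH."""
    return sum(end - start for start, end in segments) / genome_length_bp


genotypes = [0, 2, 2, 0, 1, 0, 2, 1, 2, 2, 2, 2]
positions = [i * 500_000 for i in range(12)]          # one SNP every 0.5 Mb
segments = find_roh(genotypes, positions, min_snps=3, min_length_bp=1_000_000)
inbreeding = f_roh(segments, genome_length_bp=6_000_000)
```

In this toy example the two-SNP run in the middle is discarded by the `min_snps` threshold, which shows how the choice of thresholds directly changes both the number of segments and the resulting inbreeding estimate.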
Background: The effectiveness of genomic selection and fine mapping is determined by the level of linkage disequilibrium (LD) across the genome. Knowledge of the range of genome-wide LD, defined as a non-random association of alleles at different loci, can provide insight into the optimal density and location of single-nucleotide polymorphisms (SNPs) for genome-wide association studies and can be a keystone for interpretation of results from QTL mapping.
Results: Linkage disequilibrium was measured by |D'| and r² between 38,590 SNPs (spaced across 29 bovine autosomes and the X chromosome) using genotypes of 887 Holstein bulls. The average levels of |D'| and r² for markers 40-60 kb apart were 0.72 and 0.20, respectively, in Holstein cattle. However, a high degree of heterogeneity of LD was observed across the genome. The sample size and minor allele frequency had an effect on |D'| estimates, whereas r² was not noticeably affected by these two factors. Syntenic LD was shown to be useful for verifying the physical location of SNPs. No differences in the extent of LD and decline of LD with distance were found between the intragenic and intergenic regions.
Conclusions: Minimum sample sizes of 444 and 55 animals are required for accurate estimation of LD by |D'| and r², respectively. The use of only maternally inherited haplotypes is recommended for analyses of LD in populations consisting of large paternal half-sib families. Large heterogeneity in the pattern and the extent of LD in Holstein cattle was observed on both autosomes and the X chromosome. The extent of LD was higher on the X chromosome than on the autosomes.
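The sensitivity of |D'| to allele frequency can be seen from its definition. The sketch below uses the standard formula, normalizing D by its maximum attainable value given the allele frequencies, and assumes phased haplotypes coded 0/1; the function name `abs_d_prime` is illustrative and the data are made up for the example.

```python
def abs_d_prime(hap_a, hap_b):
    """Lewontin's |D'| for two biallelic loci from 0/1 haplotype alleles."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty allele vectors of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # Largest |D| possible for these allele frequencies, by sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("|D'| is undefined when a locus is monomorphic")
    return abs(d) / d_max


# Hypothetical data: a rare allele (frequency 0.1) at locus A that is only
# ever seen together with allele 1 at locus B (frequency 0.5).
rare = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
common = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
d_prime = abs_d_prime(rare, common)
```

Here |D'| reaches its maximum because one of the four possible haplotypes is never observed, even though a single copy of the rare allele carries little information. For the same data, D² = 0.0025 and p_A(1 - p_A)p_B(1 - p_B) = 0.0225, so r² is only about 0.11, consistent with the abstract's finding that r² is less affected by minor allele frequency and sample size.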