Background: Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analysis strategy is to filter SNPs to just those with sufficient depth, thereby greatly reducing the number of SNPs available. We investigate methods for estimating relatedness using GBS data, including results of low depth, using theoretical calculation, simulation and application to a real data set. Results: We show that unbiased estimates of relatedness can be obtained by using only those SNPs with genotype calls in both individuals. The expected value of this estimator is independent of the SNP depth in each individual, under a model of genotype calling that includes the special case of the two alleles being read at random. In contrast, the estimator of self-relatedness does depend on the SNP depth, and we provide a modification that gives unbiased estimates of self-relatedness. We refer to these methods of estimation as kinship using GBS with depth adjustment (KGD). The estimators can be calculated using matrix methods, which allow efficient computation. Simulation results were consistent with the methods being unbiased, and suggest that the optimal sequencing depth is around 2–4 for relatedness between individuals and 5–10 for self-relatedness. Application to a real data set revealed that some SNP filtering may still be necessary, for the exclusion of SNPs which did not behave in a Mendelian fashion. A simple graphical method (a ‘fin plot’) is given to illustrate this issue and to guide filtering parameters. Conclusion: We provide a method which gives unbiased estimates of relatedness, based on SNPs assayed by GBS, which accounts for the depth (including zero depth) of the genotype calls. This allows GBS to be applied at read depths which can be chosen to optimise the information obtained. SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications, can be filtered based on a simple graphical method. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2252-3) contains supplementary material, which is available to authorized users.
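The idea of estimating relatedness from only those SNPs called in both individuals, computed with matrix operations, can be illustrated with a minimal sketch. It assumes a standard genomic-relationship form (centred allele counts divided by twice the sum of p(1-p)), restricted pair by pair to shared SNPs; the function name, the use of NaN for zero-depth calls and the supplied allele frequencies are all assumptions made for illustration, and the abstract's depth adjustment for self-relatedness is not reproduced here, so only the off-diagonal entries should be read as relatedness estimates.

```python
import numpy as np

def relatedness_shared_snps(genotypes, allele_freq):
    """Pairwise relatedness using only SNPs called in both individuals.

    genotypes   : (n_individuals, n_snps) array of allele counts 0/1/2,
                  with np.nan where a SNP has zero depth (no call).
    allele_freq : (n_snps,) frequencies of the counted allele (assumed known).
    Returns an (n_individuals, n_individuals) matrix; the diagonal is NOT
    depth-adjusted, so only off-diagonal entries are meaningful here.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    allele_freq = np.asarray(allele_freq, dtype=float)

    called = ~np.isnan(genotypes)
    # Centre called genotypes; uncalled genotypes contribute zero to every sum.
    centred = np.where(called, genotypes - 2.0 * allele_freq, 0.0)
    numerator = centred @ centred.T

    # Entry (i, k) of the product sums p * (1 - p) over SNPs called in both
    # individual i and individual k, so each pair gets its own divisor.
    p_called = called * allele_freq
    q_called = called * (1.0 - allele_freq)
    denominator = 2.0 * (p_called @ q_called.T)

    # Pairs with no shared SNPs have a zero divisor and come out as nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return numerator / denominator
```

Because missing calls are zeroed in both the numerator and the per-pair divisor, no SNP has to be discarded globally just because some individuals have zero depth there.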
This work investigated effects of carrying 0, 1, or 2 copies of the A allele resulting from the g+6723G-A transition in growth differentiation factor gene (GDF8) in New Zealand Texel-cross sheep at different lamb ages and carcass weights. Two Texel-cross sires carrying 1 copy of the A allele were mated to approximately 200 ewes carrying 0, 1, or 2 copies of the A allele. A total of 187 progeny were generated and genotyped to determine whether they were carrying 0, 1, or 2 copies of the A allele. The progeny were assigned to 1 of 4 slaughter groups balanced for the 3 genotypes, sex, and sire. The 4 groups were slaughtered commercially when their average BW (across all progeny in the slaughter group) reached 33, 40, 43, and 48 kg, respectively. Measurements of BW, and carcass dimensions and yield were made on all animals using Viascan (a commercial 2-dimensional imaging system that estimates lean content of the carcass as a percentage of total carcass weight). Additional measurements were made on the fourth slaughter group, which was computed tomography scanned at each slaughter time point to obtain 4 serial measures of lean and fat as estimated from the computed tomography images. The A allele did not have an effect on any BW traits. The A allele was associated with increased muscle and decreased fat across the variety of measures of muscling and fat, explaining between 0.2 and 1.1 of a residual SD unit. Estimates for an additive effect were significant and were positive for muscle and negative for fat traits. No dominance effect estimates (positive or negative) were significant. There was no significant interaction between A allele number and carcass weight or slaughter group for any trait. This is the first systematic study of the effect of the A allele copy number over a range of carcass weights (13 to 20 kg) and ages and results suggest the size of the effect across these endpoints is proportionately the same. 
Testing for the A allele therefore offers breeders the potential to improve rates of genetic gain for lean-meat yield across most production systems.
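The additive and dominance effects mentioned above can be illustrated with a minimal sketch of the usual genotype coding. This is not the study's analysis, which would also need to account for factors such as sex, sire and slaughter group; the function name and the bare three-column design matrix are assumptions made for illustration. With this coding the additive effect is half the difference between the two homozygote means, and the dominance effect is the deviation of the heterozygote mean from the homozygote midpoint.

```python
import numpy as np

def additive_dominance_effects(copies, phenotype):
    """Least-squares additive and dominance effects from allele copy number.

    copies    : sequence of 0, 1 or 2 (copies of the allele per animal);
                all three genotype classes must be present.
    phenotype : sequence of trait values, same length as `copies`.
    Returns (additive, dominance).
    """
    copies = np.asarray(copies, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)

    design = np.column_stack([
        np.ones_like(copies),           # intercept: homozygote midpoint
        copies - 1.0,                   # additive covariate: -1, 0, +1
        (copies == 1).astype(float),    # dominance covariate: 1 for heterozygotes
    ])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    _, additive, dominance = coef
    return float(additive), float(dominance)
```

A dominance estimate close to zero, as reported for these traits, corresponds to heterozygotes lying midway between the two homozygote classes.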
Background: The recent development of next-generation sequencing DNA marker technologies, such as genotyping-by-sequencing (GBS), generates thousands of informative single nucleotide polymorphism markers in almost any species, regardless of genomic resources. This enables poorly resourced or “orphan” crops/species access to high-density, high-throughput marker platforms which have revolutionised population genetics studies and plant breeding. DNA quality underpins success of GBS methods as the DNA must be amenable to restriction enzyme digestion and sequencing. A barrier to implementing GBS technologies is access to inexpensive, high-throughput extraction methods that yield sequencing-quality genomic DNA (gDNA) from plants. Several high-throughput DNA extraction methods are available, but typically provide low yield or poor quality gDNA, or are costly (US$6–$9/sample) for consumables. Results: We modified a non-organic solvent protocol to extract microgram quantities (1–13 μg) of sequencing-quality high molecular weight gDNA inexpensively in 96-well plates from either fresh, freeze-dried or silica gel-dried plant tissue. The protocol was effective for several easy and difficult-to-extract forage, crop, horticultural and common model species including Trifolium, Medicago, Lolium, Secale, Festuca, Malus, Oryza, and Arabidopsis. The extracted DNA was of high molecular weight and digested readily with restriction enzymes. Contrasting with other extraction protocols we assessed, Illumina-based sequencing of GBS libraries developed from this gDNA had very uniform high quality base-calls to the end of sequence reads. Furthermore, DNA extracted using this method has been sequenced successfully with the PacBio long-read platform. The protocol is scalable, readily automated without requirement for fume hoods, requires approximately three hours to process 192 samples (384–576 samples/day), and is inexpensive at US$0.62/sample for consumables. Conclusions: This versatile, scalable and simple protocol yields high molecular weight genomic DNA suitable for restriction enzyme digestion and next-generation sequencing applications including GBS and long-read sequencing platforms such as PacBio. The low cost, high-throughput, and extraction of high quality gDNA from a range of fresh and dried source plant material makes this method suitable for many sequencing and genotyping applications including large-scale sample screening underpinning breeding programmes.
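The throughput and cost figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the stated 384–576 samples/day corresponds to two or three 192-sample runs per day, which the abstract implies but does not state; the function names and the 1,000-sample project size are hypothetical.

```python
def samples_per_day(samples_per_run, runs_per_day):
    """Daily throughput for a given batch size and number of runs (assumed)."""
    return samples_per_run * runs_per_day

def consumables_cost(n_samples, cost_per_sample_usd):
    """Total consumables cost in US dollars for n_samples extractions."""
    return n_samples * cost_per_sample_usd

# Two or three 192-sample runs per day reproduce the quoted daily range.
low_throughput = samples_per_day(192, 2)
high_throughput = samples_per_day(192, 3)

# Hypothetical 1,000-sample project: quoted protocol cost versus the
# US$6-$9/sample range quoted for other high-throughput methods.
protocol_cost = consumables_cost(1000, 0.62)
alternative_cost_range = (consumables_cost(1000, 6.0), consumables_cost(1000, 9.0))
```

On these figures the per-sample consumables cost is roughly ten to fifteen times lower than the US$6–$9 range quoted for the alternatives.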
High-throughput sequencing methods provide a cost-effective approach for genotyping and are commonly used in population genetics studies. A drawback of these methods, however, is that sequencing and genotyping errors can arise...
Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock. These profiles can be difficult and expensive to obtain for thousands of samples (e.g. for accurate association of microbial profiles with traits), therefore the objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, and two bioinformatic pipelines (reference-based and reference-free) were compared to bacterial 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n = 236). DNA was extracted from freeze-dried rumen samples using a phenol-chloroform and bead-beating protocol prior to RE-RRS. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effect of laboratory and analytical method, and the relationship with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach, which accounted for 53.3±5.9% of reads, and had repeatabilities of 0.49±0.07 and 0.50±0.07 for the first two principal components (PC1 and PC2), phenotypic correlations with methane yield of 0.43±0.06 and 0.46±0.06 for PC1 and PC2, and explained 41±8% of the variation in methane yield. These results were significantly better than for bacterial 16S rRNA gene sequencing of the same samples (p<0.05) except for the correlation between PC2 and methane yield. A sensitivity study suggested approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500, meaning the current work using 118 samples/lane and future proposed 384 samples/lane are well within that threshold.
With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.
High-throughput sequencing methods that multiplex a large number of individuals have provided a cost-effective approach for discovering genome-wide genetic variation in large populations. These sequencing methods are increasingly being utilized in population genetic studies across a diverse range of species. One side-effect of these methods, however, is that one or more alleles at a particular locus may not be sequenced, particularly when the sequencing depth is low, resulting in some heterozygous genotypes being called as homozygous. Under-called heterozygous genotypes have a profound effect on the estimation of linkage disequilibrium and, if not taken into account, lead to inaccurate estimates. We developed a new likelihood method, GUS-LD, to estimate pairwise linkage disequilibrium using low coverage sequencing data that accounts for under-called heterozygous genotypes. Our findings show that accurate estimates were obtained using GUS-LD on low coverage sequencing data, whereas underestimation of linkage disequilibrium results if no adjustment is made for under-called heterozygotes.
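To see why low depth leads to under-called heterozygotes, consider a simple model, assumed here for illustration, in which each read at a heterozygous locus samples either allele independently with probability 1/2 and there are no sequencing errors. A heterozygote is then called homozygous whenever all reads come from the same allele, which at depth k happens with probability 2 × (1/2)^k = (1/2)^(k−1). The sketch below is not the GUS-LD likelihood itself, and the function name is hypothetical.

```python
def prob_het_called_homozygous(depth):
    """Probability that a true heterozygote shows only one allele.

    Assumes each read samples either allele independently with probability
    1/2 and that reads contain no sequencing errors (illustrative model).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1 for a genotype to be called")
    # All reads from allele A or all from allele B: 2 * (1/2) ** depth.
    return 0.5 ** (depth - 1)
```

Under this model every heterozygote is miscalled at depth 1, half are miscalled at depth 2, and the rate only falls below 10% from depth 5 onwards.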
Background: Producing animal protein while reducing the animal’s impact on the environment, e.g., through improved feed efficiency and lowered methane emissions, has gained interest in recent years. Genetic selection is one possible path to reduce the environmental impact of livestock production, but these traits are difficult and expensive to measure on many animals. The rumen microbiome may serve as a proxy for these traits due to its role in feed digestion. Restriction enzyme-reduced representation sequencing (RE-RRS) is a high-throughput and cost-effective approach to rumen metagenome profiling, but the systematic (e.g., sequencing) and biological factors influencing the resulting reference-based (RB) and reference-free (RF) profiles need to be explored before widespread industry adoption is possible. Results: Metagenome profiles were generated by RE-RRS of 4,479 rumen samples collected from 1,708 sheep, and assigned to eight groups based on diet, age, time off feed, and country (New Zealand or Australia) at the time of sample collection. Systematic effects were found to have minimal influence on metagenome profiles. Diet was a major driver of differences between samples, followed by time off feed, then age of the sheep. The RF approach resulted in more reads being assigned per sample and afforded greater resolution when distinguishing between groups than the RB approach. Normalizing relative abundances within the sampling Cohort abolished structures related to age, diet, and time off feed, allowing a clear signal based on methane emissions to be elucidated. Genus-level abundances of rumen microbes showed low-to-moderate heritability and repeatability and were consistent between diets. Conclusions: Variation in rumen metagenomic profiles was influenced by diet, age, time off feed and genetics. Not accounting for environmental factors may limit the ability to associate the profile with traits of interest. However, these differences can be accounted for by adjusting for Cohort effects, revealing robust biological signals. The abundances of some genera were consistently heritable and repeatable across different environments, suggesting that metagenomic profiles could be used to predict an individual’s future performance, or performance of its offspring, in a range of environments. These results highlight the potential of using rumen metagenomic profiles for selection purposes in a practical, agricultural setting.