Background
Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analysis strategy is to filter SNPs to just those with sufficient depth, which greatly reduces the number of SNPs available. We investigate methods for estimating relatedness from GBS data, including low-depth data, using theoretical calculation, simulation and application to a real data set.
Results
We show that unbiased estimates of relatedness can be obtained by using only those SNPs with genotype calls in both individuals. The expected value of this estimator is independent of the SNP depth in each individual, under a model of genotype calling that includes the special case of the two alleles being read at random. In contrast, the estimator of self-relatedness does depend on the SNP depth, and we provide a modification that gives unbiased estimates of self-relatedness. We refer to these methods of estimation as kinship using GBS with depth adjustment (KGD). The estimators can be calculated using matrix methods, which allow efficient computation. Simulation results were consistent with the methods being unbiased, and suggest that the optimal sequencing depth is around 2–4 for relatedness between individuals and 5–10 for self-relatedness. Application to a real data set revealed that some SNP filtering may still be necessary to exclude SNPs that did not behave in a Mendelian fashion. A simple graphical method (a ‘fin plot’) is given to illustrate this issue and to guide the choice of filtering parameters.
Conclusion
We provide a method that gives unbiased estimates of relatedness from SNPs assayed by GBS and accounts for the depth (including zero depth) of the genotype calls. This allows GBS to be applied at read depths chosen to optimise the information obtained. SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications, can be filtered using a simple graphical method.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2252-3) contains supplementary material, which is available to authorized users.
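The depth adjustment can be illustrated with a minimal sketch. It assumes the random-reading special case: each read samples one of an individual's two alleles at random, so a heterozygote sequenced at depth k is called homozygous for a given allele with probability K = 0.5^k. It also assumes known allele frequencies p and genotypes coded as reference-allele dosage (0, 1, 2). Under these assumptions the naive call has the same expectation as the true dosage, so cross-products between two individuals need no depth correction, whereas E[(x - 2p)^2] = 2p(1 - p)[1 + 2K + F(1 - 2K)], which can be solved for the inbreeding coefficient F (self-relatedness is 1 + F). The helper names and the ratio-of-sums aggregation over SNPs are illustrative choices, not the published KGD code.

```python
from typing import Optional, Sequence

# Reference-allele dosage (0, 1 or 2), or None when the SNP has zero depth.
Genotype = Optional[int]


def call_genotype(ref_reads: int, alt_reads: int) -> Genotype:
    """Naive call from read counts: reads of both alleles mean heterozygous."""
    if ref_reads + alt_reads == 0:
        return None  # zero depth: treat as missing
    if alt_reads == 0:
        return 2
    if ref_reads == 0:
        return 0
    return 1


def pair_relatedness(x_i: Sequence[Genotype], x_j: Sequence[Genotype],
                     p: Sequence[float]) -> float:
    """Relatedness between two individuals using only SNPs called in both."""
    num = den = 0.0
    for a, b, ps in zip(x_i, x_j, p):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        num += (a - 2 * ps) * (b - 2 * ps)
        den += 2 * ps * (1 - ps)
    if den == 0:
        raise ValueError("no informative SNPs called in both individuals")
    return num / den


def self_relatedness(x: Sequence[Genotype], depth: Sequence[int],
                     p: Sequence[float]) -> float:
    """Depth-adjusted self-relatedness (1 + F) under random allele reading."""
    num = den = 0.0
    for a, k, ps in zip(x, depth, p):
        if a is None or k == 0:
            continue
        big_k = 0.5 ** k  # P(heterozygote read as homozygous for a given allele)
        v = 2 * ps * (1 - ps)
        num += (a - 2 * ps) ** 2 - v * (1 + 2 * big_k)
        # Depth-1 SNPs add zero here (1 - 2K = 0) and zero in expectation to num.
        den += v * (1 - 2 * big_k)
    if den == 0:
        raise ValueError("no informative SNPs (polymorphic, depth >= 2)")
    return 1 + num / den
```

For example, with `p = [0.5] * 3`, calls `[1, 2, 0]` at depths `[4, 1, 3]` give a self-relatedness of 10/13 (about 0.769). Summing numerator and denominator separately, rather than averaging per-SNP ratios, avoids dividing by zero at depth-1 SNPs.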
This work investigated the effects of carrying 0, 1, or 2 copies of the A allele resulting from the g+6723G-A transition in the growth differentiation factor 8 gene (GDF8) in New Zealand Texel-cross sheep at different lamb ages and carcass weights. Two Texel-cross sires carrying 1 copy of the A allele were mated to approximately 200 ewes carrying 0, 1, or 2 copies of the A allele. A total of 187 progeny were generated and genotyped to determine whether they carried 0, 1, or 2 copies of the A allele. The progeny were assigned to 1 of 4 slaughter groups balanced for the 3 genotypes, sex, and sire. The 4 groups were slaughtered commercially when their average BW (across all progeny in the slaughter group) reached 33, 40, 43, and 48 kg, respectively. BW was recorded on all animals, and carcass dimensions and yield were measured using Viascan (a commercial 2-dimensional imaging system that estimates lean content of the carcass as a percentage of total carcass weight). Additional measurements were made on the fourth slaughter group, which was scanned by computed tomography at each slaughter time point to obtain 4 serial measures of lean and fat estimated from the computed tomography images. The A allele had no effect on any BW trait. The A allele was associated with increased muscle and decreased fat across the variety of measures of muscling and fat, with effects of between 0.2 and 1.1 residual SD units. Estimates of the additive effect were significant, positive for muscle traits and negative for fat traits. No dominance effect estimates (positive or negative) were significant. There was no significant interaction between A allele number and carcass weight or slaughter group for any trait. This is the first systematic study of the effect of A allele copy number over a range of carcass weights (13 to 20 kg) and ages, and the results suggest that the size of the effect is proportionately the same across these endpoints. 
Testing for the A allele therefore offers breeders the potential to improve rates of genetic gain for lean-meat yield across most production systems.
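The additive and dominance effects referred to above are conventionally defined from the three genotype means: the additive effect is half the difference between the two homozygotes, and the dominance effect is the heterozygote's deviation from their midpoint. The sketch below computes both from raw means by A-allele copy number and rescales an effect to residual SD units. It is a simplification that ignores the sex, sire and slaughter-group effects the study was balanced for, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def additive_dominance(trait_by_copies: Dict[int, List[float]]) -> Tuple[float, float]:
    """Additive (a) and dominance (d) effects from means by A-allele copy number."""
    m0 = mean(trait_by_copies[0])  # 0 copies of the A allele
    m1 = mean(trait_by_copies[1])  # 1 copy (heterozygotes)
    m2 = mean(trait_by_copies[2])  # 2 copies
    a = (m2 - m0) / 2          # half the difference between the two homozygotes
    d = m1 - (m0 + m2) / 2     # heterozygote deviation from the homozygote midpoint
    return a, d


def in_residual_sd_units(effect: float, residual_sd: float) -> float:
    """Express an effect as a multiple of the residual standard deviation."""
    return effect / residual_sd
```

With means of 51, 54 and 57 for 0, 1 and 2 copies, this returns a = 3 and d = 0, a purely additive pattern; with a residual SD of 2, the additive effect is 1.5 residual SD units.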
Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock. These profiles can be difficult and expensive to obtain for thousands of samples (e.g. for accurate association of microbial profiles with traits); therefore, the objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, with two bioinformatic pipelines (reference-based and reference-free), was compared with bacterial 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n = 236). DNA was extracted from freeze-dried rumen samples using a phenol-chloroform and bead-beating protocol prior to RE-RRS. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effect of laboratory and analytical method, and the relationship with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach, which accounted for 53.3±5.9% of reads and had repeatabilities of 0.49±0.07 and 0.50±0.07 for the first two principal components (PC1 and PC2), phenotypic correlations with methane yield of 0.43±0.06 and 0.46±0.06 for PC1 and PC2, and explained 41±8% of the variation in methane yield. These results were significantly better than those for bacterial 16S rRNA gene sequencing of the same samples (p<0.05), except for the correlation between PC2 and methane yield. A sensitivity study suggested that approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500, so the 118 samples per lane used in the current work and the 384 samples per lane proposed for future work are well within that threshold. 
With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.
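Repeatability here is the share of variance in a profile summary, such as a PC1 or PC2 score, that is attributable to the animal rather than to the sampling occasion. A minimal sketch, assuming exactly two samples per animal and the classical one-way ANOVA estimator (the authors may well have fitted a mixed model instead; `repeatability` is a hypothetical helper), is:

```python
from statistics import mean
from typing import List, Tuple


def repeatability(pairs: List[Tuple[float, float]]) -> float:
    """One-way ANOVA repeatability for two measurements per animal."""
    n, m = len(pairs), 2
    animal_means = [mean(pair) for pair in pairs]
    grand_mean = mean(v for pair in pairs for v in pair)
    # Between-animal and within-animal sums of squares.
    ss_between = m * sum((am - grand_mean) ** 2 for am in animal_means)
    ss_within = sum((v - am) ** 2 for pair, am in zip(pairs, animal_means) for v in pair)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (m - 1))
    var_animal = (ms_between - ms_within) / m  # can be negative in small samples
    return var_animal / (var_animal + ms_within)
```

Each tuple would hold one animal's scores from the two sampling occasions two weeks apart. A phenotypic correlation between PC scores and methane yield could be computed alongside with `statistics.correlation` (Python 3.10 or later).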
Background
The recent development of next-generation sequencing DNA marker technologies, such as genotyping-by-sequencing (GBS), generates thousands of informative single nucleotide polymorphism markers in almost any species, regardless of genomic resources. This gives poorly resourced or “orphan” crops/species access to high-density, high-throughput marker platforms, which have revolutionised population genetics studies and plant breeding. DNA quality underpins the success of GBS methods, as the DNA must be amenable to restriction enzyme digestion and sequencing. A barrier to implementing GBS technologies is access to inexpensive, high-throughput extraction methods that yield sequencing-quality genomic DNA (gDNA) from plants. Several high-throughput DNA extraction methods are available, but they typically provide low yields or poor-quality gDNA, or are costly (US$6–$9/sample) for consumables.
Results
We modified a non-organic solvent protocol to extract microgram quantities (1–13 μg) of sequencing-quality high molecular weight gDNA inexpensively in 96-well plates from fresh, freeze-dried or silica gel-dried plant tissue. The protocol was effective for several easy- and difficult-to-extract forage, crop, horticultural and common model species, including Trifolium, Medicago, Lolium, Secale, Festuca, Malus, Oryza, and Arabidopsis. The extracted DNA was of high molecular weight and digested readily with restriction enzymes. In contrast to other extraction protocols we assessed, Illumina-based sequencing of GBS libraries developed from this gDNA had very uniform, high-quality base calls to the end of sequence reads. Furthermore, DNA extracted using this method has been sequenced successfully on the PacBio long-read platform. The protocol is scalable, readily automated without the need for fume hoods, requires approximately three hours to process 192 samples (384–576 samples/day), and is inexpensive at US$0.62/sample for consumables.
Conclusions
This versatile, scalable and simple protocol yields high molecular weight genomic DNA suitable for restriction enzyme digestion and next-generation sequencing applications, including GBS and long-read sequencing platforms such as PacBio. Its low cost, high throughput, and ability to extract high-quality gDNA from a range of fresh and dried plant material make this method suitable for many sequencing and genotyping applications, including the large-scale sample screening that underpins breeding programmes.
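For planning a screening project, the throughput and cost figures above reduce to simple arithmetic. This sketch assumes that one run of about three hours handles 192 samples as two 96-well plates (so 384 to 576 samples/day corresponds to two or three runs) and uses the quoted consumables cost only; labour and equipment are not included, and `plan_extraction` is a hypothetical helper.

```python
import math
from typing import Dict

COST_PER_SAMPLE_USD = 0.62  # consumables cost quoted for the protocol
SAMPLES_PER_RUN = 192       # assumption: two 96-well plates per run
HOURS_PER_RUN = 3.0         # "approximately three hours" per 192 samples


def plan_extraction(n_samples: int) -> Dict[str, float]:
    """Rough plate, run, time and consumables estimate for n_samples."""
    plates = math.ceil(n_samples / 96)
    runs = math.ceil(n_samples / SAMPLES_PER_RUN)
    return {
        "plates": plates,
        "runs": runs,
        "hours": runs * HOURS_PER_RUN,
        "consumables_usd": round(n_samples * COST_PER_SAMPLE_USD, 2),
    }
```

For example, 1000 samples need 11 plates and 6 runs (about 18 h of processing) and roughly US$620 in consumables, while 576 samples fit in three runs.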
High-throughput sequencing methods provide a cost-effective approach for genotyping and are commonly used in population genetics studies. A drawback of these methods, however, is that sequencing and genotyping errors can arise...
Genotypes are often used to assign parentage in agricultural and ecological settings. Sequencing can be used to obtain genotypes but does not provide unambiguous genotype calls, especially when sequencing depth is low in order to reduce costs. In that case, standard parentage analysis methods no longer apply. A strategy for using low-depth sequencing data for parentage assignment is developed here. It entails the use of relatedness estimates along with a metric termed the excess mismatch rate, which, for parent-offspring pairs or trios, is the difference between the observed mismatch rate and the rate expected under a model of inheritance and allele reads without error. When more than one putative parent has similar statistics, bootstrapping can provide a measure of the similarity of their relatedness estimates. Putative parent-offspring trios can be further checked for consistency by comparing the offspring’s estimated inbreeding to half the parent relatedness. Suitable thresholds are required for each metric. These methods were applied to a deer breeding operation consisting of two herds of different breeds. Relatedness estimates were more in line with expectation when the herds were analyzed separately than when combined, although this did not alter which parents were the best matches with each offspring. Parentage results were largely consistent with those based on a microsatellite parentage panel, with three discordant parent assignments out of 1561. Two models are investigated to allow the parentage metrics to be calculated when alleles are not selected at random. The tools and strategies given here allow parentage to be assigned from low-depth sequencing data.
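The excess mismatch rate can be sketched for a single putative parent-offspring pair. In this example a mismatch is taken to be opposing homozygous calls (0 vs 2), and the expected rate is derived under assumptions stated here only: Hardy-Weinberg equilibrium, no inbreeding, and error-free reads that sample the two alleles at random, so a heterozygote at depth k is called homozygous for a given allele with probability K = 0.5^k. Enumerating the parent-offspring genotype pairs then gives an expected per-SNP probability of opposing homozygous calls of p(1 - p)(K_P + K_O + 2K_P K_O). The paper's own expectation, including its models for non-random allele selection, may differ, and all function names are hypothetical. The last helper expresses the trio check: the offspring's estimated inbreeding should be close to half the relatedness between its two putative parents.

```python
from typing import Optional, Sequence


def expected_mismatch(p: float, depth_parent: int, depth_offspring: int) -> float:
    """P(opposing homozygous calls) for a true parent-offspring pair at one SNP."""
    k_p = 0.5 ** depth_parent      # parent heterozygote read as one given homozygote
    k_o = 0.5 ** depth_offspring   # the same probability for the offspring
    return p * (1 - p) * (k_p + k_o + 2 * k_p * k_o)


def excess_mismatch_rate(x_parent: Sequence[Optional[int]],
                         x_offspring: Sequence[Optional[int]],
                         dep_parent: Sequence[int], dep_offspring: Sequence[int],
                         p: Sequence[float]) -> float:
    """Observed minus expected mismatch rate over SNPs called in both animals."""
    n = observed = 0
    expected = 0.0
    for xp, xo, dp, d_o, ps in zip(x_parent, x_offspring, dep_parent, dep_offspring, p):
        if xp is None or xo is None:
            continue  # only SNPs called in both animals count
        n += 1
        if abs(xp - xo) == 2:  # opposing homozygotes (0 vs 2)
            observed += 1
        expected += expected_mismatch(ps, dp, d_o)
    if n == 0:
        raise ValueError("no SNPs called in both animals")
    return (observed - expected) / n


def trio_inbreeding_gap(f_offspring: float, r_parents: float) -> float:
    """Offspring inbreeding minus half the parents' relatedness (near 0 if consistent)."""
    return f_offspring - r_parents / 2
```

A true parent-offspring pair should give an excess mismatch rate near zero; an unrelated candidate accumulates opposing homozygotes at well-covered SNPs and gives a clearly positive value. Where to draw the line is the kind of threshold the text says must be chosen for each metric.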
Comparative maps between ruminant species and humans are increasingly important tools for the discovery of genes underlying economically important traits. In this article we present a primary linkage map of the deer genome derived from an interspecies hybrid between red deer (Cervus elaphus) and Père David's deer (Elaphurus davidianus). The map is ~2500 cM long and contains >600 markers, including both evolutionarily conserved type I markers and highly polymorphic type II markers (microsatellites). Comparative mapping by annotation and sequence similarity (COMPASS) was demonstrated to be a useful tool for mapping bovine and ovine ESTs in deer. Using marker order as a phylogenetic character and comparative map information from human, mouse, deer, cattle, and sheep, we reconstructed the karyotype of the ancestral Pecoran mammal and identified the chromosome rearrangements that have occurred in the sheep, cattle, and deer lineages. The deer map and interspecies hybrid pedigrees described here are a valuable resource for (1) predicting the location of orthologs to human genes in ruminants, (2) mapping QTL in farmed and wild deer populations, and (3) ruminant phylogenetic studies.