In the present study, we describe the deep sequencing and structural analysis of the genome of a Holstein bull. Our aim was to obtain a high-quality Holstein bull genome reference sequence and to describe the different types of variation in this genome relative to the Hereford breed reference. We generated four mate-paired libraries and one fragment library from 30 μg of genomic DNA. Colour space fasta files were mapped and paired to the reference cow (Bos taurus) genome assembly from Oct. 2011 (Baylor 4.6.1/bosTau7). Initial sequencing yielded 4,864,054,296 50-bp reads. Average mapping efficiency was 71.7%, and altogether 3,494,534,136 reads and 157,928,163,086 bp were successfully mapped, resulting in 60× coverage. This is the highest coverage published for a bovine genome so far. Tertiary analysis found 6,362,988 SNPs in the bull's genome, of which 4,045,889 were heterozygous and 2,317,099 were homozygous variants. Annotation revealed that 4,330,337 of the discovered SNPs were already present in the dbSNP database (build 137); the remaining 2,032,651 SNPs were therefore novel. We found 312,879 large indels, which together accounted for 245,947,845 bp of variation across the genome, and 633,310 small indels, which accounted for 2,542,552 nucleotides of variation. Only 106,768 of the small indels were listed in dbSNP. Finally, we identified 2,758 inversions in the bull's genome, covering 23,099,054 bp in total; the largest inversion was 87,440 bp. In conclusion, high-coverage sequencing revealed several types of novel variants in the bull's genome. Better knowledge of the functions of these variants is needed.
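The mapping and coverage figures in the abstract above can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the study's pipeline: it recomputes the fraction of reads mapped, the mean number of mapped bases per mapped read, and the genome length implied by the reported 60-fold coverage. The function and variable names are hypothetical, and the implied genome length is a back-calculation, not a value reported by the authors.

```python
# Figures quoted in the abstract above (Holstein bull, mapped to bosTau7).
TOTAL_READS = 4_864_054_296
MAPPED_READS = 3_494_534_136
MAPPED_BP = 157_928_163_086
REPORTED_COVERAGE = 60  # fold coverage as stated in the text


def mapping_summary(total_reads, mapped_reads, mapped_bp, coverage):
    """Derive simple ratios from the read counts (illustrative helper)."""
    return {
        "fraction_reads_mapped": mapped_reads / total_reads,
        "mean_mapped_bp_per_read": mapped_bp / mapped_reads,
        # Back-calculated from: coverage = mapped bp / genome length.
        "implied_genome_bp": mapped_bp / coverage,
    }


summary = mapping_summary(TOTAL_READS, MAPPED_READS, MAPPED_BP, REPORTED_COVERAGE)
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```

The pooled fraction of reads mapped comes out close to, but not identical with, the reported 71.7%. One possible explanation, which is an assumption here since the text only says "average", is that the reported figure is a mean over the individual libraries.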
The aim of our study was to create a high-quality Holstein cow genome reference sequence and to describe the different types of variation in this genome relative to the Hereford breed reference. We generated one fragment library and three mate-paired libraries from genomic DNA. Raw files were mapped and paired to the reference cow (Bos taurus) genome assembly bosTau6/UMD_3.1. BioScope (v1.3) software was used for mapping and variant analysis. Initial sequencing yielded 2,842,744,008 50-bp reads. Average mapping efficiency was 78.4%, and altogether 2,168,425,497 reads and 98,022,357,422 bp were successfully mapped, resulting in 36.7× coverage. Tertiary analysis found 5,923,230 SNPs in the bovine genome, of which 3,833,249 were heterozygous and 2,089,981 were homozygous variants. Annotation revealed that 4,241,000 of the discovered SNPs were already present in the dbSNP database; the remaining 1,682,230 SNPs were considered novel. We found 138,504 large indels, which together accounted for 48,537,190 bp of the genome. The largest deletion was 18,594 bp and the largest insertion was 13,498 bp. Another group of variants, small indels (n = 458,061), accounted for 1,839,872 nucleotides of variation in the genome. Only 92,115 small indels were listed in dbSNP, so 365,946 small indels were novel. Finally, we identified 1,876 inversions in the bovine genome. In conclusion, this is another description of the Holstein cow genome and, as in previous studies, we found a large number of novel variants. Better knowledge of these variants could help explain significant phenotypic differences (e.g., health, production, reproduction) between breeds.
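The variant tallies in the abstract above follow from two simple relations: heterozygous plus homozygous calls make up the SNP total, and variants absent from dbSNP are counted as novel. The sketch below reproduces those counts and also derives a mean length per indel, a quantity the abstract does not report. It is a minimal illustration with hypothetical names, not the authors' annotation procedure.

```python
# Variant counts quoted in the abstract above (Holstein cow, bosTau6/UMD_3.1).
SNPS = {"total": 5_923_230, "heterozygous": 3_833_249,
        "homozygous": 2_089_981, "in_dbsnp": 4_241_000}
SMALL_INDELS = {"count": 458_061, "total_bp": 1_839_872, "in_dbsnp": 92_115}
LARGE_INDELS = {"count": 138_504, "total_bp": 48_537_190}


def novel_count(total, annotated):
    """Variants not found in dbSNP are treated as novel, as in the text."""
    return total - annotated


def mean_length(total_bp, count):
    """Mean number of affected bases per variant (derived, not reported)."""
    return total_bp / count


# The two zygosity classes should partition the SNP set.
assert SNPS["heterozygous"] + SNPS["homozygous"] == SNPS["total"]

novel_snps = novel_count(SNPS["total"], SNPS["in_dbsnp"])
novel_small_indels = novel_count(SMALL_INDELS["count"], SMALL_INDELS["in_dbsnp"])
print(novel_snps, novel_small_indels)  # 1682230 365946

small_mean = mean_length(SMALL_INDELS["total_bp"], SMALL_INDELS["count"])
large_mean = mean_length(LARGE_INDELS["total_bp"], LARGE_INDELS["count"])
print(round(small_mean, 2), round(large_mean, 1))  # 4.02 350.4
```

Under these figures, small indels average about 4 bases each and large indels about 350 bp each.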
This paper presents the preliminary results of whole genome resequencing of the Holstein cow using the SOLiD 4 System. The aim of this study was to obtain a high-quality Holstein cow genome reference sequence that could be used as a reference for genomic studies on Estonian Holstein cattle. Furthermore, the new reference sequence would be made available to other research groups. We generated one mate-paired library and one fragment library from 30 μg of genomic DNA. Libraries were sequenced in 4 flow cells. Colour space fasta files (.csfasta) and the corresponding quality files (.qual) were mapped and paired to the reference cow (Bos taurus) genome assembly from Oct. 2007 (Baylor 4.0/bosTau4). Mapping and pairing were performed using the Max Mapper algorithm implemented in the BioScope software (version 1.3). Initial sequencing yielded 2 842 744 008 fifty-base-pair reads. Average mapping efficiency with mismatch penalty –2.00 and clearzone 5 was 73.3%. Altogether 2 065 066 215 reads and 92 778 710 937 bp were successfully mapped, resulting in 35.2× coverage. Pairing indicated that the insert range was 665 to 2195 bp and the mean insert size was 1363 bp. Tertiary analysis found 5 472 870 SNP in the cow genome; 3 517 351 were heterozygous and 1 955 519 were homozygous variants. Also, 3 747 199 were transition SNP and 1 093 307 were transversion SNP, with a transition-transversion ratio of 2.17:1.00. Annotation revealed that only 889 901 of the discovered SNP were annotated in the SNP database dbSNP. This means that 4 582 969 SNP were novel. The number of large indels was 144 035, of which 68 817 were heterozygous and 75 218 were homozygous variants. The longest deletion was 15 089 bp and there were 18 deletions between 10 000 and 20 000 bp. The largest insertions fell in the 1000 to 5000 bp range, which contained 358 insertions. Interestingly, the most numerous deletion size classes were 100 to 200 bp and 200 to 500 bp; together these two classes contained 114 578 deletions. Large indel variations accounted for 48 582 675 bp of the entire genome. Analysis of the small indel polymorphisms identified 452 113 small indels, of which 287 491 were heterozygous and 164 622 were homozygous. Only 1197 small indels were listed in dbSNP. Most of the small indels were single-nucleotide insertions or deletions (261 897). Small indels accounted for 1 722 303 nucleotides of variation in the genome. Finally, we identified 287 inversions (largest 151 000 bp) in the genome of the cow. In conclusion, the genome of the cow contains a large amount of still unknown variation. Better knowledge of these variations could help explain significant phenotypic differences (e.g. reproduction) between breeds. The European Regional Development Fund together with the Archimedes Foundation, target finance grant from the Ministry of Education and Science SF1080045s07, grant from the Estonian University of Life Sciences P8001 and Estonian Science Foundation grant GARFS7479 supported this study.