Application of massively parallel sequencing technology has become one of the most important issues in the life sciences, so it has become crucial to develop bioinformatics tools for processing next-generation sequencing (NGS) data. Currently, two of the most significant tasks are alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, large numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main classes of algorithms, based on suffix tries and on hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and faster than hash-based aligners, but less accurate. In contrast, hash table-based aligners tend to be slower but more sensitive. SNP and genotype callers can likewise be divided into two main approaches: heuristic and probabilistic methods. A variety of software tools have been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
Less than 2% of mammalian genomes code for proteins, but ‘the majority of its bases can be found in primary transcripts’, a phenomenon termed pervasive transcription, which was first reported in 2007. Even though most of these transcripts do not code for proteins, they perform a variety of biological functions, of which regulation of gene expression appears to be the most common. They are divided into two groups based on their length: small non-coding RNAs, which are at most 200 nucleotides long, and long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides. Advances in next-generation sequencing methods have made it possible to investigate the full set of RNA molecules in the cell. In this review, we summarize the current state of knowledge on lncRNAs in three major livestock species, Sus scrofa, Bos taurus and Gallus gallus, based on the literature and the content of biological databases. In the NONCODE database, the largest number of identified lncRNA transcripts is available for pigs, but cattle have the largest number of lncRNA genes. Poultry is represented by fewer than half as many records. Genomic annotation of lncRNAs showed that the majority of them are assigned to intronic (pig, poultry) or intergenic (cattle) regions. Comparison with the well-annotated human and mouse genomes indicates that this pattern results from a lack of adequate lncRNA annotation data. Since lncRNAs play an important role in genomic studies, their characterization in farm animal genomes is critical for bridging the gap between genotype and phenotype.
Background
The number of studies of copy number variation (CNV) in cattle has increased in recent years, prompted by the increased availability of data on polymorphisms and their relationship with phenotypes. In addition, livestock species are good models for some human phenotypes. In the present study, we described the landscape of CNV-driven genetic variation in a large population of 146 individuals representing 13 cattle breeds, using whole-genome DNA sequence data.
Results
Highly significant variation among all individuals and within each breed was observed in the number of duplications (P < 10⁻¹⁵) and in the number of deletions (P < 10⁻¹⁵). We also observed significant differences between breeds in duplication (P = 0.01932) and deletion (P = 0.01006) counts. The same held for CNV length: inter-individual and inter-breed differences were significant for both duplications (P < 10⁻¹⁵) and deletions (P < 10⁻¹⁵). Moreover, breed-specific variants were identified, with the largest proportion of breed-specific duplications (9.57%) found in Fleckvieh and the largest proportion of breed-specific deletions (5.00%) in Brown Swiss. Such breed-specific CNVs were predominantly located in intragenic regions; however, in Simmental, one deletion present in five individuals was found in the coding sequence of a novel gene, ENSBTAG00000000688, on chromosome 18. In Brown Swiss, Norwegian Red and Simmental, breed-specific deletions were located within the KIT and MC1R genes, which are responsible for coat colour. Functional annotation of the coding regions underlying the breed-specific CNVs showed that, in Norwegian Red, Guernsey and Simmental, significantly under- and overrepresented GO terms and KEGG pathways were related to chemical stimulus involved in sensory perception of smell and to olfactory transduction, respectively.
In addition, specifically in the Norwegian Red breed, the dopaminergic synapse KEGG pathway was significantly enriched within deleted parts of the genome.
Conclusions
The CNV landscape in the Bos taurus genome revealed by this study was highly complex, with inter-breed differences but also significant variation within breeds. The former may explain some of the phenotypic differences among the analysed breeds, and the latter contributes to the within-breed variation available for selection.
Electronic supplementary material
The online version of this article (10.1186/s12864-018-4815-6) contains supplementary material, which is available to authorized users.
Mastitis is an inflammatory disease of the mammary gland that has a significant economic impact and is an animal welfare concern. This work examined the association of single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) with the incidence of clinical mastitis (CM). Using information from 16 half-sib pairs of Holstein-Friesian cows (32 animals in total), we searched for genomic regions that differed between a healthy (no incidence of CM) and a mastitis-prone (multiple incidences of CM) half-sib. Three cows with an average sequencing depth of coverage below 10× were excluded, which left 13 half-sib pairs available for comparison. In total, 191 CNV regions were identified that were deleted in a mastitis-prone cow but present in its healthy half-sib, and that overlapped in at least nine half-sib pairs. These regions overlapped with exons of 46 genes, among which APP (BTA1), FOXL2 (BTA1), SSFA2 (BTA2), OTUD3 (BTA2), ADORA2A (BTA17), TXNRD2 (BTA17) and NDUFS6 (BTA20) have been reported to influence CM. Moreover, two duplicated CNV regions present in nine healthy individuals and absent in their mastitis-affected half-sibs overlapped with exons of the cholinergic receptor nicotinic α10 subunit gene on BTA15 and a novel gene (ENSBTAG00000008519) on BTA27. One CNV region deleted in nine mastitis-affected sibs overlapped with two neighbouring long non-coding RNA sequences located on BTA12. SNPs with differing genotypes between healthy and mastitis-affected sibs included 17 polymorphisms with alternate alleles in the affected and healthy sibs of eight half-sib families. Three of these SNPs were located in introns of the genes MET (BTA4), RNF122 (BTA27) and WRN (BTA27). In summary, structural polymorphisms in the form of CNVs putatively play a role in susceptibility to CM. Specifically, sequence deletions have a greater effect on reducing resistance to mastitis than sequence duplications have on increasing it.
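The selection rule described above (deleted in the mastitis-prone cow, not deleted in its healthy half-sib, and recurring in at least nine half-sib pairs) can be sketched in Python 3.9+. The representation and matching criteria are assumptions, not the study's published pipeline: CNV calls are modelled as hypothetical half-open (chromosome, start, end) intervals, "present in the healthy half-sib" is approximated by base-level interval subtraction, and "overlapped in at least nine pairs" by per-base coverage across pairs. All function names are hypothetical.

```python
from collections import defaultdict

# Assumed representation: (chromosome, start, end), 0-based, half-open.
Interval = tuple[str, int, int]


def merge(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping or adjacent intervals on the same chromosome."""
    merged: list[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def subtract(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Return the parts of intervals in a not covered by any interval in b."""
    result: list[Interval] = []
    for chrom, start, end in a:
        pieces = [(start, end)]
        for b_chrom, b_start, b_end in b:
            if b_chrom != chrom:
                continue
            kept = []
            for s, e in pieces:
                if b_end <= s or b_start >= e:  # no overlap: keep the piece whole
                    kept.append((s, e))
                    continue
                if s < b_start:  # left remainder
                    kept.append((s, b_start))
                if b_end < e:  # right remainder
                    kept.append((b_end, e))
            pieces = kept
        result.extend((chrom, s, e) for s, e in pieces)
    return result


def recurrent_regions(per_pair: list[list[Interval]], min_pairs: int) -> list[Interval]:
    """Regions covered by pair-specific deletions in at least min_pairs pairs."""
    deltas: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for regions in per_pair:
        for chrom, start, end in merge(regions):  # count each pair at most once per base
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1
    out: list[Interval] = []
    for chrom in sorted(deltas):
        depth, open_at = 0, None
        for pos in sorted(deltas[chrom]):  # sweep breakpoints left to right
            depth += deltas[chrom][pos]
            if depth >= min_pairs and open_at is None:
                open_at = pos
            elif depth < min_pairs and open_at is not None:
                out.append((chrom, open_at, pos))
                open_at = None
    return out


def affected_specific_deletions(
    pairs: list[tuple[list[Interval], list[Interval]]], min_pairs: int = 9
) -> list[Interval]:
    """pairs: (affected_deletions, healthy_deletions) for each half-sib pair."""
    per_pair = [subtract(affected, healthy) for affected, healthy in pairs]
    return recurrent_regions(per_pair, min_pairs)


# Toy data for three hypothetical pairs; min_pairs lowered to 2 for illustration.
pairs = [
    ([("BTA1", 100, 200)], []),
    ([("BTA1", 150, 300)], [("BTA1", 250, 300)]),
    ([("BTA1", 120, 180)], [("BTA1", 100, 200)]),
]
print(affected_specific_deletions(pairs, min_pairs=2))  # [('BTA1', 150, 200)]
```

In the toy data the third pair contributes nothing because the healthy sib carries the same deletion, so only the stretch shared by the first two pairs passes a threshold of two.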