Background: Mobile genetic elements (MGEs) comprise a major portion of the human genome and are essential for genetic diversity. These elements can also induce mutations in the human genome, and several MGE insertions have been reported to be associated with cancer. We aim to use next-generation genome sequencing data and appropriate bioinformatics tools to accurately identify the insertion sites of MGEs in the human genome.

Results: Herein, we introduce the MeX pipeline for the localization and annotation of MGEs in paired-end sequencing data. It requires three inputs: the reference genome sequence, the MGE sequences and the paired-end sequencing reads. We evaluated MeX on high-depth (>75×) Illumina HiSeq data for sample NA12878, produced at the Broad Institute, against human genome build 38 (chromosomes 1, 2 and 3 only) and Alu elements. We identified 78 reference and 1 non-reference Alu insertions in the NA12878 sample. Upon annotation, the non-reference Alu element was found to lie in the 3' UTR of the RNF2 gene. Of the 78 reference insertions, 42 were in intronic regions, 7 in upstream regions, 5 in downstream regions and 1 in a 3' UTR; the rest were not associated with any gene. MeX showed high performance for the identification and annotation of MGEs in genome samples.

Conclusion: This study showed that MeX is a robust and powerful tool for the identification and annotation of MGE insertions. It may also serve as a valuable tool for studying the phenotypic changes resulting from transpositional events in cancer genomics.
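As a quick check on the annotation breakdown reported in the Results, the category counts can be tallied to see how many reference insertions fall outside any gene. The snippet below is a minimal sketch and is not part of MeX: the names `REPORTED_COUNTS`, `unassigned_count` and `region_shares` are hypothetical, and it assumes the reported categories are mutually exclusive.

```python
# Counts taken from the abstract for the 78 reference Alu insertions.
# Assumption: each insertion is counted in at most one category.
TOTAL_REFERENCE_INSERTIONS = 78
REPORTED_COUNTS = {
    "intronic": 42,
    "upstream": 7,
    "downstream": 5,
    "3' UTR": 1,
}


def unassigned_count(total, counts_by_region):
    """Return how many insertions are not assigned to any listed region."""
    remainder = total - sum(counts_by_region.values())
    if remainder < 0:
        raise ValueError("category counts exceed the stated total")
    return remainder


def region_shares(total, counts_by_region):
    """Return each region's share of the total, including the unassigned rest."""
    shares = {region: n / total for region, n in counts_by_region.items()}
    shares["not gene-associated"] = unassigned_count(total, counts_by_region) / total
    return shares


if __name__ == "__main__":
    rest = unassigned_count(TOTAL_REFERENCE_INSERTIONS, REPORTED_COUNTS)
    print(f"not gene-associated: {rest}")
    for region, share in region_shares(TOTAL_REFERENCE_INSERTIONS, REPORTED_COUNTS).items():
        print(f"{region}: {share:.1%}")
```

Under the stated assumption, the listed categories sum to 55, which leaves 23 of the 78 reference insertions not associated with any gene.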
Growth in computational power and memory capacity has opened a new era for biological data. Many microbial and eukaryotic genomes, including the human genome, are now available, raising the prospect of better viral control. The goals are ambitious: rational drug design based on the structures and functions of target molecules, antimicrobial agents, drugs that are simpler to administer, protein biomarkers for different bacterial infections, and a better understanding of host protein-bacterial protein interactions to avert bacterial disease. Many bioinformatics techniques and cross-referenced databases have made these goals easier to pursue. Current research is divided into (I) genomics: sequencing and gene-related studies to determine genetic function and support genetic engineering; (II) proteomics: classification of protein-associated properties and reconstruction of metabolic and regulatory pathways; and (III) application to the development of drugs and antimicrobial agents. This chapter focuses on genomics and proteomics strategies and their limitations. Bioinformatics studies can be grouped under three main approaches: (1) research based on existing wet-lab data, (2) derivation of new data through mathematical modelling and (3) an integrated method that combines search procedures with mathematical modelling.
The main impacts of bioinformatics research have been automated genome sequencing; automated development of integrated genomics and proteomics databases; computer-assisted genome comparison to identify genome function; automated derivation of metabolic pathways; gene expression analysis to derive regulatory pathways; clustering and data-mining techniques to identify protein-protein and protein-DNA interactions; in silico modelling of three-dimensional protein structure and of docking between proteins and biochemicals for rational drug design; analysis of differences between infectious and non-infectious species to identify candidate genes for drugs and antimicrobial agents; and whole-genome comparison to understand the evolution of microorganisms. Advanced bioinformatics has the potential to help (i) identify the causes of disease, (ii) develop new drugs and (iii) improve cost-effective bioremediation agents. Current research is limited by the lack of gene-function information from wet-lab data, the absence of computer algorithms for exploring large amounts of data of unknown function, and the fact that protein-protein, protein-DNA and protein-RNA interactions are still being discovered.