The massively parallel sequencing technique introduced by NGS technology has resulted in exponential growth of sequencing data, with greatly reduced cost and increased throughput. This explosion of data has introduced new challenges in its storage, integration, processing and analysis. In this paper, we propose a novel distributed model under the Map-Reduce paradigm to address the NGS big data problem. The architecture of the model is a modular, Map-Reduce based approach with three phases that support various analytical pipelines. The first phase generates detailed base-level information for individual genomes by granulating the alignment data. The other two phases independently process this base-level information in parallel. One of them provides an integrated DNA profile of multiple individuals, whereas the other generates contigs with similar features within an individual. Each of these two phases generates a repository of genomic information that facilitates other analytical pipelines. Simulated and real experimental prototypes are provided as results to show the effectiveness of the model and its superiority over a few existing popular models and tools. A detailed description of the scope of applications of this model is also included in this article.
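To make the first phase more concrete, the following is a minimal in-process sketch of how alignment data might be granulated into base-level information with a map step and a reduce step. It is not the authors' implementation: the record layout `(individual, chromosome, start, read_bases)`, the function names, and the per-position summary (depth and base counts) are all assumptions made for illustration, and a real pipeline would run on a distributed Map-Reduce framework and handle full alignment records.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified alignment record:
#   (individual_id, chromosome, start_position, read_bases)
# Real alignment data carries more fields (gaps, qualities, etc.), ignored here.

def map_alignment(record):
    """Map step: granulate one aligned read into per-base key/value pairs."""
    individual, chrom, start, bases = record
    for offset, base in enumerate(bases):
        yield (individual, chrom, start + offset), base

def reduce_bases(key, bases):
    """Reduce step: summarise all bases observed at one genomic position."""
    counts = Counter(bases)
    return key, {"depth": len(bases), "counts": dict(counts)}

def run_phase_one(records):
    """Run map, shuffle (group by key) and reduce in a single process."""
    grouped = defaultdict(list)
    for record in records:
        for key, base in map_alignment(record):
            grouped[key].append(base)
    return dict(reduce_bases(key, bases) for key, bases in grouped.items())

# Toy input: two overlapping reads for one individual, one read for another.
reads = [
    ("ind1", "chr1", 100, "ACGT"),
    ("ind1", "chr1", 102, "GTTA"),
    ("ind2", "chr1", 100, "ACCT"),
]
base_level = run_phase_one(reads)
```

Keying the output by `(individual, chromosome, position)` is one plausible design choice: it would let two later phases consume the same base-level records independently, one grouping by position across individuals and the other scanning positions within one individual.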
The north-eastern region of India encompasses eight states, viz. Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim and Tripura. It is perhaps the richest store of medicinal plants in the world. Medicinal plants are considered the foundation of traditional medicine. Almost 80% of the world's population depends on traditional medicine for primary healthcare, a large portion of which involves the use of extracts of medicinal plants. This traditional practice of medicine plays a significant part in the healthcare of rural people for a wide range of diseases. They practice their own traditional healthcare system, as they have in-depth knowledge and understanding of plants, both conventional and non-conventional, for food and for medicine. This review therefore highlights the medicinal plants found across the north-eastern region and their potential uses for the well-being of mankind.
Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence that is associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides a new dimension for detecting genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs that is based on depth-of-coverage data generated by NGS technology. In this work, we use a novel way to represent the read count data as two-dimensional geometrical points. A key aspect of detecting regions with CNVs is to devise a proper segmentation algorithm that distinguishes genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of the read count data. To our knowledge, most algorithms use a single distribution model for read count data; in our approach, we consider the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs using a multiple-sample analysis approach, resulting in a low false discovery rate with high precision.
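As an illustration of the geometric idea, the sketch below computes a convex hull over read counts treated as two-dimensional points. The abstract does not specify how the points are constructed or how the hull drives segmentation, so the mapping used here (window index as x, read count as y) and the toy data are assumptions, and the hull routine is the standard Andrew monotone chain method rather than the authors' algorithm.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Assumed representation: (window index, read count in that window).
read_counts = [30, 32, 29, 61, 63, 60, 31, 30]
points = [(i, count) for i, count in enumerate(read_counts)]
hull = convex_hull(points)
```

In this toy data the windows with roughly doubled read counts lie on the upper part of the hull, which hints at why a hull might help separate regions with differing coverage; how the hull is actually turned into segment boundaries is left to the full paper.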