Previous research addressed the potential problems that the hard-disk-oriented design of DBMSs causes on flashSSDs. In this paper, we focus on exploiting the potential benefits of flashSSDs. First, we examine the internal parallelism of flashSSDs by benchmarking various flashSSDs. Then, we suggest algorithm-design principles for benefiting from the internal parallelism. We present a new I/O request concept, called psync I/O, that can exploit the internal parallelism of flashSSDs in a single process. Based on these ideas, we introduce B+-tree optimization methods that utilize internal parallelism. By integrating these methods, we present a B+-tree variant, PIO B-tree. We confirmed that each optimization method substantially enhances index performance. PIO B-tree improved the B+-tree's insert performance by a factor of up to 16.3 and its point-search performance by a factor of 1.2. Range search with PIO B-tree was up to 5 times faster than with the B+-tree. Moreover, PIO B-tree outperformed other flash-aware indexes on various synthetic workloads. We also confirmed that PIO B-tree outperforms the B+-tree on index traces collected inside the PostgreSQL DBMS running the TPC-C benchmark.
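The abstract does not say how psync I/O is implemented. As a rough illustration of the general idea only (many I/O requests outstanding at once, issued from a single process, with the caller blocking until all of them finish), the sketch below uses a thread pool and `os.pread`. The function name `batch_read`, the `(offset, length)` request format, and the thread-pool mechanism are assumptions made for this example and are not taken from the paper. `os.pread` is available on Unix-like systems.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def batch_read(fd, requests, max_workers=8):
    """Issue every (offset, length) read together and wait for all of them.

    ASSUMPTION: a thread pool stands in for whatever submission mechanism
    the paper uses; this only illustrates "batch submit, then wait".
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # os.pread(fd, n, offset) reads n bytes at offset without moving
        # the file position, so concurrent calls do not interfere.
        futures = [pool.submit(os.pread, fd, length, offset)
                   for offset, length in requests]
        # Results are returned in request order, after all reads complete.
        return [future.result() for future in futures]
```

A caller would pass an open file descriptor and the page offsets it wants, for example `batch_read(fd, [(0, 4096), (8192, 4096)])`, and receive the page contents in the same order.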
After the genome project in the 1990s, research involving genes has progressed. These studies revealed that genes are a cause of disease and that the relations between genes and diseases are important. For this reason, we proposed a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from an online database. We then extracted genes using text mining. In the next step, we classified the extracted genes into two kinds using title information. Genes located in the title are classified as hub genes. In contrast, genes located in the body are classified as sub genes, which are connected to the hub genes. We iterated this process for each paper to construct a cancer-specific local gene network. In the last step, we constructed a global cancer-specific gene network by integrating all local gene networks, and calculated a score for each gene based on an analysis of the global gene network. We assumed that genes in the title have meaningful relations with cancer, and that other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by our approach with those inferred by other methods. Our approach found more cancer-related genes than comparable methods.
METHODS
We proposed a method to identify cancer genes using title information in literature data. We obtained abstract data from PubMed. After preprocessing the abstract data, we extracted genes from the literature. We then constructed a local cancer gene network from the genes extracted by text mining. If a gene is located in the title, it is used as a hub gene in the local gene network. Otherwise, if the gene is located in the text, it is used as a sub gene linked to the hub genes. If the title of a paper does not contain any genes, the paper is not used.
The relations between hub genes and sub genes are weighted. The weight is calculated based on location and frequency. A local gene network is constructed for each paper. After constructing the local gene networks, we integrated them all into a global cancer gene network.
Scoring Function
We assumed that a gene located in the title has a meaningful relation with cancer. We calculated a weight for each relation between genes using frequency and location information. If the title of a paper contains multiple genes, the relations between these genes have the largest weight. Conversely, relations between sub genes, which appear only in the text, are removed from the local gene network. The weight of a relation between a hub gene and a sub gene is proportional to the frequency of the sub gene, where frequency means the number of times the gene appears in the paper. The weights are calculated as follows:
\[ \mathrm{weight}(h, h) = 1, \qquad \mathrm{weight}(h, s) \propto \mathrm{frequency}(s), \qquad \mathrm{weight}(s, s) = 0 \]
Here, h denotes a hub gene and s denotes a sub gene. N indicates the number of all genes that appear in the text area of each paper. The frequency(s)...
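The construction described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the hub-to-sub weight is taken to be frequency(s) / N, with N read as the total number of sub-gene mentions in the paper; local networks are merged by summing edge weights; and the final gene score is the weighted degree in the global network. The source does not confirm any of these three choices, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def local_network(title_genes, body_genes):
    """Build one paper's network as {(gene_a, gene_b): weight}."""
    hubs = sorted(set(title_genes))
    if not hubs:
        return {}  # papers with no gene in the title are not used
    # Sub genes are body genes that are not already hubs.
    freq = Counter(g for g in body_genes if g not in hubs)
    total = sum(freq.values())  # ASSUMPTION: this plays the role of N
    edges = {}
    for a, b in combinations(hubs, 2):
        edges[(a, b)] = 1.0  # hub-hub relations get the largest weight
    for hub in hubs:
        for sub, count in freq.items():
            # ASSUMPTION: weight(h, s) = frequency(s) / N
            edges[tuple(sorted((hub, sub)))] = count / total
    return edges  # sub-sub relations are never created (weight 0)


def global_network(papers):
    """Merge local networks; ASSUMPTION: weights are summed across papers."""
    merged = defaultdict(float)
    for title_genes, body_genes in papers:
        for edge, weight in local_network(title_genes, body_genes).items():
            merged[edge] += weight
    return dict(merged)


def gene_scores(network):
    """ASSUMPTION: score each gene by its weighted degree."""
    scores = defaultdict(float)
    for (a, b), weight in network.items():
        scores[a] += weight
        scores[b] += weight
    return dict(scores)
```

Each paper is passed as a `(title_genes, body_genes)` pair of gene-symbol lists, and sorting the result of `gene_scores` by value gives a ranked gene list comparable in spirit to the top-20 comparison mentioned in the abstract.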
Cities are vulnerable to a range of disasters that can occur simultaneously due to their complexity. Therefore, an effective disaster response plan is needed to reduce the disaster vulnerability of cities. In particular, evacuation route management is important for reducing the losses from a disaster. Efficient disaster response can be realized by searching for suitable evacuation routes and effective road network management. In this paper, we propose a disaster response framework based on a multilayered road network structure and evacuation routes based on our road network. The suggested road structure consists of three layers for the effective management of the network. An A* algorithm-based search for multiple evacuation routes under different conditions in response to an individual disaster on the configured road map provides a safe route for evacuees. As a result, the damage caused by disasters in urban areas can be ameliorated.
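To make the route-search step concrete, the following is a minimal A* sketch on a toy road graph. It does not model the three-layer network structure, whose details the abstract does not give. The node coordinates, road lengths, the `blocked` parameter (standing in for roads made unusable by a disaster), and all names are assumptions for illustration. The straight-line heuristic is admissible here because every road is at least as long as the straight-line distance between its endpoints.

```python
import heapq
import math


def build_graph(roads):
    """Turn undirected (a, b, length) road segments into an adjacency list."""
    graph = {}
    for a, b, length in roads:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph


def a_star(coords, graph, start, goal, blocked=frozenset()):
    """Return (path, cost) for the shortest route avoiding blocked roads."""
    def heuristic(node):
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x1 - x2, y1 - y2)

    best = {start: 0.0}
    frontier = [(heuristic(start), 0.0, start, [start])]
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph.get(node, []):
            if (node, neighbor) in blocked or (neighbor, node) in blocked:
                continue  # road unusable under this disaster condition
            new_cost = cost + length
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(neighbor),
                                          new_cost, neighbor, path + [neighbor]))
    return None, math.inf


# Hypothetical map: intersections A-D and a shelter S.
COORDS = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1), "S": (2, 1)}
ROADS = [("A", "B", 1.0), ("B", "S", 1.5), ("A", "D", 1.0),
         ("D", "C", 1.0), ("C", "S", 1.0), ("B", "C", 1.0)]
GRAPH = build_graph(ROADS)

normal_route = a_star(COORDS, GRAPH, "A", "S")
flooded_route = a_star(COORDS, GRAPH, "A", "S", blocked={("A", "B")})
```

Running the search once per condition, each with its own `blocked` set, yields one candidate evacuation route per disaster scenario.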
As the size of networks increases, it is becoming important to analyze large-scale network data. Network clustering algorithms are useful for analyzing such data. Conventional network clustering algorithms designed for a single-machine environment, rather than a parallel environment, are being actively researched. However, these algorithms cannot analyze large-scale network data because of memory limitations. As a solution, we propose a network clustering algorithm for large-scale network data analysis using Apache Spark, changing the paradigm of the conventional clustering algorithm to improve its efficiency in the Apache Spark environment. We also apply optimization approaches such as a Bloom filter and shuffle selection to reduce memory usage and execution time. By evaluating our proposed algorithm based on the average normalized cut, we confirmed that the algorithm can analyze diverse large-scale network datasets such as biological, co-authorship, internet topology, and social networks. Experimental results show that the proposed algorithm produces more accurate clusters than comparative algorithms while using less memory. Furthermore, we confirm the effectiveness of the proposed optimization approaches and the scalability of the proposed algorithm. In addition, we validate that clusters found by the proposed algorithm can represent biologically meaningful functions.
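For readers unfamiliar with the evaluation metric, here is a small sketch of an average normalized cut computation. It assumes one common definition: for each cluster, the number of edges leaving the cluster divided by the cluster's volume (the sum of its nodes' degrees), averaged over clusters, with lower values indicating better-separated clusters. The abstract does not state the exact formula the authors use, so both the definition and the function name are assumptions.

```python
def average_normalized_cut(edges, clusters):
    """Average of cut(C) / vol(C) over clusters of an undirected graph.

    edges: iterable of (u, v) pairs; clusters: list of disjoint node sets.
    ASSUMPTION: vol(C) is the sum of degrees of the nodes in C.
    """
    cluster_of = {node: i for i, members in enumerate(clusters)
                  for node in members}
    cut = [0] * len(clusters)
    volume = [0] * len(clusters)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1  # each edge adds one to the degree of each endpoint
        volume[cv] += 1
        if cu != cv:
            cut[cu] += 1  # the edge crosses the boundary of both clusters
            cut[cv] += 1
    ratios = [c / vol for c, vol in zip(cut, volume) if vol > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

For two triangles joined by a single edge and clustered as the two triangles, each cluster has one outgoing edge and volume 7, so the average is 1/7.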