Infrastructure for building parallel database systems for multi-dimensional data

As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. We also present some preliminary performance results comparing the implementation of a remote-sensing image database using the T2 services with a custom-built integrated implementation.
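The basic processing step described in this abstract (transform each input item, map it to the output grid, aggregate everything that lands on the same grid point) can be sketched in a few lines. This is only an illustrative, sequential sketch and not T2's interface: the function `process`, its parameters, the 2-D grid, and the sample readings are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[float, float]  # input coordinates (hypothetical 2-D case)
Cell = Tuple[int, int]       # index of a point on the output grid


def process(
    items: Iterable[Tuple[Point, float]],
    transform: Callable[[float], float],
    to_cell: Callable[[Point], Cell],
    aggregate: Callable[[float, float], float],
) -> Dict[Cell, float]:
    """Transform each input item, map it to an output cell, and aggregate."""
    output: Dict[Cell, float] = {}
    for point, value in items:
        transformed = transform(value)   # step 1: transform the input item
        cell = to_cell(point)            # step 2: map it to the output grid
        if cell in output:               # step 3: aggregate per grid point
            output[cell] = aggregate(output[cell], transformed)
        else:
            output[cell] = transformed
    return output


# Made-up readings: scale each value, bin into 1x1 cells, keep the max per cell.
readings = [((0.2, 0.7), 3.0), ((0.9, 0.1), 5.0), ((1.5, 0.5), 2.0)]
result = process(
    readings,
    transform=lambda v: v * 2.0,
    to_cell=lambda p: (int(p[0]), int(p[1])),
    aggregate=max,
)
print(result)
```

Passing the transform, mapping, and aggregation steps as functions mirrors the abstract's point that the overall pattern is shared across applications while the individual steps are customized.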

Coordinating the use of GPU and CPU for improving performance of compute intensive applications

Teodoro

Sachetto

Sertel

et al. 2009

G-DBSCAN: A GPU Accelerated Algorithm for Density-based Clustering

Andrade

Ramos

Madeira

et al. 2013

Procedia Computer Science

115

Anthill: A Scalable Run-Time Environment for Data Mining Applications

Ferreira

Meira

Guedes

et al.

Querying very large multi-dimensional datasets in ADR

Kurç

Chang

Ferreira

et al. 1999

Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space, and access to data items is described by range queries. The basic processing involves mapping input data items to output data items, and some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on distributed-memory parallel architectures with multiple disks attached to each node. In this paper we address efficient execution of range queries on distributed-memory parallel machines within the ADR framework. We present three potential strategies, and evaluate them under different application scenarios and machine configurations. We present experimental results on the scalability and performance of the strategies on a 128-node IBM SP.
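To make the access pattern concrete, the following is a minimal, single-process sketch of a range query over items in a multi-dimensional attribute space, followed by aggregation of the selected items that project to the same output item. It does not reproduce ADR's API or any of the three parallel strategies the paper evaluates; the names `in_range` and `range_query_sum`, the choice of projection (dropping the last attribute), and the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Item = Tuple[Tuple[float, ...], float]  # (point in attribute space, value)


def in_range(point: Sequence[float], low: Sequence[float], high: Sequence[float]) -> bool:
    """Return True if the point lies inside the axis-aligned box [low, high]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, low, high))


def range_query_sum(
    items: List[Item], low: Sequence[float], high: Sequence[float]
) -> Dict[Tuple[float, ...], float]:
    """Select items inside the query box and sum those sharing an output item."""
    out: Dict[Tuple[float, ...], float] = {}
    for point, value in items:
        if in_range(point, low, high):
            key = point[:-1]  # hypothetical projection: drop the last attribute
            out[key] = out.get(key, 0.0) + value
    return out


# Made-up items with attributes (x, y, t); the query box limits all three.
items = [
    ((1.0, 2.0, 0.0), 4.0),
    ((1.0, 2.0, 1.0), 6.0),
    ((5.0, 5.0, 0.0), 1.0),
    ((3.0, 1.0, 2.0), 2.0),
]
totals = range_query_sum(items, low=(0.0, 0.0, 0.0), high=(4.0, 4.0, 1.0))
print(totals)
```

A linear scan like this is only meant to show the semantics of the query; a system handling very large datasets would need indexing and a way to partition the work across nodes.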

Processing large-scale multi-dimensional data in parallel and distributed environments

et al. 2002

Run-time optimizations for replicated dataflows on heterogeneous environments

Teodoro

Hartley

Çatalyürek

et al. 2010

The Supramap project: linking pathogen genomes with geography to fight emergent infectious diseases

Janies

Treseder

Alexandrov

et al. 2011

Novel pathogens have the potential to become critical issues of national security, public health and economic welfare. As demonstrated by the response to Severe Acute Respiratory Syndrome (SARS) and influenza, genomic sequencing has become an important method for diagnosing agents of infectious disease. Despite the value of genomic sequences in characterizing novel pathogens, raw data on their own do not provide the information needed by public health officials and researchers. One must integrate knowledge of the genomes of pathogens with host biology and geography to understand the etiology of epidemics. To these ends, we have created an application called Supramap (http://supramap.osu.edu) to put information on the spread of pathogens and key mutations across time, space and various hosts into a geographic information system (GIS). To build this application, we created a web service for integrated sequence alignment and phylogenetic analysis as well as methods to describe the tree, mutations, and host shifts in Keyhole Markup Language (KML). We apply the application to 239 sequences of the polymerase basic 2 (PB2) gene of recent isolates of avian influenza (H5N1). We map a mutation, glutamic acid to lysine at position 627 in the PB2 protein (E627K), in H5N1 influenza that allows for increased replication of the virus in mammals. We use a statistical test to support the hypothesis of a correlation of E627K mutations with avian-mammalian host shifts but reject the hypothesis that lineages with E627K are moving westward. Data, instructions for use, and visualizations are included as supplemental materials at: http://supramap.osu.edu/sm/supramap/publications. © The Willi Hennig Society 2010.

We have created a web-based workflow application, Supramap (http://supramap.osu.edu). Using a web browser, a user inputs text files containing sequence and/or phenotypic data, latitude and longitude coordinates, and (optionally) a date of isolation for each strain. Our application then executes a workflow that entails integrated sequence alignment and phylogenetic analysis, computation of character changes (e.g., mutations and host shifts), and geographical projection of the tree on a computing cluster. Once the analyses are complete, the user can download a phylogenetic layer expressed in a KML file and view the file with a Geographic Information System (GIS). The user can use the phylogenetic layer to visualize several aspects of pathogen evolution including: spread of lineages, mutations, shifts among hosts, and phenotypic changes over geography and time. We illustrate the use of the system with a case study on H5N1 and discuss use of visualization in conjunction with statistical validation.

Other tree projection efforts

Supramap is superficially similar to other efforts for projecting phylogenetic trees in GIS, such as

Renato Ferreira

Infrastructure for building parallel database systems for multi-dimensional data
