Jong Youl Choi scite author profile

SUMMARYApplications running on leadership platforms are more and more bottlenecked by storage input/output (I/O). In an effort to combat the increasing disparity between I/O throughput and compute capability, we created Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service oriented architecture, we combined cutting edge research into new I/O techniques with a design effort to create near optimal I/O methods. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission critical applications at various Department of Energy Leadership Computing Facilities. Meanwhile ADIOS is leading the push for next generation techniques including staging and data processing pipelines. In this paper, we describe the startling observations we have made in the last half decade of I/O research and development, and elaborate the lessons we have learned along this journey. We also detail some of the challenges that remain as we look toward the coming Exascale era.

show abstract

ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management

Godoy

et al. 2020

View full text Add to dashboard Cite

Balancing auditability and privacy in vehicular networks

Choi

Jakobsson

Wetzel

2005

View full text Add to dashboard Cite

We investigate how to obtain a balance between privacy and audit requirements in vehicular networks. Challenging the current trend of relying on asymmetric primitives within VANETs, our investigation is a feasibility study of the use of symmetric primitives, resulting in some efficiency improvements of potential value. More specifically, we develop a realistic trust model, and an architecture that supports our solution. In order to ascertain that most users will not find it meaningful to disconnect or disable transponders, we design our solution with several types of user incentives as part of the structure. Examples of resulting features include anonymous toll collection; improved emergency response; and personalized and route-dependent traffic information.

show abstract

Dimension reduction and visualization of large high-dimensional data via interpolation

Bae

Choi

Qiu

et al. 2010

View full text Add to dashboard Cite

The recent explosion of publicly available biology gene sequences and chemical compounds offers an unprecedented opportunity for data mining. To make data analysis feasible for such vast volume and high-dimensional scientific data, we apply high performance dimension reduction algorithms. It facilitates the investigation of unknown structures in a three dimensional visualization. Among the known dimension reduction algorithms, we utilize the multidimensional scaling and generative topographic mapping algorithms to configure the given high-dimensional data into the target dimension. However, both algorithms require large physical memory as well as computational resources. Thus, the authors propose an interpolated approach to utilizing the mapping of only a subset of the given data. This approach effectively reduces computational complexity. With minor trade-off of approximation, interpolation method makes it possible to process millions of data points with modest amounts of computation and memory requirement. Since huge amount of data are dealt, we represent how to parallelize proposed interpolation algorithms, as well. For the evaluation of the interpolated MDS by STRESS criteria, it is necessary to compute symmetric all pairwise computation with only subset of required data per process, so we also propose a simple but efficient parallel mechanism for the symmetric all pairwise computation when only a subset of data is available to each process. Our experimental results illustrate that the quality of interpolated mapping results are comparable to the mapping Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HPDC '10 Chicago, Illinois USA Copyright 2010 ACM X-XXXXX-XX-X/XX/XX ...$10.00. results of original algorithm only. In parallel performance aspect, those interpolation methods are well parallelized with high efficiency. With the proposed interpolation method, we construct a configuration of two-million out-of-sample data into the target dimension, and the number of out-of-sample data can be increased further.

show abstract

Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data

Lü

Liu

et al. 2018

View full text Add to dashboard Cite

Cloud computing paradigms for pleasingly parallel biomedical applications

Gunarathne

Choi

et al. 2011

Concurrency and Computation

View full text Add to dashboard Cite

Cloud computing offers new approaches for scientific computing that leverage the major commercial hardware and software investment in this area. Closely coupled applications are still unclear in clouds as synchronization costs are still higher than on optimized MPI machines. However loosely coupled problems are very important in many fields and can achieve good cloud performance even when pleasingly parallel steps are followed by reduction operations as supported by MapReduce. However we can use clouds in several ways and here we compare four different approaches using two biomedical applications. We look at the cloud infrastructure service based virtual machine utility computing models of Amazon AWS and Microsoft Windows Azure; Map Reduce based computing frameworks Apache Hadoop (deployed on raw hardware as well as on virtual machines) and Micrsoft DryadLINQ. We compare performance showing strong variations in cost between different EC2 machine choices and comparable performance between the utility computing (spawn off a set of jobs) and managed parallelism (MapReduce). The MapReduce approach offered the most user friendly approach.

show abstract

Hybrid cloud and cluster computing paradigms for life science applications

et al. 2010

View full text Add to dashboard Cite

BackgroundClouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister.ResultsComparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.ConclusionsThe hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications.MethodsWe used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

show abstract

Fast and Black-box Exploit Detection and Signature Generation for Commodity Software

Choi

et al. 2008

ACM Trans. Inf. Syst. Secur.

View full text Add to dashboard Cite

In biology, a vaccine is a weakened strain of a virus or bacterium that is intentionally injected into the body for the purpose of stimulating antibody production. Inspired by this idea, we propose a packet vaccine mechanism that randomizes address-like strings in packet payloads to carry out fast exploit detection and signature generation. An exploit with a randomized jump address behaves like a vaccine: it will likely cause an exception in a vulnerable program's process when attempting to hijack the control flow, and thereby expose itself. Taking that exploit as a template, our signature generator creates a set of new vaccines to probe the program in an attempt to uncover the necessary conditions for the exploit to happen. A signature is built upon these conditions to shield the underlying vulnerability from further attacks. In this way, packet vaccine detects exploits and generates signatures in a black-box fashion, that is, not relying on the knowledge of a vulnerable program's source and binary code. Therefore, it even works on the commodity software obfuscated for the purpose of copyright protection. In addition, since our approach avoids the . Authors' addresses: X. Wang, Z. Li, and J. Y. Choi, Indiana University; email: Zhuowei.li@ microsoft,com; J. Xu, Google Inc. and North Carolina State University; M. K. Reiter, University of North Carolina at Chapel Hill; C. Kil, North Carolina State University. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credits is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from the Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) © ACM, 2008. This is the authors' version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available at http://doi.acm.org/10.1145/1455518.1455523. 11: 2· X. Wang et al.expense of tracking the program's execution flow, it performs almost as fast as a normal run of the program and is capable of generating a signature of high quality within seconds or even subseconds. We present the design of the packet vaccine mechanism and an example of its application. We also describe our proof-of-concept implementation and the evaluation of our technique using real exploits.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Jong Youl Choi

Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks

ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management

Balancing auditability and privacy in vehicular networks

Dimension reduction and visualization of large high-dimensional data via interpolation

Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data

Cloud computing paradigms for pleasingly parallel biomedical applications

Hybrid cloud and cluster computing paradigms for life science applications

Fast and Black-box Exploit Detection and Signature Generation for Commodity Software

Contact Info

Product

Resources

About