Parallel Processing of Large-Scale XML-Based Application Documents on Multi-core Architectures with PiXiMaL

Head, Michael R.; Govindaraju, Madhusudhan

doi:10.1109/escience.2008.77

Cited by 6 publications

(9 citation statements)

References 15 publications

“…Recent benchmarking works in [32,33] demonstrate that most existing implementations of WS do not scale well when the size of the SOAP/XML document being processed is increased. The authors in [32,33] argue that most existing software toolkits are typically designed to process small-sized XML datasets, and thus are not suited for large-scale computing applications, e.g., [25,62]. Hence, recent studies have attempted to alleviate the limitations of XML software performance bottlenecks by applying nontraditional parallel processor architectures, e.g., [8,23,30,36,55,78]. On one hand, general-purpose (scalar) processors are characterized by the sequential nature of instruction execution, where instructions are selected based on their sequential memory addresses, conditions being evaluated one at a time.…”

Section: Parallelization and Hardware Approaches (mentioning)

confidence: 99%

“…Handling XML streams entirely in software (for instance, by mapping processing pipeline stages to software threads) prevents the execution speed to be improved beyond a best processing rate of tens of clock cycles per character, and that best case performance can result in rates on the order of hundreds of clock cycles per character for many practical XML applications [78]. As a result, recent studies have addressed these performance bottlenecks by investigating non-traditional processors, namely parallel processing architectures and ‘XML machines’, e.g., [8,23,30].…”

Section: Introduction (mentioning)

confidence: 99%

SOAP Processing Performance and Enhancement

Tekli, Damiani, Chbeir, et al. 2012. IEEE Trans. Serv. Comput.

The Web Services (WS) technology provides a comprehensive solution for representing, discovering and invoking services in a wide variety of environments, including SOA (Service Oriented Architectures) and grid computing systems. At the core of WS technology lie a number of XML-based standards, such as the Simple Object Access Protocol (SOAP), that have successfully ensured WS extensibility, transparency, and interoperability. Nonetheless, there is an increasing demand to enhance WS performance, which is severely impaired by XML's verbosity. SOAP communications produce considerable network traffic, making them unfit for distributed, loosely coupled and heterogeneous computing environments such as the open Internet. Also, they introduce higher latency and processing delays than other technologies, like Java RMI and CORBA. WS research has recently focused on SOAP performance enhancement. Many approaches build on the observation that SOAP message exchange usually involves highly similar messages (those created by the same implementation usually have the same structure, and those sent from a server to multiple clients tend to show similarities in structure and content). Similarity evaluation and differential encoding have thus emerged as SOAP performance enhancement techniques. The main idea is to identify the common parts of SOAP messages, to be processed only once, avoiding a large amount of overhead. Other approaches investigate non-traditional processor architectures, including micro- and macro-level parallel processing solutions, so as to further increase the processing rates of SOAP/XML software toolkits. This survey paper provides a concise, yet comprehensive review of the research efforts aimed at similarity-based SOAP performance enhancement.
A unified view of the SOAP performance enhancement problem is provided, covering almost every phase of SOAP processing, ranging over message parsing, serialization, de-serialization, compression, multicasting, security evaluation, and data/instruction-level processing.
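
The differential-encoding idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the scheme of any particular toolkit covered by the survey: the function names `encode_delta` and `decode_delta` are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity evaluation a real system would use. Parts shared with a previously seen message are referenced by position, and only the differing parts are carried literally.

```python
from difflib import SequenceMatcher


def encode_delta(base, new):
    """Describe `new` relative to `base` (hypothetical helper).

    Shared spans become ("copy", start, end) references into `base`;
    everything else is carried as an ("insert", literal) operation.
    """
    ops = []
    matcher = SequenceMatcher(a=base, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": keep the new literal text
            ops.append(("insert", new[j1:j2]))
        # "delete" contributes nothing to the new message
    return ops


def decode_delta(base, ops):
    """Rebuild the new message from `base` and the delta operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


def literal_size(ops):
    """Number of characters that must be sent literally."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")
```

For two messages that share the same envelope and differ only in a parameter value, `literal_size(encode_delta(base, new))` would be far smaller than `len(new)`, which is the overhead saving the abstract alludes to.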

“…In our previous work in this area, we focused on state-scalability for the parser and the memory requirements for arrays of primitives when multiple threads operate concurrently to read large input files [18]. One related project by Pan et.…”

Section: Related Work In Xml Processing (mentioning)

confidence: 99%

Parallel and distributed approach for processing large-scale XML datasets

Fadika, Head, Govindaraju. 2009. 2009 10th IEEE/ACM International Conference on Grid Computing

Self Cite

An emerging trend is the use of XML as the data format for many distributed scientific applications, with the size of these documents ranging from tens of megabytes to hundreds of megabytes. Our earlier benchmarking results revealed that most of the widely available XML processing toolkits do not scale well for large sized XML data. A significant transformation is necessary in the design of XML processing for scientific applications so that the overall application turn-around time is not negatively affected. We present both a parallel and distributed approach to analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved. We have adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node, for a distributed solution to be effective. We also present an analysis of parallelism using our PIXIMAL toolkit for processing large-scale XML datasets that utilizes the capabilities for parallelism that are available in the emerging multi-core architectures. Multi-core processors are expected to be widely available in research clusters and scientific desktops, and it is critical to harness the opportunities for parallelism in the middleware, instead of passing on the task to application programmers. Our parallelization approach for a multi-core node is to employ a DFA-based parser that recognizes a useful subset of the XML specification, and convert the DFA into an NFA that can be applied to an arbitrary subset of the input. Speculative NFAs are scheduled on available cores in a node to effectively utilize the processing capabilities and achieve overall performance gains. We evaluate the efficacy of this approach in terms of potential speedup that can be achieved for representative XML data sets.
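
The DFA-to-NFA idea described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not PiXiMaL's implementation: the two-state automaton (`TEXT`, `TAG`) is a toy that only tracks whether the scanner is inside `<...>`, and all function names are hypothetical. Because a chunk taken from the middle of the input has an unknown start state, each chunk is run from every possible start state, and the per-chunk results are stitched together once the true state at each boundary is known.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy DFA states (an assumption for illustration): outside or inside a tag.
TEXT, TAG = 0, 1
STATES = (TEXT, TAG)


def step(state, ch):
    """Single DFA transition."""
    if state == TEXT:
        return TAG if ch == "<" else TEXT
    return TEXT if ch == ">" else TAG


def run_chunk(chunk):
    """Run the DFA speculatively from every start state.

    Returns {start_state: (end_state, tags_closed_in_chunk)}.
    """
    table = {}
    for start in STATES:
        state, tags = start, 0
        for ch in chunk:
            nxt = step(state, ch)
            if state == TAG and nxt == TEXT:
                tags += 1  # a '>' just closed a tag
            state = nxt
        table[start] = (state, tags)
    return table


def split(text, n):
    """Cut text into at most n contiguous chunks of near-equal size."""
    size = max(1, -(-len(text) // n))  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]


def parallel_count_tags(text, workers=4):
    """Count tags by scheduling speculative chunk runs on a worker pool."""
    chunks = split(text, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tables = list(pool.map(run_chunk, chunks))
    # Stitch: follow the path that starts in the known initial state.
    state, total = TEXT, 0
    for table in tables:
        state, tags = table[state]
        total += tags
    return state, total


def serial_count_tags(text):
    """Reference result: one pass from the known initial state."""
    return run_chunk(text)[TEXT]
```

The thread pool here only shows how the chunk runs could be scheduled; under CPython's global interpreter lock it should not be expected to produce a speedup for this CPU-bound loop. The cost of speculation is visible in `run_chunk`: every chunk is scanned once per candidate start state, so the approach pays off only when enough cores are available to absorb that extra work.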

“…A thorough description and analysis of the effective memory bandwidth of the PIXIMAL approach is presented in another venue [9]. In this section, we present a summary of research findings on these two topics.…”

Section: Memory Bandwidth and State-scalability (mentioning)

confidence: 99%

“…While this technique works well for serial processing, it is not tailored for processing on multi-core nodes, especially for very large document sizes. In our previous work in this area, we focused just on the memory bandwidth in multi-core architectures when multiple threads operate concurrently to read large input files [9].…”

Section: Related Work (mentioning)

confidence: 99%

Performance enhancement with speculative execution based parallelism for processing large-scale xml-based application data

Head, Govindaraju. 2009. Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing

Self Cite

We present the design and implementation of a toolkit for processing large-scale XML datasets that utilizes the capabilities for parallelism that are available in the emerging multi-core architectures. Multi-core processors are expected to be widely available in research clusters and scientific desktops, and it is critical to harness the opportunities for parallelism in the middleware, instead of passing on the task to application programmers. An emerging trend is the use of XML as the data format for many distributed/grid applications, with the size of these documents ranging from tens of megabytes to hundreds of megabytes. Our earlier benchmarking results revealed that most of the widely available XML processing toolkits do not scale well for large sized XML data. A significant transformation is necessary in the design of XML processing for distributed applications so that the overall application turn-around time is not negatively affected by XML processing. We discuss XML processing using PiXiMaL, a parallel processing library for large-scale XML datasets. The parallelization approach is to build a DFA-based parser that recognizes a useful subset of the XML specification, and convert the DFA into an NFA that can be applied to an arbitrary subset of the input. Speculative NFAs are scheduled on available cores in a node to effectively utilize the processing capabilities and achieve overall performance gains. We evaluate the efficacy of this approach in terms of potential speedup that can be achieved for representative XML datasets. We also evaluate the effect of two different memory allocation libraries to quantify the memory-bottleneck as different cores access shared data structures.
