Parsing is an expensive operation that can degrade XML processing performance. A survey of four representative XML parsing models (DOM, SAX, StAX, and VTD) reveals their suitability for different types of applications. Broadly used in database and networking applications, the Extensible Markup Language is the de facto standard for the interoperable document format. As XML becomes widespread, it is critical for application developers to understand the operational and performance characteristics of XML processing. As Figure 1 shows, XML processing occurs in four stages: parsing, access, modification, and serialization. Although parsing is the most expensive operation [1], there are no detailed studies that compare the processing steps and associated overhead costs of different parsing models, tradeoffs in accessing and modifying parsed data, and XML-based applications' access and modification requirements. Figure 1 also illustrates the three-step parsing process. The first two steps, character conversion and lexical analysis, are usually invariant among different parsing models, while the third step, syntactic analysis, creates data representations based on the parsing model used. To help developers make sensible choices for their target applications, we compared the data representations of four representative parsing models: document object model (DOM; www.w3.org/DOM), simple API for XML (SAX; www.saxproject.org), streaming API for XML (StAX; http://jcp.org/en/jsr/detail?id=173), and virtual token descriptor (VTD; http://vtd-xml.sourceforge.net). These data representations result in different operational and performance characteristics. XML-based database and networking applications have unique requirements with respect to access and modification of parsed data. Database
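The difference in data representation between the parsing models can be made concrete with a small sketch. The Python example below is illustrative only and is not taken from the survey: it extracts the same values with a tree model (`xml.dom.minidom`), a push model (`xml.sax`), and a pull-style loop (`xml.etree.ElementTree.iterparse`, used here as a rough stand-in for StAX, which is a Java API). VTD is omitted because the Python standard library has no equivalent. The sample document and the function names are assumptions made for the example.

```python
import io
import xml.dom.minidom
import xml.sax
from xml.etree.ElementTree import iterparse

# Hypothetical sample document, not taken from the survey.
SAMPLE = (
    b"<orders>"
    b"<order id='1'><item>pen</item></order>"
    b"<order id='2'><item>ink</item></order>"
    b"</orders>"
)


def items_dom(data):
    """Tree model: build the whole document in memory, then navigate it."""
    doc = xml.dom.minidom.parseString(data)
    return [node.firstChild.data for node in doc.getElementsByTagName("item")]


class ItemHandler(xml.sax.ContentHandler):
    """Push model: the parser calls back and the application keeps its own state."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # collects text only while inside <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item" and self._buffer is not None:
            self.items.append("".join(self._buffer))
            self._buffer = None


def items_sax(data):
    handler = ItemHandler()
    xml.sax.parseString(data, handler)
    return handler.items


def items_pull(data):
    """Pull-style loop: the application iterates over events at its own pace."""
    found = []
    for _event, elem in iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            found.append(elem.text)
        elif elem.tag == "order":
            elem.clear()  # drop the finished subtree to bound memory use
    return found
```

All three functions return the same list for the sample input. What differs is what the parser leaves behind: a full in-memory tree, nothing beyond the handler's own state, or only the elements the loop has not yet cleared.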
Stream-based simultaneous XPath processing plays a critical role in service-oriented networking, where processing must scale well with both the number of concurrent input streams and the number of XPath queries. However, the existing literature offers no benchmark or evaluation methodology for stream-based XPath engines that support simultaneous queries. In this paper, we describe a novel benchmarking methodology for evaluating XPath engines that handle simultaneous queries on streaming traffic. With a structured data model, query model, and control model in our benchmark, we conduct well-controlled experiments to assess and isolate various performance factors. We also demonstrate that our structured, quantified approach, with its wide data set coverage, enables accurate performance measurement and easy bottleneck isolation for a real-world XPath engine implementation.
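To illustrate what "simultaneous queries on a stream" means in practice, the following Python sketch evaluates several path queries in a single pass over SAX events. It is a deliberately minimal stand-in and not the engine or benchmark described in the paper: it supports only absolute child-axis paths such as `/a/b/c` (no predicates, wildcards, or descendant axes), and the class and function names are hypothetical.

```python
import xml.sax


class MultiPathCounter(xml.sax.ContentHandler):
    """Counts matches for several absolute child-axis paths in one pass."""

    def __init__(self, queries):
        super().__init__()
        # Pre-split each query once so matching is a single dict lookup per element.
        self._targets = {tuple(q.strip("/").split("/")): q for q in queries}
        self._stack = []  # element names from the root to the current element
        self.counts = {q: 0 for q in queries}

    def startElement(self, name, attrs):
        self._stack.append(name)
        query = self._targets.get(tuple(self._stack))
        if query is not None:
            self.counts[query] += 1

    def endElement(self, name):
        self._stack.pop()


def count_matches(data, queries):
    """Parse `data` (bytes) once and return a match count per query."""
    handler = MultiPathCounter(queries)
    xml.sax.parseString(data, handler)
    return handler.counts
```

Because every query is checked against the same event stream, adding queries does not add parsing passes. That is the kind of scaling behavior, in terms of query count, that a benchmark of such engines would need to measure.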