Validity: Until the end of winter semester 2016/17

Instructions

The SparkSQL framework enables distributed and parallel processing of data in various formats using an SQL-like query language. The main goal of the master's thesis is to use the SparkSQL framework to implement a subset of expressions from the XPath query language, which is used for querying XML data.

1. Get acquainted with the Apache Spark engine, focusing mainly on its SparkSQL framework.
2. Study the works related to mapping XML database technology (XML documents) to relational database technology.
3. Based on this knowledge, design a query engine that can evaluate XPath queries over XML documents.
4. Implement a prototype of the designed solution using the SparkSQL framework.
5. Perform suitable testing of the implemented prototype, focusing primarily on its functional properties.
6. Summarize the performed testing and assess the possibility of deploying the prototype in a highly distributed environment.

References

Will be provided by the supervisor.