This paper presents the first study on scheduling for cooperative data dissemination in a hybrid infrastructure-to-vehicle (I2V) and vehicle-to-vehicle (V2V) communication environment. We formulate the novel problem of cooperative data scheduling (CDS). Each vehicle informs the road-side unit (RSU) the list of its current neighboring vehicles and the identifiers of the retrieved and newly requested data. The RSU then selects sender and receiver vehicles and corresponding data for V2V communication, while it simultaneously broadcasts a data item to vehicles that are instructed to tune into the I2V channel. The goal is to maximize the number of vehicles that retrieve their requested data. We prove that CDS is NP-hard by constructing a polynomial-time reduction from the Maximum Weighted Independent Set (MWIS) problem.
Scheduling decisions are made by transforming CDS to MWIS and using a greedy method to approximately solve MWIS. We build a simulation model based on realistic traffic and communication characteristics and demonstrate the superiority and scalability of the proposed solution. The proposed model and solution, which are based on the centralized scheduler at the RSU, represent the first known vehicular ad hoc network (VANET) implementation of software defined network (SDN) concept.Index Terms-Cooperative data dissemination, scheduling, software defined network, vehicular ad hoc networks.
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only due to the massive data volume but also due to the intrinsic complexity and high dimensions of the geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Recently, many studies were carried out to investigate adopting Hadoop for processing big geospatial data, but how to adjust the computing resources to efficiently handle the dynamic geoprocessing workload was barely explored. To bridge this gap, we propose a novel framework to automatically scale the Hadoop cluster in the cloud environment to allocate the right amount of computing resources based on the dynamic geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce the computing resource utilization (by 80% in our example) while delivering similar performance as a full-powered cluster; and (2) effectively handle the spike processing workload by automatically increasing the computing resources to ensure the processing is finished within an acceptable time. Such an auto-scaling approach provides a valuable reference to optimize the performance of geospatial applications to address data-and computational-intensity challenges in GIScience in a more cost-efficient manner.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.