Abstract-In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. To the best of our knowledge, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present benchmark data that provide insights into the scalability of the system. We examine the impact of different configurations, including parallelism, storage, and networking alternatives, and we discuss several aspects of executing Big Data workloads on a computing system built around the compute-centric paradigm. Further, we derive conclusions aiming to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive applications on large clusters, with an emphasis on parallelism configurations.
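A minimal PySpark sketch of the kind of parallelism and storage knobs the abstract refers to is given below. All values (executor counts, memory sizes, scratch path, application name) are illustrative assumptions, not the settings used on MareNostrum.

```python
# Illustrative PySpark configuration: parallelism and local-storage knobs.
# Every value here is a placeholder, not a recommendation for any cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hpc-spark-benchmark")              # hypothetical job name
    .config("spark.executor.instances", 64)      # executors spread over compute nodes
    .config("spark.executor.cores", 12)          # cores per executor
    .config("spark.executor.memory", "24g")      # memory per executor
    .config("spark.default.parallelism", 768)    # default tasks per wide operation
    .config("spark.local.dir", "/scratch/tmp")   # node-local scratch vs. shared filesystem
    .getOrCreate()
)

# A trivial data-parallel job whose running time is sensitive to the
# partitioning chosen above.
rdd = spark.sparkContext.parallelize(range(10_000_000), numSlices=768)
print(rdd.map(lambda x: x * x).sum())
spark.stop()
```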
Abstract-Spark has become one of the main options for large-scale analytics running on top of shared-nothing clusters. This work takes a deep dive into parallelism configuration and sheds light on the behavior of parallel Spark jobs. It is motivated by the fact that running a Spark application on all the available processors does not necessarily imply lower running time, while it may entail a waste of resources. We first propose analytical models that express the running time as a function of the number of machines employed. We then take a further step and present novel algorithms for configuring dynamic partitioning with a view to minimizing resource consumption without sacrificing running time beyond a user-defined limit. The problem we target is NP-hard. To tackle it, we propose a greedy approach after introducing the notions of dependency graphs and of the benefit of modifying the degree of partitioning at a stage; complementarily, we investigate a randomized approach. Our polynomial solutions judiciously use the resources potentially at the user's disposal and strike interesting trade-offs between running time and resource consumption. Their efficiency is thoroughly investigated through experiments based on real execution data.
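The sketch below illustrates, in the simplest possible terms, the kind of trade-off the abstract describes: a running-time model parameterized by the number of machines, and a rule that picks the fewest machines whose time stays within a user-defined limit of the best achievable. The Amdahl-style functional form and all constants are assumptions for illustration, not the paper's fitted models or algorithms.

```python
# Illustrative stand-in for an analytical running-time model and a
# resource-vs-time trade-off rule. Constants and functional form are assumed.
def running_time(n, serial=30.0, parallel=600.0, overhead_per_machine=0.5):
    """Estimated job time (s) on n machines: a fixed serial part, a
    perfectly parallelizable part, and a per-machine coordination cost."""
    return serial + parallel / n + overhead_per_machine * n

def smallest_n_within_limit(max_n, slowdown_limit=1.10):
    """Return the fewest machines whose time stays within `slowdown_limit`
    of the best time over 1..max_n, trading resources for running time."""
    best = min(running_time(n) for n in range(1, max_n + 1))
    for n in range(1, max_n + 1):
        if running_time(n) <= slowdown_limit * best:
            return n
    return max_n

print(smallest_n_within_limit(64))  # fewest machines within 10% of the optimum
```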
Reliable earthquake detection algorithms are necessary to properly analyze and catalog the continuously growing volume of seismic records. We report the results of applying a deep convolutional neural network, called UPC-UCV (Universitat Politecnica de Catalunya - Universidad Central de Venezuela), to single-station three-channel signal windows for P-wave earthquake detection and source region estimation in north-central Venezuela. The analysis is performed on a new dataset of handpicked P-wave arrivals from local events, named CARABOBO, built and made public for reproducibility and benchmarking purposes. The CARABOBO dataset consists of three-channel continuous data recorded by the broadband stations of the Venezuelan Foundation for Seismological Research in the region of 9.5°–11.5°N and 67.0°–69.0°W from April 2018 to April 2019. During this period, 949 earthquakes were recorded in that area, with magnitudes ranging from Mw 1.1 to 5.2. To estimate the epicentral source region of a detected event, the proposed network uses a geographical partitioning of the CARABOBO dataset into K clusters as a basis. This partitioning is performed automatically by the k-means algorithm, and the optimal K for our dataset has been assessed using the elbow (K=5) and silhouette (K=3) methods. For the target seismicity, the proposed network achieves 95.27% detection accuracy and 93.36% source region estimation accuracy when using K=5 geographic clusters. The location accuracy increases slightly to 95.68% in the case of K=3 geographic partitions. The detection capability of this network has also been tested on the OKLAHOMA dataset, which compiles more than 2000 local earthquakes that occurred in this U.S. state, a totally different geographical region. Without any modification, the proposed network yields excellent detection results when trained and evaluated on that dataset (98.21% accuracy; ConvNetQuake, fine-tuned for this dataset, achieves 97.32%).
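The geographic partitioning step described above (k-means over epicentral coordinates, with the elbow and silhouette criteria guiding the choice of K) can be sketched as follows. The coordinates are random placeholders inside the study region, not the CARABOBO catalog.

```python
# Sketch of k-means geographic partitioning with elbow/silhouette diagnostics.
# Epicenter coordinates are synthetic placeholders, not real catalog data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# (latitude, longitude) pairs inside the study region 9.5-11.5 N, 67-69 W
epicenters = np.column_stack([
    rng.uniform(9.5, 11.5, 500),
    rng.uniform(-69.0, -67.0, 500),
])

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(epicenters)
    print(f"K={k}  inertia={km.inertia_:.1f}  "
          f"silhouette={silhouette_score(epicenters, km.labels_):.3f}")

# The elbow of the inertia curve and the silhouette maximum need not agree
# (K=5 vs. K=3 in the abstract); the resulting cluster labels then serve as
# the source-region classes that the network predicts.
```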
This paper presents work on using deep convolutional neural networks (CNNs) to facilitate the curation of brand-related social media images. The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically annotated with multiple CNNs. Some of the CNNs perform generic object recognition tasks, while others perform what we call visual brand identity recognition. When appropriate, we also apply object detection, usually to discover images containing logos. We report experiments with 5 real brands in which more than 1 million real images were analyzed. To speed up the training of custom CNNs, we applied a transfer learning strategy. We examine the impact of different configurations and derive conclusions aiming to pave the way towards systematic and optimized methodologies for automatic UGC curation.
Abstract-In this paper, we report work on using deep convolutional neural networks (CNNs) for curating and filtering photos posted by social media users (Instagram and Twitter). The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically annotated with multiple CNNs. Some of the CNNs perform generic object recognition tasks, while others perform what we call visual brand identity recognition. We report experiments with 5 real brands in which more than 1 million real images were analyzed. To speed up the training of custom CNNs, we applied a transfer learning strategy.
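The transfer-learning strategy mentioned in the two abstracts above follows a common pattern: reuse an ImageNet-pretrained backbone and train only a small task-specific head. The sketch below shows this generic pattern in Keras; the backbone choice, the number of brand classes, and the omitted data pipeline are placeholders, not the authors' actual training setup.

```python
# Generic transfer-learning sketch: frozen pretrained backbone + new head.
# Backbone, class count, and hyperparameters are illustrative assumptions.
import tensorflow as tf

NUM_BRAND_CLASSES = 5  # hypothetical: one class per monitored brand

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze generic features learned on ImageNet

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_BRAND_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # datasets omitted
```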
As seismic networks continue to spread and monitoring sensors become more efficient, the abundance of data greatly surpasses the processing capabilities of earthquake interpretation analysts. Earthquake catalogs are fundamental for fault system studies, event modeling, seismic hazard assessment, forecasting and, ultimately, mitigating seismic risk. This has fueled research into the automation of interpretation tasks such as event detection, event identification, hypocenter location, and source mechanism analysis. Over the last forty years, traditional algorithms based on quantitative analyses of seismic traces in the time or frequency domain have been developed to assist interpretation. Alternatively, recent advances relate to the application of Artificial Neural Networks (ANNs), a subset of machine learning techniques that is pushing the state of the art forward in many areas. Appropriately trained ANNs can mimic the interpretation abilities of the best human analysts, avoiding the individual weaknesses of most traditional algorithms and requiring modest computational resources at the operational stage. In this paper, we survey the latest ANN applications to the automatic interpretation of seismic data, with a special focus on earthquake detection and the estimation of onset times. For a comparative framework, we give an insight into the labor of human interpreters, who may face uncertainties in the case of small-magnitude earthquakes.