Self-reported tooth loss and cognitive function: Data from the Hispanic established populations for epidemiologic studies of the elderly (Hispanic EPESE)

A large number of embedded devices, massive volumes of data, users, and applications are driving the digital world to move faster than ever. To be competitive in today's digital economy, companies have to process large volumes of dynamically changing data in real time. Many industries, including health care, e-commerce, insurance, and telecommunications, have adopted Big Data analytics to make critical business decisions, with use cases such as DNA sequencing, capturing customer insights, real-time offers, high-frequency trading, and real-time intrusion detection [1]. In addition, the Internet of Things (IoT) is becoming the primary ground for data mining and Big Data analytics [2]. With the rapid growth of IoT and its use cases in domains such as Smart City, Mobile e-Health, and Smart Grid, streaming applications are driving a new wave of data revolutions. In most IoT applications, the resulting analytics feed back into the system to improve it [3]. Compared to the other Big Data domains, there is a low-latency cycle between system


A Survey of Distributed Stream Processing Systems for Smart City Data Analytics

Nasiri, Nasehi, Goudarzi. 2018


A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster

Nasiri, Nasehi, Arman, et al. 2020

Preprint


In the most popular distributed stream processing frameworks (DSPFs), programs are modeled as a directed acyclic graph. This model allows a DSPF to benefit from the parallelism of distributed clusters. However, choosing the proper number of vertices for each operator and finding an appropriate mapping between these vertices and processing resources strongly affect overall throughput and resource utilization, while the simplicity of current DSPF schedulers leads these frameworks to perform poorly on large-scale clusters. In this paper, we present the design and implementation of a heterogeneity-aware scheduling algorithm that finds the proper number of vertices for an application graph and maps them to the most suitable cluster nodes. We scale up the application graph over a given cluster gradually, by increasing the topology input rate and creating new instances of bottlenecked vertices. Our experimental results on Storm Micro-Benchmark show that (1) the prediction model estimates CPU utilization with 92% accuracy; (2) compared to the default scheduler of Storm, our scheduler provides a 7% to 44% throughput enhancement; and (3) the proposed method finds a solution within 4% (worst case) of the optimal scheduler, which obtains the best scheduling scenario using an exhaustive search over the problem design space.


A scheduling algorithm to maximize storm throughput in heterogeneous cluster

et al. 2023


In the most popular distributed stream processing frameworks (DSPFs), programs are modeled as a directed acyclic graph. Using this model, a DSPF can benefit from the parallelism capabilities of distributed clusters. Choosing a reasonable number of vertices for each operator and mapping the vertices to the appropriate processing resources significantly affect overall system performance. Due to the simplicity of current DSPF schedulers, these frameworks perform poorly on large-scale clusters. In this paper, we present a heterogeneity-aware scheduling algorithm that finds the proper number of vertices for an application graph and maps them to the most suitable cluster nodes. We begin with a pre-processing step that allocates the vertices to the given cluster nodes using profiling data. Then, we gradually increase the topology input rate in order to scale up the application graph. Finally, using a CPU utilization model that predicts the CPU workload based on the input rate to vertices and the processing node's CPU characteristics, we identify the bottlenecked vertices and allocate new instances derived from them to the least utilized processing resource. Our experimental results on Storm Micro-Benchmark show that (1) the prediction model estimates CPU utilization with 92% accuracy; (2) compared to the default scheduler of Storm, our scheduler provides a 7% to 44% throughput enhancement; and (3) the proposed method finds a solution within 4% (worst case) of the optimal scheduler, which obtains the best scheduling scenario using an exhaustive search over the problem design space.
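The scale-up loop described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear utilization model, the even split of the input rate across an operator's instances, the assumption that every operator sees the full input rate, and all names (`Node`, `Operator`, `node_load`, `scale_up`, `limit`, `max_instances`) are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # hypothetical unit: CPU work the node can do per second


@dataclass
class Operator:
    name: str
    cost: float  # hypothetical unit: CPU work needed per tuple
    instances: list = field(default_factory=list)  # one node name per instance


def node_load(op, node, rate):
    """Assumed linear model: utilization that `op` adds to `node` at `rate`."""
    share = rate / len(op.instances)  # tuples/s per instance, split evenly
    return op.instances.count(node.name) * share * op.cost / node.capacity


def predict_utilization(ops, nodes, rate):
    """Predicted CPU utilization of every node at the given input rate."""
    return {n.name: sum(node_load(op, n, rate) for op in ops) for n in nodes}


def scale_up(ops, nodes, start_rate, step, limit=0.9, max_instances=8):
    """Raise the input rate step by step; when a node is predicted to exceed
    `limit`, add an instance of its heaviest operator on the least utilized
    other node. Returns the highest feasible rate and its placement."""
    by_name = {n.name: n for n in nodes}
    rate, best_rate, best_placement = start_rate, 0.0, None
    while True:
        util = predict_utilization(ops, nodes, rate)
        hot = max(util, key=util.get)
        if util[hot] <= limit:
            # Feasible at this rate: remember the placement and push harder.
            best_rate = rate
            best_placement = {op.name: list(op.instances) for op in ops}
            rate += step
            continue
        candidates = [op for op in ops
                      if hot in op.instances and len(op.instances) < max_instances]
        others = [name for name in util if name != hot]
        if not candidates or not others:
            return best_rate, best_placement
        # Bottleneck: the operator contributing the most load on the hot node.
        bottleneck = max(candidates, key=lambda op: node_load(op, by_name[hot], rate))
        bottleneck.instances.append(min(others, key=util.get))


# Usage with made-up numbers: a slow and a fast node, two operators.
nodes = [Node("slow", 100.0), Node("fast", 300.0)]
ops = [Operator("parse", 1.0, ["slow"]), Operator("count", 2.0, ["fast"])]
print(scale_up(ops, nodes, start_rate=10.0, step=10.0))
```

The greedy choice here is deliberately simple and can make a poor intermediate placement before recovering. The abstract's comparison against an exhaustive-search scheduler refers to the authors' own algorithm, not to this sketch.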



Saeed Nasehi

Evaluation of distributed stream processing frameworks for IoT applications in Smart Cities
