Serverless Data Analytics with Flint

Kim, Youngbin; Lin, Jimmy

doi:10.1109/cloud.2018.00063

Cited by 53 publications

(26 citation statements)

References 5 publications

“…CFs? time TR-Spark [46] Yes No No n/a Apache Flink [8] Yes No Yes Yes Burscale [7] Yes No Yes Yes Qubole [36] No Yes No No Flint [26] No Yes No No ExCamera [20] No Yes n/a n/a numpywren [38] No Yes No No PyWren [24] No Yes No No Locus (PyWren+Redis) [35] No Yes Yes No Cirrus [25] No Yes Yes No gg [19] No Yes Yes No FEAT [32], MArk [49] Yes Yes n/a n/a SplitServe Yes Yes Yes Yes Table 1. A comparison of SplitServe against the state-of-the-art platforms exploiting VMs and Cloud Functions (CFs).…”

Section: Related Work (mentioning)

confidence: 99%

“…Redis, being an in-memory dictionary, significantly improves on I/O operations compared to disk writes, but is quite expensive as it requires the use of large VMs. Flint [26], another prototype of Spark on AWS Lambda, replaces AWS S3 with SQS [2] for intermediate data I/O using multiple distributed queues, which is a better fit for a high number of small writes. SQS does better in terms of throughput but is costlier and less reliable compared to AWS S3.…”

Section: Related Work (mentioning)

confidence: 99%
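
The statement above describes intermediate data being exchanged through multiple distributed queues. A minimal sketch of that general idea follows, assuming one queue per reduce partition and a hash partitioner. It is not Flint's implementation: in-memory deques stand in for SQS queues, and the function names (partition_for, shuffle_write, shuffle_read) are hypothetical.

```python
from collections import deque
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_write(records, queues):
    """Map side: append each (key, value) record to its partition's queue."""
    for key, value in records:
        queues[partition_for(key, len(queues))].append((key, value))


def shuffle_read(queue):
    """Reduce side: drain one queue and sum the values seen for each key."""
    totals = {}
    while queue:
        key, value = queue.popleft()
        totals[key] = totals.get(key, 0) + value
    return totals


# Assumption: three in-memory deques stand in for three distributed queues.
queues = [deque() for _ in range(3)]
shuffle_write([("a", 1), ("b", 2), ("a", 3), ("c", 4)], queues)

# Every key lands in exactly one queue, so per-queue results can be merged.
merged = {}
for q in queues:
    merged.update(shuffle_read(q))
```

Because all records for a key go to the same queue, each reducer can aggregate its queue independently, which is why many small writes spread across queues suit this pattern.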

“…Also, a user cannot easily control the order in which interacting Lambdas are actually started by the provider. Control could be achieved either by an orchestrator on a VM, which would introduce delays, or by "step functions" that may increase costs [5,17] [26], which have been reported to be slow; or (b) in-memory datastores such as Redis [35], which are relatively expensive. • Steeper cost curve for longer-lasting work: Currently, Lambdas charge a higher price per unit resource as compared to VMs which results in higher costs for long-lasting resource needs, as discussed above (see Figure 1).…”

Section: Background and Motivation (mentioning)

confidence: 99%

SplitServe

Jain, Baarzi, Kesidis, et al. 2020

Proceedings of the 21st International Middleware Conference

“…These solutions can be classified into two types: (I) functions to orchestrate functions; and (II) external client schedulers. In the first category (e.g., [2], [3]), the orchestration is performed inside a serverless function. However, this approach suffers double billing according to the trilemma: The orchestrator function is billed while waiting for the execution of the orchestrated functions to complete (which are also billed).…”

Section: Related Work (mentioning)

confidence: 99%
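
The double-billing effect described in the statement above can be illustrated with simple arithmetic. The sketch below rests on stated assumptions rather than any provider's actual pricing: workers run in parallel, billing is proportional to wall-clock duration, an in-function orchestrator stays alive until the slowest worker finishes, and an external client scheduler incurs no function billing. The function name billed_seconds is hypothetical.

```python
def billed_seconds(worker_durations, orchestrated_in_function):
    """Total billed function-seconds for one fan-out of parallel workers."""
    worker_total = sum(worker_durations)
    if not orchestrated_in_function:
        # External client scheduler: only the workers are billed.
        return worker_total
    # In-function orchestrator: it is billed while it waits, which lasts
    # as long as the slowest worker (0.0 if there are no workers).
    return worker_total + max(worker_durations, default=0.0)


durations = [10.0, 12.0, 8.0]  # illustrative worker run times in seconds
external = billed_seconds(durations, orchestrated_in_function=False)
in_function = billed_seconds(durations, orchestrated_in_function=True)
overhead = in_function - external  # billed time spent only on waiting
```

Under these assumptions the overhead equals the duration of the slowest worker, so it grows with the length of the orchestrated work rather than with the orchestrator's own computation.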

Comparison of FaaS Orchestration Systems

López, Sánchez-Artigas, París, et al. 2018

2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion)

Since the appearance of Amazon Lambda in 2014, all major cloud providers have embraced the "Function as a Service" (FaaS) model, because of its enormous potential for a wide variety of applications. As expected (and also desired), the competition is fierce in the serverless world, and includes aspects such as the run-time support for the orchestration of serverless functions. In this regard, the three major production services are currently Amazon Step Functions (December 2016), Azure Durable Functions (June 2017), and IBM Composer (October 2017), still young and experimental projects with a long way ahead. In this article, we will compare and analyze these three serverless orchestration systems under a common evaluation framework. We will study their architectures, programming and billing models, and their effective support for parallel execution, among others. Through a series of experiments, we will also evaluate the run-time overhead of the different infrastructures for different types of workflows.

“…In serverless computing, also referred to as Functions-as-a-Service (FaaS), application developers provide an event-driven function to cloud providers, and the cloud provider is responsible for seamlessly scaling function invocations to meet demands as event triggers occur. Serverless is powerful and expressive, with applications designed for video processing [29,41], HPC and scientific computing [36,51,89,93], machine learning [35,39,50], data analytics [44,55], chatbots [103], backends [31,67], IoT [69,102], and even general applications [40,92]. Indeed, a recent study of a production serverless offering indicates applications range from single functions to hundreds of functions in size, with function execution times ranging from less than a second to the order of minutes [88].…”

Section: Introduction (mentioning)

confidence: 99%

Sequoia

Tariq, Pahl, Nimmagadda, et al. 2020

Proceedings of the 11th ACM Symposium on Cloud Computing

Serverless computing is a rapidly growing paradigm that easily harnesses the power of the cloud. With serverless computing, developers simply provide an event-driven function to cloud providers, and the provider seamlessly scales function invocations to meet demands as event-triggers occur. As current and future serverless offerings support a wide variety of serverless applications, effective techniques to manage serverless workloads become an important issue. This work examines current management and scheduling practices in cloud providers, uncovering many issues including inflated application run times, function drops, inefficient allocations, and other undocumented and unexpected behavior. To fix these issues, a new quality-of-service function scheduling and allocation framework, called Sequoia, is designed. Sequoia allows developers or administrators to easily define how serverless functions and applications should be deployed, capped, prioritized, or altered based on easily configured, flexible policies. Results with controlled and realistic workloads show Sequoia seamlessly adapts to policies, eliminates mid-chain drops, reduces queuing times by up to 6.4×, enforces tight chain-level fairness, and improves run-time performance up to 25×.
