Fabio Checconi scite author profile

QFQ: Efficient Packet Scheduling With Tight Guarantees

Packet scheduling, together with classification, is one of the most expensive processing steps in systems providing tight bandwidth and delay guarantees at high packet rates. Schedulers with near-optimal service guarantees and $O(1)$ time complexity have been proposed in the past, using techniques such as timestamp rounding and flow grouping to keep their execution time small. However, even the two best proposals in this family have a per-packet cost component that is linear either in the number of groups or in the length of the packet being transmitted. Furthermore, no studies are available on the actual execution time of these algorithms. In this paper we make two contributions. First, we present Quick Fair Queueing (QFQ), a new $O(1)$ scheduler that provides near-optimal guarantees and is the first to achieve that goal with a truly constant cost, also with respect to the number of groups and the packet length. The QFQ algorithm has no loops and uses very simple instructions and data structures that contribute to its speed of operation. Second, we have developed production-quality implementations of QFQ and of its closest competitors, which we use to present a detailed comparative performance analysis of the various algorithms. Experiments show that QFQ fulfills our expectations, outperforming the other algorithms in the same class. In absolute terms, even on a low-end workstation, QFQ takes about 110 ns for an enqueue()/dequeue() pair (only twice the time of DRR, but with much better service guarantees).
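The abstract credits QFQ's speed to timestamp rounding, flow grouping, and loop-free code over simple data structures. The sketch below shows two primitives in that style: rounding a virtual timestamp down to a group's power-of-two slot size, and selecting a group from a bitmap with a lowest-set-bit operation (on a machine word this is a single find-first-set instruction). It is an illustration under stated assumptions, not QFQ itself: the function names are hypothetical, the flow-to-group mapping (smallest i with 2^i ≥ ⌈L_max / w⌉) is assumed, and the full algorithm keeps more state (additional group sets and per-group slot structures) that is not shown here.

```python
# Simplified sketch of two O(1) primitives used by grouped fair-queueing
# schedulers: timestamp rounding and bitmap-based group selection.
# Hypothetical names; this is NOT the complete QFQ algorithm.


def group_index(max_pkt_len: int, weight: int) -> int:
    """Assumed mapping: smallest i such that 2**i >= ceil(max_pkt_len / weight)."""
    ratio = -(-max_pkt_len // weight)  # integer ceiling division
    return (ratio - 1).bit_length()


def round_down(timestamp: int, group: int) -> int:
    """Round a virtual timestamp down to a multiple of the slot size 2**group."""
    return timestamp & ~((1 << group) - 1)


def lowest_set_bit(bitmap: int) -> int:
    """Index of the least significant set bit, or -1 if the bitmap is empty."""
    return (bitmap & -bitmap).bit_length() - 1


class GroupBitmap:
    """Tracks which groups are backlogged using only bit operations (no loops)."""

    def __init__(self) -> None:
        self.bits = 0

    def mark(self, group: int) -> None:
        self.bits |= 1 << group

    def unmark(self, group: int) -> None:
        self.bits &= ~(1 << group)

    def first(self) -> int:
        """Lowest-indexed backlogged group, or -1 if none."""
        return lowest_set_bit(self.bits)
```

For example, with groups 2 and 5 marked, `first()` returns 2, and after `unmark(2)` it returns 5. None of these operations iterates over groups or over packet bytes, which is the property the abstract emphasizes.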

Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics

Buono

Petrini

Checconi

et al. 2016

Sparse matrix-vector multiplication (SpMV) is a fundamental kernel, used by a large class of numerical algorithms. Emerging big-data and machine learning applications are propelling a renewed interest in SpMV algorithms that can tackle massive amounts of unstructured data, rapidly approaching the terabyte range, with predictable, high performance. In this paper we describe a new methodology to design SpMV algorithms for shared memory multiprocessors (SMPs) that organizes the original SpMV algorithm into two distinct phases. In the first phase we build a scaled matrix that is reduced in the second phase, providing numerous opportunities to exploit memory locality. Using this methodology, we have designed two algorithms. Our experiments on irregular big-data matrices (an order of magnitude larger than the current state of the art) show quasi-optimal scaling on a large-scale POWER8 SMP system, with an average speedup of 3.8× over an equally optimized version of the CSR algorithm. In terms of absolute performance, with our implementation the POWER8 SMP system is comparable to a 256-node cluster. In terms of size, it can process matrices with up to 68 billion edges, an order of magnitude larger than state-of-the-art clusters.

CCS Concepts: • Computing methodologies → Linear algebra algorithms; Shared memory algorithms; Vector / streaming algorithms; • Mathematics of computing → Graph algorithms; • Theory of computation → Graph algorithms analysis; Data structures design and analysis.

Note: Fabrizio Petrini has since changed his affiliation. His current contact is fabrizio.petrini@intel.com.
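The two-phase organization can be illustrated on a tiny matrix. The sketch below compares a conventional CSR kernel with a sequential scale-then-reduce version over a triplet (row, column, value) layout. The triplet layout, the column-sorted order of the nonzeros, and the function names are assumptions for illustration; the abstract does not specify the paper's data layouts or how the two phases are parallelized.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """Baseline: y = A @ x with A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_two_phase(n_rows: int, rows: List[int], cols: List[int],
                   vals: List[float], x: List[float]) -> List[float]:
    """Illustrative scale-then-reduce SpMV over triplets (assumed layout)."""
    # Phase 1 (scale): multiply every nonzero a_ij by x_j, producing a
    # "scaled matrix" with the same sparsity pattern as A.
    scaled = [v * x[c] for c, v in zip(cols, vals)]
    # Phase 2 (reduce): sum the scaled entries of each row into y.
    y = [0.0] * n_rows
    for r, s in zip(rows, scaled):
        y[r] += s
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]],  x = [1, 2, 3]  ->  y = [7, 6]
x = [1.0, 2.0, 3.0]
y_csr = spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], x)
# Same nonzeros, listed in column order for the two-phase version.
y_two = spmv_two_phase(2, [0, 1, 0], [0, 1, 2], [1.0, 3.0, 2.0], x)
```

Both versions compute the same result. Listing the nonzeros by column lets phase 1 read `x` sequentially, and phase 2 no longer touches `x` at all; any further locality benefits depend on a parallel partitioning that this sketch does not attempt to reproduce.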

Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems

Chakaravarthy

Checconi

Petrini

et al. 2014

High Throughput Disk Scheduling with Fair Bandwidth Distribution

Valente

Checconi

2010

IEEE Trans. Comput.

Mainstream applications, such as file copy/transfer, Web, DBMS, or video streaming, typically issue synchronous disk requests. As shown in this paper, this fact may cause work-conserving schedulers to fail both to enforce guarantees and to provide a high disk throughput. A high throughput can, however, be recovered by just idling the disk for a short time interval after the completion of each request. By contrast, guarantees may still be violated by existing timestamp-based schedulers, because of the rules they use to tag requests. Budget Fair Queueing (BFQ), the new disk scheduler presented in this paper, is an example of how disk idling, combined with proper back-shifting of request timestamps, may allow a timestamp-based disk scheduler to preserve both guarantees and a high throughput. Under BFQ each application is always guaranteed, over any time interval and independently of whether it issues synchronous requests, a bounded lag with respect to its reserved fraction of the total number of bytes transferred by the disk device. We show the single-disk performance of our implementation of BFQ in the Linux kernel through experiments with real and emulated mainstream applications.

Index Terms: scheduling, secondary storage, quality of service.
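Two ingredients named in the abstract, timestamp-based tagging of per-application budgets and a short idle wait after a synchronous request completes, can be sketched as follows. The tagging rule is a simplified WF2Q-style rule chosen as an assumption: the start tag is the maximum of the current virtual time and the queue's previous finish tag, the finish tag adds budget / weight, and the next queue served is the eligible one with the smallest finish tag. BFQ's actual back-shifting rule, budget sizes, and idle-window length are not given in the abstract, so `idle_window` and every name below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppQueue:
    """Per-application queue with virtual start/finish tags (hypothetical)."""
    name: str
    weight: float
    start: float = 0.0
    finish: float = 0.0


def tag_on_activation(q: AppQueue, budget_bytes: int, virtual_time: float) -> None:
    """Assign tags when a queue receives a new budget (assumed WF2Q-style rule)."""
    q.start = max(virtual_time, q.finish)
    q.finish = q.start + budget_bytes / q.weight


def pick_next(queues: List[AppQueue], virtual_time: float) -> Optional[AppQueue]:
    """Among eligible queues (start <= virtual time), pick the smallest finish tag."""
    eligible = [q for q in queues if q.start <= virtual_time]
    return min(eligible, key=lambda q: q.finish) if eligible else None


def should_idle(is_sync: bool, budget_left: int, waited: float,
                idle_window: float) -> bool:
    """Keep the disk idle briefly for a synchronous application that still has
    budget, instead of immediately switching to another application."""
    return is_sync and budget_left > 0 and waited < idle_window


copy = AppQueue("file-copy", weight=1.0)
video = AppQueue("video", weight=2.0)
tag_on_activation(copy, 4096, 0.0)
tag_on_activation(video, 4096, 0.0)
first = pick_next([copy, video], 0.0)
```

With equal budgets, the queue with twice the weight gets a finish tag half as large (2048 versus 4096) and is served first. `should_idle` captures the abstract's observation that briefly waiting for a synchronous application's next request can preserve throughput that a work-conserving scheduler would lose.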

Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines

Checconi

Petrini

2014

Breaking the Speed and Scalability Barriers for Graph Exploration on Distributed-Memory Machines

Checconi

Petrini

Willcock

et al. 2012

Real-Time Issues in Live Migration of Virtual Machines

Checconi

Cucinotta

Stein

2010

This paper addresses the issue of how to meet the strict timing constraints of (soft) real-time virtualized applications while the Virtual Machine (VM) hosting them is undergoing a live migration. To this end, it is essential that the resource requirements of a migration are identified in advance, that appropriate resources are reserved for the process, and that multiple VMs sharing the same resources are temporally isolated from each other. The first issue is addressed by introducing a stochastic model of the migration process; the other two by introducing a methodology that uses proper scheduling algorithms (for both CPU and network) to reserve resource shares for individual VMs. An extensive set of simulations has also been performed using traces of a VLC video server virtualized with KVM on Linux. The traces were obtained by patching KVM at the kernel level, and the same patch constitutes an important step towards a complete implementation of the proposed technique. The results highlight the benefits of the proposed approach.
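The first requirement, knowing a migration's resource needs in advance, can be made concrete with a simple model. The paper's stochastic model is not described in the abstract, so the sketch below substitutes a deterministic estimate of the classic iterative pre-copy scheme (an assumption, not the paper's model): each round resends the memory dirtied during the previous round at a constant `dirty_rate`, until the remainder falls below `stop_threshold`, after which the VM is paused for a final stop-and-copy. The function and all parameter names are hypothetical.

```python
from typing import Tuple


def precopy_estimate(mem_bytes: float, bandwidth: float, dirty_rate: float,
                     stop_threshold: float,
                     max_rounds: int = 30) -> Tuple[int, float, float, float]:
    """Deterministic stand-in for the paper's stochastic model (assumption).

    Returns (pre-copy rounds, total bytes sent, total migration time, downtime),
    assuming a constant dirty rate and a constant reserved bandwidth.
    """
    to_send = mem_bytes  # round 1 copies all of the VM's memory
    total_bytes = 0.0
    total_time = 0.0
    rounds = 0
    # If dirty_rate >= bandwidth the remainder never shrinks, so cap the rounds.
    while to_send > stop_threshold and rounds < max_rounds:
        t = to_send / bandwidth
        total_bytes += to_send
        total_time += t
        # Memory dirtied while this round was being sent (cannot exceed the VM size).
        to_send = min(mem_bytes, dirty_rate * t)
        rounds += 1
    downtime = to_send / bandwidth  # final stop-and-copy with the VM paused
    total_bytes += to_send
    total_time += downtime
    return rounds, total_bytes, total_time, downtime


estimate = precopy_estimate(1000.0, 100.0, 50.0, 100.0)
```

With 1000 MB of memory, a reserved bandwidth of 100 MB/s, a dirty rate of 50 MB/s, and a 100 MB threshold, the estimate is 4 pre-copy rounds, 1937.5 MB transferred, 19.375 s in total, and 0.625 s of downtime: figures of the kind a reservation-based scheme would need before the migration starts.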

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Made with 💙 for researchers

Part of the Research Solutions Family.

Fabio Checconi

Scalable Community Detection with the Louvain Algorithm

QFQ: Efficient Packet Scheduling With Tight Guarantees

Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics

Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems

High Throughput Disk Scheduling with Fair Bandwidth Distribution

Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines

Breaking the Speed and Scalability Barriers for Graph Exploration on Distributed-Memory Machines

Real-Time Issues in Live Migration of Virtual Machines