Abstract: The constant growth of data and its importance in driving Machine Learning and Big Data is pushing storage systems towards ever-increasing I/O bandwidth and lower latency requirements. In recent years, the Non-Volatile Memory Express (NVMe) standard has enabled SSD drives to deliver high I/O rates by allowing the storage to be connected directly to the processing chip via the fastest available interconnect. In parallel, the adoption of FPGAs in data centers is creating opportunities to accelerate various applications and/or Operating System (OS) operations. While FPGAs in data centers have mostly been connected to x86 servers via PCIe, heterogeneous SoCs are now available with multi-core CPUs and FPGAs integrated on the same die and connected by an on-chip interconnect. In this paper, we present how to rethink and accelerate NVMe performance on heterogeneous SoCs with integrated FPGAs, providing a first research insight into the performance benefits of such an approach. We provide an analysis of the Linux block I/O layer and showcase the relationship between the system's performance and its I/O implementation. Consequently, we introduce an FPGA-based fast path which accelerates access to the NVMe drive. Our comparative evaluation demonstrates that our FPGA-based FastPath achieves up to 71% lower latency and up to 5x higher I/O performance against the baseline system on a Zynq development board.
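The per-request latency that the abstract's analysis of the Linux block I/O layer targets can be observed even from user space. The following is a minimal sketch (not the paper's methodology; the file name, sizes, and iteration count are illustrative assumptions) that times small synchronous reads, each of which crosses the syscall boundary and the kernel block I/O path:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadLatency {
    // Times small synchronous reads against a temporary file and returns
    // the mean per-read latency in nanoseconds. Every read crosses the
    // syscall boundary and the kernel block I/O layer; reads here will
    // often be served from the page cache, so this measures the software
    // path rather than raw device latency.
    static long meanReadLatencyNanos(int reads) throws IOException {
        Path tmp = Files.createTempFile("iobench", ".dat");
        try {
            int fileSize = 4 * 1024 * 1024; // 4 MiB backing file
            Files.write(tmp, new byte[fileSize]);
            ByteBuffer buf = ByteBuffer.allocate(4096); // 4 KiB requests
            long total = 0;
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                for (int i = 0; i < reads; i++) {
                    buf.clear();
                    long off = (i * 16384L) % (fileSize - 4096);
                    long t0 = System.nanoTime();
                    ch.read(buf, off);
                    total += System.nanoTime() - t0;
                }
            }
            return total / reads;
        } finally {
            Files.delete(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("mean 4KiB read latency: "
                + meanReadLatencyNanos(256) + " ns");
    }
}
```

A fast path, in hardware or software, aims to shrink exactly this per-request overhead between the application and the NVMe device.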
In recent years, heterogeneous computing has emerged as the vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the main CPU to diverse devices, which can efficiently execute these parts as co-processors. FPGAs are among the most widely used co-processors, typically employed for accelerating specific workloads due to their flexible hardware and energy-efficient characteristics. These characteristics have made them prevalent in a broad spectrum of computing systems, ranging from low-power embedded systems to high-end data centers and cloud infrastructures. However, these hardware characteristics come at the cost of programmability. Developers who create their applications using high-level programming languages (e.g., Java, Python) are required to familiarize themselves with a hardware description language (e.g., VHDL, Verilog) or, more recently, heterogeneous programming models (e.g., OpenCL, HLS) in order to exploit the co-processors' capacity and tune the performance of their applications. Currently, the above-mentioned heterogeneous programming models exclusively support compilation from compiled languages, such as C and C++. Thus, the integration of heterogeneous co-processors into the software ecosystem of managed programming languages (e.g., Java, Python) is not seamless. In this paper, we rethink the engineering trade-offs that we encountered, in terms of transparency and compilation overheads, while integrating FPGAs into high-level managed programming languages. We present a novel approach that enables runtime code specialization techniques for seamless and high-performance execution of Java programs on FPGAs.
The proposed solution is prototyped in the context of the Java programming language and TornadoVM, an open-source programming framework for Java execution on heterogeneous hardware. Finally, we evaluate the proposed solution for FPGA execution against both sequential and multithreaded Java implementations, showcasing up to 224x and 19.8x performance speedups, respectively, and up to 13.82x compared to TornadoVM running on an Intel integrated GPU. We also provide a breakdown analysis of the proposed compiler optimizations for FPGA execution, as a means to project their impact on the applications' characteristics.
ACM CCS 2012: Software and its engineering → Runtime environments.
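Frameworks such as TornadoVM target loop-parallel Java kernels whose iterations are independent. As a hedged illustration (plain Java with no TornadoVM dependency; in TornadoVM the loop would typically carry an @Parallel annotation and be dispatched through the framework's task API, details of which are not covered in the abstract), the kind of kernel amenable to FPGA offload looks like:

```java
public class VectorMul {
    // A data-parallel kernel: each iteration is independent of the
    // others, which is what makes it a candidate for offload to a
    // GPU or FPGA by a framework like TornadoVM.
    static void mul(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) { // @Parallel in TornadoVM
            c[i] = a[i] * b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        float[] a = new float[n], b = new float[n], c = new float[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2f;
        }
        mul(a, b, c);
        System.out.println(c[10]); // prints 20.0
    }
}
```

The runtime specialization described in the abstract is what turns such plain Java loops into FPGA bitstream executions without the developer writing VHDL or OpenCL by hand.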
Graph mining is an important research area within the domain of data mining, and one of its most challenging tasks is frequent subgraph mining. To the best of our knowledge, this work presents the first FPGA-based implementation of gSpan, the most efficient and well-known algorithm for the Frequent Subgraph Mining (FSM) problem. The proposed system, named High Performance Computing-gSpan (HPC-gSpan), achieves a manyfold speedup over the official software implementation in the gboost library executed on a high-end CPU, across various real-world datasets.
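gSpan grows frequent subgraphs outward from frequent single edges, pruning any edge whose support (the number of graphs containing it) falls below a threshold before DFS-code extension begins. A simplified sketch of that first support-counting step (assuming undirected graphs represented by their sets of distinct labeled edges; the representation is an illustrative assumption, not HPC-gSpan's data layout) might look like:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EdgeSupport {
    // Counts, for each labeled edge, how many graphs in the database
    // contain it at least once -- the "support" value gSpan uses to
    // discard infrequent edges before subgraph extension starts.
    static Map<String, Integer> support(List<Set<String>> graphs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> edges : graphs) {
            // Sets hold distinct labels, so each graph contributes
            // at most 1 to any edge's support.
            for (String e : edges) {
                counts.merge(e, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each graph is its set of distinct edge labels, e.g.
        // "C-single-O" for a carbon-oxygen single bond.
        List<Set<String>> db = List.of(
                Set.of("C-single-O", "C-single-C"),
                Set.of("C-single-O"),
                Set.of("C-single-C", "C-double-O"));
        System.out.println(support(db).get("C-single-O")); // prints 2
    }
}
```

The subsequent, far more expensive phase of gSpan, enumerating extensions by canonical DFS codes and testing subgraph isomorphism, is the part that stands to gain the most from FPGA acceleration.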