Shane O’Connell scite author profile

An OpenCL™ Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. We show a novel architecture written in OpenCL™, which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth. Furthermore, we show how we can use the Winograd transform to significantly boost the performance of the FPGA. As a result, when running our DLA on Intel's Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W when running the AlexNet CNN benchmark. This comes to 1382 GFLOPS and is 10x faster with 8.4x more GFLOPS and 5.8x better efficiency than the state-of-the-art on FPGAs. Additionally, 23 img/s/W is competitive against the best publicly known implementation of AlexNet on nVidia's TitanX GPU.

Keywords: Deep Neural Network, Convolution Neural Network

Due to the contributions above we are able to implement all layers of AlexNet [7] on Intel's Arria 10 FPGA and achieve over 10x better throughput and 8.4x more GFLOPS than the state-of-the-art FPGA implementation of AlexNet [20]. Furthermore, we show that, to the best of our knowledge, this is the first FPGA implementation whose performance per watt is competitive against the same generation highly optimized TitanX GPU results [3,9,10].

The rest of the paper is organized as follows. Section 2 has background on CNNs and related work. Section 3 describes the DLA architecture. Section 4 describes our analytical model for design space exploration. Finally, Sections 5 and 6 describe our results.

BACKGROUND

Deep neural networks are machine learning algorithms that are inspired by the structure and function of the human brain. They consist of several interconnected artificial neurons that are modeled after the neurons of the human nervous system. An artificial neuron accepts numerical input from other neurons, and produces an output. For DNNs, the output is computed as a dot-product of its inputs and its

DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

Abdelfattah¹, Han², Bitar³ et al. 2018

Overlays have shown significant promise for field-programmable gate-arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a significant performance burden resulting in very little adoption of overlays for practical applications. In this paper, we tailor an overlay to a specific application domain, and we show how we maintain its full programmability without paying for the performance overhead traditionally associated with overlays. Specifically, we introduce an overlay targeted for deep neural network inference with only ~1% overhead to support the control and reprogramming logic using a lightweight very-long instruction word (VLIW) network. Additionally, we implement a sophisticated domain specific graph compiler that compiles deep learning languages such as Caffe or Tensorflow to easily target our overlay. We show how our graph compiler performs architecture-driven software optimizations to significantly boost performance of both convolutional and recurrent neural networks (CNNs/RNNs): we demonstrate a 3× improvement on ResNet-101 and a 12× improvement for long short-term memory (LSTM) cells, compared to naïve implementations. Finally, we describe how we can tailor our hardware overlay, and use our graph compiler to achieve ~900 fps on GoogLeNet on an Intel Arria 10 1150, the fastest ever reported on comparable FPGAs.

Deep Learning of Histopathological Features for the Prediction of Tumour Molecular Genetics

et al. 2021

Advanced diagnostics are enabling cancer treatments to become increasingly tailored to the individual through developments in immunotherapies and targeted therapies. However, long turnaround times and high costs of molecular testing hinder the widespread implementation of targeted cancer treatments. Meanwhile, gold-standard histopathological assessment carried out by a trained pathologist is widely regarded as routine and mandatory in most cancers. Recently, methods have been developed to mine hidden information from histopathological slides using deep learning applied to scanned and digitized slides; deep learning comprises a collection of computational methods which learn patterns in data in order to make predictions. Such methods have been reported to be successful in a variety of cancers for predicting the presence of biomarkers such as driver mutations, tumour mutational burden, and microsatellite instability. This information could prove valuable to pathologists and oncologists in clinical decision making for cancer treatment and triage for in-depth sequencing. In addition to identifying molecular features, deep learning has been applied to predict prognosis and treatment response in certain cancers. Despite reported successes, many challenges remain before the clinical implementation of such diagnostic strategies in the clinical setting is possible. This review aims to outline recent developments in the field of deep learning for predicting molecular genetics from histopathological slides, as well as to highlight limitations and pitfalls of working with histopathology slides in deep learning.

Potential Biomarkers of Acute Ischemic Stroke Etiology Revealed by Mass Spectrometry-Based Proteomic Characterization of Formalin-Fixed Paraffin-Embedded Blood Clots

et al. 2022

Background and Aims: Besides the crucial role in the treatment of acute ischemic stroke (AIS), mechanical thrombectomy represents a unique opportunity for researchers to study the retrieved clots, with the possibility of unveiling biological patterns linked to stroke pathophysiology and etiology. We aimed to develop a shotgun proteomic approach to study and compare the proteome of formalin-fixed paraffin-embedded (FFPE) cardioembolic and large artery atherosclerotic (LAA) clots. Methods: We used 16 cardioembolic and 15 LAA FFPE thrombi from 31 AIS patients. The thrombus proteome was analyzed by label-free quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS). MaxQuant v1.5.2.8 and Perseus v.1.6.15.0 were used for bioinformatics analysis. Protein classes were identified using the PANTHER database and the STRING database was used to predict protein interactions. Results: We identified 1,581 protein groups as part of the AIS thrombus proteome. Fourteen significantly differentially abundant proteins across the two etiologies were identified. Four proteins involved in the ubiquitin-proteasome pathway, blood coagulation or plasminogen activating cascade were identified as significantly abundant in LAA clots. Ten proteins involved in the ubiquitin-proteasome pathway, cytoskeletal remodeling of platelets, platelet adhesion or blood coagulation were identified as significantly abundant in cardioembolic clots. Conclusion: Our results outlined a set of 14 proteins for a proof-of-principle characterization of cardioembolic and LAA FFPE clots, advancing the proteome profile of AIS human thrombi and understanding the pathophysiology of ischemic stroke.

Creating High Performance Applications with Intel's FPGA OpenCL™ SDK

Ling¹, Aydonat², O’Connell³ et al. 2017
