Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently; however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. We show a novel architecture written in OpenCL™, which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth. Furthermore, we show how we can use the Winograd transform to significantly boost the performance of the FPGA. As a result, when running our DLA on Intel's Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W, when running the AlexNet CNN benchmark. This comes to 1382 GFLOPS and is 10x faster with 8.4x more GFLOPS and 5.8x better efficiency than the state-of-the-art on FPGAs. Additionally, 23 img/s/W is competitive against the best publicly known implementation of AlexNet on NVIDIA's TitanX GPU.

Keywords: Deep Neural Network, Convolution Neural Network

Due to the contributions above we are able to implement all layers of AlexNet [7] on Intel's Arria 10 FPGA and achieve over 10x better throughput and 8.4x more GFLOPS than the state-of-the-art FPGA implementation of AlexNet [20]. Furthermore, we show that, to the best of our knowledge, this is the first FPGA implementation whose performance per watt is competitive against the same-generation, highly optimized TitanX GPU results [3,9,10].

The rest of the paper is organized as follows. Section 2 has background on CNNs and related work. Section 3 describes the DLA architecture. Section 4 describes our analytical model for design space exploration. Finally, Sections 5 and 6 describe our results.
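To illustrate why a Winograd transform can boost convolution throughput, the following minimal sketch implements the standard one-dimensional F(2,3) minimal filtering algorithm, which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct computation needs. This is an assumption made for illustration: the excerpt does not state which Winograd variant or tile size the DLA uses, and the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
from typing import Sequence, Tuple


def direct_f23(d: Sequence[float], g: Sequence[float]) -> Tuple[float, float]:
    """Two outputs of a 3-tap filter computed directly (6 multiplications)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> Tuple[float, float]:
    """Same two outputs via Winograd F(2,3) (4 data-dependent multiplications).

    The filter-side terms depend only on g, so for fixed weights they can be
    precomputed once and reused across all input tiles.
    """
    # Filter transform (can be precomputed for fixed weights).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2.0
    u2 = (g[0] - g[1] + g[2]) / 2.0
    u3 = g[2]

    # Input transform followed by 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: additions and subtractions only.
    return m1 + m2 + m3, m2 - m3 - m4


if __name__ == "__main__":
    data = [0.3, -1.2, 2.5, 4.0]   # illustrative input tile
    weights = [0.5, -1.25, 2.0]    # illustrative filter taps
    print("direct:  ", direct_f23(data, weights))
    print("winograd:", winograd_f23(data, weights))
```

The two functions should agree up to floating-point rounding. The saving comes from trading multiplications for additions, which is attractive on hardware where multipliers are the scarcer resource.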
BACKGROUND

Deep neural networks are machine learning algorithms that are inspired by the structure and function of the human brain. They consist of several interconnected artificial neurons that are modeled after the neurons of the human nervous system. An artificial neuron accepts numerical input from other neurons, and produces an output. For DNNs, the output is computed as a dot-product of its inputs and its
Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a significant performance burden, resulting in very little adoption of overlays for practical applications. In this paper, we tailor an overlay to a specific application domain, and we show how we maintain its full programmability without paying for the performance overhead traditionally associated with overlays. Specifically, we introduce an overlay targeted for deep neural network inference with only ~1% overhead to support the control and reprogramming logic using a lightweight very-long instruction word (VLIW) network. Additionally, we implement a sophisticated domain-specific graph compiler that compiles deep learning languages such as Caffe or TensorFlow to easily target our overlay. We show how our graph compiler performs architecture-driven software optimizations to significantly boost performance of both convolutional and recurrent neural networks (CNNs/RNNs): we demonstrate a 3× improvement on ResNet-101 and a 12× improvement for long short-term memory (LSTM) cells, compared to naïve implementations. Finally, we describe how we can tailor our hardware overlay, and use our graph compiler to achieve ~900 fps on GoogLeNet on an Intel Arria 10 1150, the fastest ever reported on comparable FPGAs.
Advanced diagnostics are enabling cancer treatments to become increasingly tailored to the individual through developments in immunotherapies and targeted therapies. However, long turnaround times and high costs of molecular testing hinder the widespread implementation of targeted cancer treatments. Meanwhile, gold-standard histopathological assessment carried out by a trained pathologist is widely regarded as routine and mandatory in most cancers. Recently, methods have been developed to mine hidden information from histopathological slides using deep learning applied to scanned and digitized slides; deep learning comprises a collection of computational methods which learn patterns in data in order to make predictions. Such methods have been reported to be successful in a variety of cancers for predicting the presence of biomarkers such as driver mutations, tumour mutational burden, and microsatellite instability. This information could prove valuable to pathologists and oncologists in clinical decision making for cancer treatment and triage for in-depth sequencing. In addition to identifying molecular features, deep learning has been applied to predict prognosis and treatment response in certain cancers. Despite reported successes, many challenges remain before such diagnostic strategies can be implemented in the clinical setting. This review aims to outline recent developments in the field of deep learning for predicting molecular genetics from histopathological slides, as well as to highlight limitations and pitfalls of working with histopathology slides in deep learning.
Background and Aims

Besides its crucial role in the treatment of acute ischemic stroke (AIS), mechanical thrombectomy represents a unique opportunity for researchers to study the retrieved clots, with the possibility of unveiling biological patterns linked to stroke pathophysiology and etiology. We aimed to develop a shotgun proteomic approach to study and compare the proteome of formalin-fixed paraffin-embedded (FFPE) cardioembolic and large artery atherosclerotic (LAA) clots.

Methods

We used 16 cardioembolic and 15 LAA FFPE thrombi from 31 AIS patients. The thrombus proteome was analyzed by label-free quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS). MaxQuant v1.5.2.8 and Perseus v.1.6.15.0 were used for bioinformatics analysis. Protein classes were identified using the PANTHER database, and the STRING database was used to predict protein interactions.

Results

We identified 1,581 protein groups as part of the AIS thrombus proteome. Fourteen significantly differentially abundant proteins across the two etiologies were identified. Four proteins involved in the ubiquitin-proteasome pathway, blood coagulation or plasminogen activating cascade were identified as significantly more abundant in LAA clots. Ten proteins involved in the ubiquitin-proteasome pathway, cytoskeletal remodeling of platelets, platelet adhesion or blood coagulation were identified as significantly more abundant in cardioembolic clots.

Conclusion

Our results outlined a set of 14 proteins for a proof-of-principle characterization of cardioembolic and LAA FFPE clots, advancing the proteome profile of AIS human thrombi and the understanding of the pathophysiology of ischemic stroke.