The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limits their deployability on ubiquitous computing devices such as smart phones, wearables and autonomous drones. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach based on a novel, layer-wise greedy methodology. Thanks to our two-stage training procedure, the teacher network is still able to use state-of-the-art methods such as dropout and batch normalization to increase accuracy and reduce training time. Using only ternary weights and activations, the student ternary network learns to mimic the behavior of its teacher network without using any multiplication. Unlike its {-1,1} binary counterparts, a ternary neural network inherently prunes the smaller weights by setting them to zero during training. This makes them sparser and thus more energy-efficient. We design a purpose-built hardware architecture for TNNs and implement it on FPGA and ASIC. We evaluate TNNs on several benchmark datasets and demonstrate up to 3.1× better energy efficiency with respect to the state of the art while also improving accuracy.
International audienceIn this paper, we propose a distributed routing algorithm for vertically partially connected regular 2D topologies of different shapes and sizes (e.g., 2D mesh, torus, ring). The topologies that are the target of this algorithm are of practical interest in the 3D integration of heterogeneous dies using Through-Silicon-Vias (TSVs). Indeed, TSV-based 3D integration allows to envision the stacking of dies with different functions and technologies, using as an interconnect backbone a 3D-NoC. Intrinsically, 3D topologies have better performances, but yield and active area (and thus the cost) are function of the number of TSVs; therefore, the designs tend to use only a subset of available TSVs between two dies. The definition of blockage free and low implementation cost distributed deterministic routing on this kind of topology is thus of theoretical and practical interests. We formally prove that independently of the shape and dimensions of the planar topologies and of the number and placement of the TSVs, the proposed routing algorithm using two virtual channels in the plane is deadlock and livelock free. We also experimentally show that the performance of this algorithm is still acceptable when the number of vertical connections decreases
The Esprit/OMI-COSY project defines transaction-levels to set-up the exchange of IP's in separating function from architecture and body-behavior from proprietary interfaces. These transactionlevels are supported by the "COSY COMMUNICATION IPs" that are presented in this paper. They implement onto Systems-On-Chip the extended Kahn Process Network that is defined in COSY for modeling signal processing applications. We present a generic implementation and performance model of these system-level communications and we illustrate specific implementations. They set system communications across software and hardware boundaries, and achieve bus independence through the Virtual Component Interface of the VSI Alliance. Finally, we describe the COSY-VCC flow that supports communication refinement from specification, to performance estimation, to implementation.
For the design of classic computers the Parallel programming concept is used to abstract HW/SW interfaces during high level specification of application software. The software is then adapted to existing multiprocessor platforms using a low level software layer that implements the programming model. Unlike classic computers, the design of heterogeneous MPSoC includes also building the processors and other kind of hardware components required to execute the software. In this case, the programming model hides both hardware and software refinements. This paper deals with parallel programming models to abstract both hardware and software interfaces in the case of heterogeneous MPSoC design. Different abstraction levels will be needed. For the long term, the use of higher level programming models will open new vistas for optimization and architecture exploration like CPU/RTOS tradeoffs.
International audienceThis paper presents Disydent, a framework dedicated to system-on-a-chip (SoC) platform-based design for shared memory multiple instructions multiple data (MIMD) architectures. We define a platform-based design problem as a triplet (system, application, constraints) where the system is both an operating system (OS) and a hardware (HW) template that can be enhanced with dedicated coprocessors. Our contributions are: 1) the definition of a complete flow for platform-based design, from application to integration including all necessary intermediate steps and 2) a set of tightly bound operational tools to implement the flow. Disydent is based on four tools. The distributed process network (DPN) is a C library for describing Kahn process network (KPN)-based applications. The asynchronous serial interface mode register 0 (ASIM0) is a multiprocessor target platform running a microkernel. This platform can be enhanced with coprocessors generated by the user-guided high-level synthesis (UGH) tool. Cycle accurate system simulator (CASS) is a high-performance cycle-accurate simulator. The main steps of the design flow are KPN modeling, functional validation, design space exploration, high-level synthesis, and temporal validation. The design flow starts by modeling the application as a KPN. This initial description is done in C using the DPN library. The functional validation is performed by running the initial description directly on the host. Without modifying the initial description, the user can simulate a HW/software (SW) partitioning by indicating the number of processors and the processes that are to be migrated to HW. This simulation is done at the cycle-accurate level for the whole system, except for the migrated processes for which the user must provide estimated time models. The description of the processes that are selected for HW implementation must be translated into a subset of C and then synthesized. This new description is still compatible with the DPN library, so it can be used for functional validation. The temporal validation is done at the cycle-accurate level using the initial description for the SW processes and cycle-accurate models automatically generated from the C subset description for the HW processes. Disydent's strength relies on its formal- KPN model that ensures a behavior that is independent of the overall system scheduling, its fast cycle-accurate validation that is several orders of magnitude faster than classical event-driven simulators, and its single description of a process that is used as input of DPN, CASS, and UGH
ISBN 978-1-4673-2234-8International audienceIn this paper, we detail the design and implementation of a router for vertically-partially-connected 3D-NoCs based on stacked 2D-meshes. This router implements the necessary hardware to support a recently introduced routing algorithm called "Elevator-First", which targets topologies with irregularly placed vertical connections in a deadlock free manner, using only two virtual channels in the plane. The micro-architectural design shows that the proposed router requires few additional hardware. Our studies about the practicality of the algorithm and its router implementation demonstrate that it has low overhead compared to a router for fully connected 3D-NoCs. Using ST Microelectronics 65nm CMOS technology Elevator-First router with 7 ports has a total area of 0.07mm², an Operating frequency of over 3GHz and a power consumption of around 3mW
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.