Future many-core systems need to handle high power density and chip temperature effectively. Some cores in many-core systems must be turned off, or kept 'dark', to manage chip power and thermal density; this phenomenon is known as the dark silicon problem. It prevents many-core systems from utilizing their large number of processing cores to gain improved performance. This paper presents a dynamic thermal-aware performance optimization technique for dark silicon many-core systems (DTaPO) that optimizes performance under a temperature constraint. The proposed technique utilizes both task migration and dynamic voltage and frequency scaling (DVFS) to optimize the performance of a many-core system while keeping the system temperature within a safe operating limit. Task migration puts hot cores into low-power states and moves tasks to cooler dark cores to aggressively reduce chip temperature while maintaining high overall system performance. To reduce task migration overhead due to cold start, the source core (i.e., the active core) keeps its L2 cache content during the initial migration phase, and the destination core (i.e., the dark core) can access it to reduce the impact of cold-start misses. Moreover, the proposed technique limits task migration to cores that share the last-level cache (LLC). In the case of a major thermal violation with no cooler cores available, DVFS is used to gradually reduce the hot cores' temperature by reducing their frequency. Experimental results for different threshold temperatures show that DTaPO can keep the average system temperature below the thermal limit. Compared with using only DVFS, the execution time penalty is reduced by up to 18% for all thermal thresholds, and the average peak temperature is reduced by up to 10.8°C.
In addition, the experimental results show that DTaPO improves the system’s performance by up to 80% compared to optimal sprinting patterns (OSP) and reduces the temperature by up to 13.6°C.
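The combined migrate-then-throttle policy described above can be sketched as a simple decision loop. This is an illustrative sketch, not the authors' implementation: all names, data structures, the threshold, and the DVFS levels are assumptions.

```python
# Sketch of a DTaPO-style decision loop: migrate work from hot cores to
# cool dark cores behind the same LLC; fall back to DVFS when none exist.
T_LIMIT = 80.0                      # thermal threshold in deg C (assumed)
FREQ_STEPS = [2.0, 1.6, 1.2, 0.8]   # available DVFS levels in GHz (assumed)

def dtapo_step(cores):
    """cores: list of dicts with 'temp', 'active', 'llc_id', 'freq'."""
    for hot in (c for c in cores if c['active'] and c['temp'] > T_LIMIT):
        # Candidate destinations: dark (inactive) cores behind the same LLC,
        # so the source core's retained L2 content remains reachable.
        dark = [c for c in cores
                if not c['active'] and c['llc_id'] == hot['llc_id']
                and c['temp'] < T_LIMIT]
        if dark:
            # Migrate to the coolest dark core; the hot core goes dark.
            dest = min(dark, key=lambda c: c['temp'])
            dest['active'], hot['active'] = True, False
        else:
            # Major violation with no cool core: step the frequency down.
            lower = [f for f in FREQ_STEPS if f < hot['freq']]
            if lower:
                hot['freq'] = max(lower)
    return cores
```

The LLC-sharing filter mirrors the paper's constraint that migration stays within an LLC domain to limit cold-start misses; the frequency step-down mirrors the gradual DVFS fallback.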
This article describes a pipeline synthesis and optimization technique that increases the data throughput of FPGA-based systems using minimum pipeline resources. The technique is applied to the CAL dataflow language and is designed based on relations, matrices, and graphs. First, the initial as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) schedules, and the corresponding mobility of operators, are generated. From this, an operator coloring technique is applied to conflict and non-conflict directed graphs using recursive functions and explicit stack mechanisms. For each feasible number of pipeline stages, a pipeline schedule with minimum total register width is taken as an optimal coloring, which is then automatically transformed to a description in CAL. The generated pipelined CAL descriptions are finally synthesized to hardware description languages for FPGA implementation. Experimental results for three video processing applications demonstrate up to 3.9× higher throughput for pipelined compared to non-pipelined implementations, and average total pipeline register width reductions of up to 39.6% and 49.9% for the optimal schedule relative to the ASAP and ALAP pipeline schedules, respectively.
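The first step described above, deriving ASAP and ALAP schedules and each operator's mobility (ALAP stage minus ASAP stage), can be sketched on a small dataflow DAG. The graph representation and all names here are assumptions, not the paper's implementation.

```python
# Minimal ASAP/ALAP scheduling sketch on a dependency dict {op: [preds]}.

def asap(deps):
    """Earliest stage per operator (1-based)."""
    sched = {}
    def visit(op):
        if op not in sched:
            sched[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return sched[op]
    for op in deps:
        visit(op)
    return sched

def alap(deps, depth):
    """Latest stage per operator, given the total pipeline depth."""
    succs = {op: [s for s, ps in deps.items() if op in ps] for op in deps}
    sched = {}
    def visit(op):
        if op not in sched:
            sched[op] = min((visit(s) for s in succs[op]), default=depth + 1) - 1
        return sched[op]
    for op in deps:
        visit(op)
    return sched

# Example DAG: a,b feed c; c feeds d; e is independent (so it has slack).
deps = {'a': [], 'b': [], 'c': ['a', 'b'], 'd': ['c'], 'e': []}
s_asap = asap(deps)
depth = max(s_asap.values())
s_alap = alap(deps, depth)
mobility = {op: s_alap[op] - s_asap[op] for op in deps}
```

Operators on the critical path (`a`-`c`-`d`) get mobility 0, while the independent operator `e` can float across all stages; this slack is what the coloring step exploits when minimizing total register width.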
This paper presents a Very Large Scale Integration (VLSI) design and implementation of a fixed-point 8×8 multiplierless Discrete Cosine Transform (DCT) using the ISO/IEC 23002-2 algorithm. The standard DCT algorithm, which is mainly used in image and video compression technology, consists of only adders, subtractors, and shifters, making it efficient for hardware implementation. The VLSI implementation of the algorithm given in this paper further enhances the performance of the transform unit. Furthermore, circuit pipelining has been applied to the base design of the DCT, which significantly improves performance by reducing the longest path in the non-pipelined design. The DCT has been implemented using a semi-custom VLSI design methodology with the TSMC 0.13 um process technology. Results show that our DCT designs can run at up to around 1.7 Giga pixels/s, which is well above the timing required for real-time ultra-high-definition 8K video.

The DCT basis function is the cosine, with multiplication and addition as the main arithmetic operations involved. Much DCT-based research has been conducted in the past few years, producing different kinds of DCT algorithms, such as the Arai DCT scheme, Wang factorization, Lee DCT for power-of-two block lengths, the Loeffler algorithm, and Feig-Winograd factorization [2]. These algorithms have been used in practical applications. In recent image processing technology, various hardware implementations of the DCT use the Arai DCT scheme [3]. It uses only five multiplications and twenty-nine additions, fewer arithmetic operations than the other stated algorithms. For MPEG technology, the International Standards Organization (ISO) released an optimized fixed-point multiplierless version of the DCT algorithm, suitable for image and video compression.
This standard, called ISO/IEC 23002-2, is described and implemented in the present work [4]. VLSI designs of the DCT can be found in numerous articles, with an overview given in [5]. For comparison purposes, we have analyzed three similar designs. The work by Mandayake et al. in [6] presents a VLSI architecture of the DCT using the Arai DCT scheme. It proposes a fast algorithm by reducing the number of integer channels, and the design is implemented in 45 nm technology. The work by Wahid et al. [7] proposes an area-efficient fixed-point DCT architecture implemented in 0.18 um CMOS technology. Another interesting work is by Fu et al. [8], where a low-power implementation is proposed based on an algebraic integer encoding technique; this work also uses 0.18 um CMOS technology. Performance results for these works are given in the results section. The present paper, on the other hand, describes the semi-custom Very Large Scale Integration (VLSI) design of the ISO/IEC 23002-2 DCT algorithm using TSMC 0.13 um technology, similar to the design methodology used in
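The multiplierless principle such DCT algorithms rely on can be illustrated in a few lines: a fixed-point constant multiplication is decomposed into a sum of shifted terms. The coefficient below is an illustrative example only, not one of the actual ISO/IEC 23002-2 constants.

```python
# Shift-and-add replacement for a fixed-point constant multiply.
# Example coefficient: 181/256 ~= 0.707 (illustrative, not from the standard).

def mul_by_example_coeff(x):
    # 181 = 128 + 32 + 16 + 4 + 1, so x * 181/256 becomes five shifted
    # copies of x summed, then a final right shift by 8 (divide by 256).
    return ((x << 7) + (x << 5) + (x << 4) + (x << 2) + x) >> 8
```

In hardware, each shift is free wiring and each `+` is an adder, which is why the standard's adder/subtractor/shifter formulation maps so efficiently to VLSI.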
The Internet of Things (IoT) and advancements in wireless technology have evolved intelligent transport systems to integrate billions of smart objects ready to connect to the Internet. The modern era of the IoT has brought significant development in vehicular ad hoc networks (VANETs), transforming the conventional VANET into the Internet of Vehicles (IoV) to improve road safety and diminish road congestion. However, security threats are increasing due to VANETs' dependency on infrastructure, computing, dynamic nature, and control technologies. These security threats could be addressed comprehensively by increasing trustworthiness in the received message and the transmitting node. Conversely, the presence of dishonest vehicles in the network, for instance Man-in-the-Middle (MiTM) attackers sharing malicious content, poses a severe threat to VANETs. Thus, increasing trustworthiness among nodes can lead to increased authenticity, privacy, accuracy, security, and trusted information sharing in the VANET. In this paper, a lightweight trust model is proposed; the presented model identifies dishonest nodes and revokes their credentials in the MiTM attack scenario. Furthermore, to address privacy and security requirements, a pseudonym scheme is used. All nodes in the VANET establish trust initially provided by the RSU, which is a trusted source in the network. Extensive experiments are conducted on a variety of network scenarios to evaluate the accuracy and performance of the presented lightweight trust model. In terms of recall, precision, and F-score, the presented model significantly outperforms MARINE. The simulation results validate that the proposed lightweight model achieves a high trust level with 40% MiTM attackers and an F-score of 95%, whereas the MARINE model achieves 90%, leading the proposed model to attain higher detection accuracy.
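The core mechanism described above (RSU-seeded trust, updated per interaction, with credential revocation below a threshold) can be sketched as a toy model. All names, thresholds, and update rules here are assumptions for illustration, not the paper's actual trust computation.

```python
# Toy per-neighbour trust table: seeded by the RSU, raised on verified
# messages, lowered faster on tampered ones, with threshold revocation.
RSU_SEED = 0.5       # initial trust granted via the RSU (assumed value)
REVOKE_BELOW = 0.2   # revocation threshold (assumed value)

class TrustTable:
    def __init__(self):
        self.trust = {}
        self.revoked = set()

    def report(self, node, message_ok):
        """Update trust in `node` after verifying one of its messages."""
        t = self.trust.get(node, RSU_SEED)
        # Asymmetric update: malicious behaviour costs more than honest
        # behaviour earns, so MiTM-style nodes fall to revocation quickly.
        t = min(1.0, t + 0.05) if message_ok else max(0.0, t - 0.15)
        self.trust[node] = t
        if t < REVOKE_BELOW:
            self.revoked.add(node)
        return t
```

The asymmetric step sizes capture the usual design choice in trust models that distrust should accumulate faster than trust.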
The paper introduces a new methodology for pipeline synthesis with applications to dataflow high-level system design. The pipeline synthesis is applied to dataflow programs whose operators are translated into graphs and dependency relations that are then processed for pipeline architecture optimization. For each pipeline-stage time, a minimal number of pipeline stages is first determined, and then an optimal assignment of operators to stages is generated with the objective of minimizing the total pipeline register size. The obtained "optimal" pipeline schedule is automatically transformed back into a dataflow program that can then be synthesized to efficient hardware implementations. Two new pipeline scheduling algorithms, a "least cost search branch and bound" algorithm and a heuristic technique, have been developed. The first yields globally optimal solutions for medium-size designs, whereas the second generates close-to-optimal solutions for large designs. Experimental results on FPGA designs show that total pipeline register size gains of up to 4.68× can be achieved. The new algorithms outperform the known downward- and upward-direction dataflow graph traversal algorithms in pipeline register size by up to 100% on average.

Index Terms—Dataflow, hardware design, pipeline, optimization, branch and bound algorithm, heuristic algorithm, high-level synthesis
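The branch-and-bound idea described above can be sketched as a search over stage assignments within each operator's mobility window, pruning any partial assignment whose register cost already exceeds the best complete schedule found so far. The cost model (one register of the value's width per stage boundary crossed) and all names are simplifying assumptions, not the paper's algorithm.

```python
# Least-cost branch-and-bound sketch: assign each op a stage in its
# [ASAP, ALAP] window, minimizing total pipeline register width.

def best_schedule(windows, edges, widths):
    """windows: {op: (asap, alap)}; edges: [(producer, consumer)];
    widths: {op: output bit-width}. Returns (cost, stage assignment)."""
    ops = list(windows)
    best = (float('inf'), None)

    def branch(i, stages, cost):
        nonlocal best
        if cost >= best[0]:
            return                      # bound: partial cost already worse
        if i == len(ops):
            best = (cost, dict(stages))
            return
        op = ops[i]
        for s in range(windows[op][0], windows[op][1] + 1):
            stages[op] = s
            # Cost of edges whose endpoints are now both fixed: the value
            # needs one register of its width per stage boundary crossed.
            extra = sum(widths[u] * max(0, stages[v] - stages[u])
                        for u, v in edges
                        if u in stages and v in stages and op in (u, v))
            # Precedence: a producer may not be scheduled after its consumer.
            ok = all(stages[u] <= stages[v] for u, v in edges
                     if u in stages and v in stages)
            if ok:
                branch(i + 1, stages, cost + extra)
            del stages[op]

    branch(0, {}, 0)
    return best
```

With a wide operator that has slack, the search pushes it toward its consumer's stage so its value crosses fewer boundaries, which is exactly the register-size saving the abstract reports.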