State-of-the-art neural network architectures such as ResNet, MobileNet, and DenseNet have achieved outstanding accuracy with fewer MACs and smaller model sizes than their counterparts. However, these metrics may not accurately predict inference time. We suggest that memory traffic for accessing intermediate feature maps can be a dominant factor in inference latency, especially in tasks such as real-time object detection and semantic segmentation of high-resolution video. We propose a Harmonic Densely Connected Network that achieves high efficiency in terms of both low MACs and low memory traffic. The new network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. We use tools including Nvidia profiler and ARM Scale-Sim to measure the memory traffic, and verify that the inference latency is indeed proportional to the memory traffic consumption and that the proposed network consumes low memory traffic. We conclude that one should take memory traffic into consideration when designing neural network architectures for high-resolution applications at the edge.
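To illustrate why MACs alone can mislead, the following minimal sketch compares two hypothetical convolution layers that have identical MAC counts but different feature-map traffic. The layer shapes, the function name `conv_layer_cost`, and the 4-byte element size are assumptions made for illustration. The traffic estimate counts only one read of the input feature map and one write of the output, ignoring weights and caching; it is not the measurement method described in the abstract, which relies on profiling tools.

```python
def conv_layer_cost(h, w, c_in, c_out, k, bytes_per_elem=4):
    """Rough cost of a stride-1, same-padded k x k convolution.

    Returns (macs, feature_map_bytes). The traffic figure assumes the input
    feature map is read once and the output written once (an idealisation).
    """
    macs = h * w * c_in * c_out * k * k
    feature_map_bytes = (h * w * c_in + h * w * c_out) * bytes_per_elem
    return macs, feature_map_bytes


# Two hypothetical layers chosen so that their MAC counts match exactly.
wide_1x1 = conv_layer_cost(h=256, w=256, c_in=288, c_out=288, k=1)
narrow_3x3 = conv_layer_cost(h=256, w=256, c_in=96, c_out=96, k=3)

print("1x1, 288 channels:", wide_1x1)
print("3x3,  96 channels:", narrow_3x3)
```

Under these assumptions the 1x1 layer moves three times as many feature-map bytes as the 3x3 layer for the same number of MACs, which is the kind of gap a MAC-only metric would hide.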
A new compact second-order tri-band microstrip BPF has been proposed to provide three commercially practical passbands centered at 1.57, 2.45, and 3.5 GHz. The BPF consists of two folded, rectangular-shaped TSSIRs, with the first three resonant frequencies of each resonator being tunable. The proposed tri-band microstrip BPF has been fabricated on a 0.635 mm thick RT/Duroid 6010 substrate with an occupied circuit area of only 23.29 × 10.94 mm², and the measured result was found to agree very well with that obtained from simulation. The measured fractional bandwidths (minimum insertion losses) are 11.5% (0.92 dB), 4.6% (2.19 dB), and 10.9% (1.3 dB) in the 1.57, 2.45, and 3.5 GHz frequency bands, respectively. Results also show that by embedding the spur lines and DGS in the BPF, transmission zeros were introduced around the first spurious passband to achieve a wide upper stopband.
ACKNOWLEDGMENT
This work was supported by the National Science Council of Taiwan, ROC, under Grant NSC 96-2221-E-018-002.
We survey recent developments in high-level synthesis technology for VLSI design. The need for higher-level design automation tools is discussed first. We then describe some basic techniques for various subtasks of high-level synthesis. Techniques that have been proposed in the past few years (since 1994) for these subtasks are surveyed. We also survey some new synthesis objectives, including testability, power efficiency, and reliability.
We propose a near-optimal hardware architecture for the deblocking filter in H.264/MPEG-4 AVC. We propose a novel filtering order and a data reuse strategy that result in significant savings in filtering time, local memory usage, and memory traffic. Every 16x16 macroblock requires 192 filtering operations. After a few initialization cycles, our 5-stage pipelined architecture is able to perform one filtering operation per cycle. Compared with some state-of-the-art designs, our architecture delivers the fastest level of performance while using a much smaller gate count and less memory. We have implemented and integrated the proposed deblocking filter into an H.264 main profile video decoder and verified it with an FPGA prototype.
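The figure of 192 filtering operations per macroblock can be reproduced with a short calculation. This is a sketch under stated assumptions: 4:2:0 chroma sampling, edges on a 4x4 block grid, every edge filtered, and one operation per line of samples crossing an edge. The helper names and the `init_cycles` value are hypothetical; the abstract says only "a few" initialization cycles.

```python
def edge_filter_ops(block_size, edges_per_direction):
    """Filtering operations for one square sample block.

    Each edge spans `block_size` lines of samples, and edges are filtered
    in two directions (vertical and horizontal).
    """
    return 2 * edges_per_direction * block_size


# Luma: 16x16 samples, 4 edges per direction on the 4x4 grid.
luma_ops = edge_filter_ops(block_size=16, edges_per_direction=4)

# Chroma (4:2:0 assumed): two 8x8 blocks (Cb, Cr), 2 edges per direction each.
chroma_ops = 2 * edge_filter_ops(block_size=8, edges_per_direction=2)

total_ops = luma_ops + chroma_ops


def cycles_per_macroblock(ops, init_cycles):
    """Cycle estimate for a pipeline that sustains one operation per cycle."""
    return init_cycles + ops


# init_cycles=5 is purely illustrative, not a figure from the source.
print(luma_ops, chroma_ops, total_ops, cycles_per_macroblock(total_ops, init_cycles=5))
```

With these assumptions luma contributes 128 operations and chroma 64, so a pipeline sustaining one operation per cycle needs roughly 192 cycles per macroblock plus the initialization overhead.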