Xuan Yang scite author profile

Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming," and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware, but also allows our compiler to generate the complete software program including the sequential part of the workload, which accesses the hardware for acceleration. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, with the added flexibility of being able to map at various throughput rates. We demonstrate our approach by mapping applications to a Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6× higher performance and 38× lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5× higher performance with 12× lower energy compared to the K1's 192-core GPU.

Electric field gradient effects in anti-plane problems of polarized ceramics

Yang

2004

International Journal of Solids and Structures

Tangram

Gao

Yang

et al. 2019

The use of increasingly larger and more complex neural networks (NNs) makes it critical to scale the capabilities and efficiency of NN accelerators. Tiled architectures provide an intuitive scaling solution that supports both types of coarse-grained parallelism in NNs: intra-layer parallelism, where all tiles process a single layer, and inter-layer pipelining, where multiple layers execute across tiles in a pipelined manner. This work proposes dataflow optimizations to address the shortcomings of existing parallel dataflow techniques for tiled NN accelerators. For intra-layer parallelism, we develop buffer sharing dataflow that turns the distributed buffers into an idealized shared buffer, eliminating excessive data duplication and the memory access overheads. For inter-layer pipelining, we develop alternate layer loop ordering that forwards the intermediate data in a more fine-grained and timely manner, reducing the buffer requirements and pipeline delays. We also make inter-layer pipelining applicable to NNs with complex DAG structures. These optimizations improve the performance of tiled NN accelerators by 2× and reduce their energy consumption by 45% across a wide range of NNs. The effectiveness of our optimizations also increases with the NN size and complexity.

Antibacterial surfaces: Strategies and applications

Yang

Hou

Tian

et al. 2022

Sci. China Technol. Sci.

Antibacterial surfaces are surfaces that can resist bacteria, relying on the nature of the material itself. They are significant for safe food and water, human health, and industrial equipment. Biofilm is the main form of bacterial contamination on the material surface. Preventing the formation of biofilm is an efficient way to develop antibacterial surfaces. The strategy for constructing the antibacterial surface is divided into bacteria repelling and bacteria killing based on the formation of the biofilm. Material surface wettability, adhesion, and steric hindrance determine bacteria repelling performance. Bacteria should be killed by surface chemistry or physical structures when they are attached to a material surface irreversibly. Killing approaches usually target the cell membrane of bacteria. This review summarizes the fabrication methods and applications of antibacterial surfaces from the view of the treatment of the material surfaces. We also present several crucial points for developing antibacterial surfaces that offer long-term stability, no drug resistance, broad-spectrum activity, and even programmability.

Interstellar

et al. 2020

Second-order frequency shifts in crystal resonators under relatively large biasing fields

Kosinski

Pastore

Yang

et al.

Stress-induced frequency shifts in langasite thickness-mode resonators

Kosinski

Pastore

Yang

et al.

Tetris

Gao

Yang

et al. 2017

289

The high accuracy of deep neural networks (NNs) has led to the development of NN accelerators that improve performance by two orders of magnitude. However, scaling these accelerators for higher performance with increasingly larger NNs exacerbates the cost and energy overheads of their memory systems, including the on-chip SRAM buffers and the off-chip DRAM channels. This paper presents the hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory. First, we show that the high throughput and low energy characteristics of 3D memory allow us to rebalance the NN accelerator design, using more area for processing elements and less area for SRAM buffers. Second, we move portions of the NN computations close to the DRAM banks to decrease bandwidth pressure and increase performance and energy efficiency. Third, we show that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations. We present an analytical scheduling scheme that matches the efficiency of schedules derived through exhaustive search. Finally, we develop a hybrid partitioning scheme that parallelizes the NN computations over multiple accelerators. Overall, we show that TETRIS improves the performance by 4.1× and reduces the energy by 1.5× over NN accelerators with conventional, low-power DRAM memory systems.

Programming Heterogeneous Systems from an Image Processing DSL
