Mohammad Amir Mansoori scite author profile

Microwave Imaging (MI) for biomedical applications has attracted attention due to its harmless radiation compared to X-ray or MRI. One of the commonly used computing methods in MI is Finite Difference Time Domain (FDTD), which is executed several times in iterative loops, hence resulting in a high execution time. Although several hardware accelerators for FDTD have been recently introduced, they are not specifically designed for MI applications. In particular, only simple absorbing boundary conditions have been investigated, and the impact of dispersive materials on FDTD has not been considered. In this paper, we propose a multi-FPGA accelerator for 3D FDTD that is integrated in an MI algorithm, with Convolutional Perfectly Matched Layer (CPML) boundary conditions and an exact model for dispersive materials. By using High Level Synthesis (HLS), we obtain an optimized hardware accelerator that uses an efficient blocking method to reduce the data transfer time between external and local memories. We propose two alternative architectures that trade off performance and resource usage. In addition, our code, being developed at a high level, can also be run on GPUs whenever necessary. The results show that our multi-FPGA accelerator is superior to three similar GPU-based designs in terms of execution time and power consumption.

show abstract

High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Mansoori

Casu

2020

Electronics

View full text Add to dashboard Cite

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful in removing redundant information in data for various applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made the hardware acceleration of PCA an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator in Field-Programmable Gate Arrays (FPGA) that we designed entirely in HLS. In order to make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create an efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time and power consumption.

show abstract

Efficient Training and Hardware Co-design of Machine Learning Models

Mansoori

Casu

2022

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.