Yuichiro Shibata scite author profile

In this paper, deep pipelined FPGA implementation of a real-time image-based human detection algorithm is presented. By using binary patterned HOG features, AdaBoost classifiers generated by offline training, and some approximation arithmetic strategies, our architecture can be efficiently fitted on a low-end FPGA without any external memory modules. Empirical evaluation reveals that our system achieves 62.5 fps of the detection throughput, showing 96.6% and 20.7% of the detection rate and the false positive rate, respectively. Moreover, if a highspeed camera device is available, the maximum throughput of 112 fps is expected to be accomplished, which is 7.5 times faster than software implementation.

show abstract

FPGA Implementation of a Data-Driven Stochastic Biochemical Simulator with the Next Reaction Method

Yoshiini

Iwaoka

Nishikawa

et al. 2007

View full text Add to dashboard Cite

This paper introduces a scalable FPGA implementation of a stochastic simulation algorithm (SSA) called the Next Reaction Method. There are some hardware approaches of SSAs that obtained high-throughput on reconfigurable devices such as FPGAs, but these works lacked in scalability. The design of this work can accommodate to the increasing size of target biochemical models, or to make use of increasing capacity of FPGAs. Interconnection network between arithmetic circuits and multiple simulation circuits aims to perform a data-driven multi-threading simulation. Approximately 8 times speedup was obtained compared to an execution on Xeon 2.80GHz.

show abstract

The second-order condition for the dielectric interface orthogonal to the Yee-lattice axis in the FDTD scheme

Hirono

Shibata

Lui³

et al. 2000

IEEE Microw. Guid. Wave Lett.

View full text Add to dashboard Cite

A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation

Hamada

Nitadori

Benkrid

et al. 2009

Comp. Sci. Res. Dev.

View full text Add to dashboard Cite

Recently, general-purpose computation on graphics processing units (GPGPU) has become an increasingly popular field of study as graphics processing units (GPUs) continue to be proposed as high performance and relatively low cost implementation platforms for scientific computing applications. Among these applications figure astrophysical N-body simulations, which form one of the most challenging problems in computational science. However, in most reported studies, a simple O(N 2 ) algorithm was used for GPGPUs, and the resulting performances were not observed to be better than those of conventional CPUs that were based on more optimized O(N log N) algorithms such as the tree algorithm or the particle-particle particle-mesh algorithm. Because of the difficulty in getting efficient implementations of such algorithms on GPUs, a GPU cluster had no practical advantage over general-purpose PC clusters for N-body simulations. In this paper, we report a new method for efficient parallel implementation of the tree algorithm on GPUs. Our novel tree code allows the realization of an N-body simulation on a GPU cluster at a much higher performance than that on general PC clusters. We practically performed a cosmological simulation with 562 million particles on a GPU cluster using 128 NVIDIA GeForce 8800GTS GPUs at an overall cost of 168 172 $. We obtained a sustained performance of 20.1 Tflops, which when normalized against a general-purpose CPU implementation leads to a performance of 8.50 Tflops. The achieved cost/performance was hence a mere $19.8 /Gflops which shows the high competitiveness of GPGPUs.

show abstract

An FPGA Implementation of High Throughput Stochastic Simulator for Large-Scale Biochemical Systems

Yoshimi

Osana

Iwaoka

et al. 2006

View full text Add to dashboard Cite

Stochastic simulation of biochemical systems has become one of major approaches to study life processes as system, yet is a computational challenge to run the simulation due to its vast calculation cost. This paper shows the implementation and evaluation of a stochastic simulation algorithm (SSA) called "First Reaction Method" on an FPGAbased biochemical simulator. It achieves high throughput by (1) consecutively throwing data into deeply-pipelined floating point arithmetic units, and (2) by distruibuting multiple simulators for parallel execution. As the result of evaluation on an FPGA-based simulation platform called ReCSiP2, the simulator outperforms execution on Xeon 2.80 GHz by approximately 80 times, even with large-scale biochemical systems.

show abstract

Power Performance Profiling of 3-D Stencil Computation on an FPGA Accelerator for Efficient Pipeline Optimization

Okina

Soejima

Fukumoto

et al. 2016

SIGARCH Comput. Archit. News

View full text Add to dashboard Cite

This paper discusses power-performance optimization for 3-D stencil computing on a stream-oriented FPGA accelerator with highlevel synthesis. Taking a heat conduction simulation and an FDTD electromagnetic field simulation as benchmark applications, powerperformance profiling results are presented focusing on the effect of high-level pipeline parameters. As a result, it is shown that the optimal power efficiency can be achieved basically by optimizing the execution performance. The relationship between power efficiency and the clock frequency is also discussed.

show abstract

Unsupervised Segmentation of Bibliographic Elements with Latent Permutations

Masada

Shibata

Oguri

2011

View full text Add to dashboard Cite

Abstract. This paper introduces a novel approach for large-scale unsupervised segmentation of bibliographic elements. Our problem is to segment a word token sequence representing a citation into subsequences each corresponding to a different bibliographic element, e.g. authors, paper title, journal name, publication year, etc. Obviously, each bibliographic element should be represented by contiguous word tokens. We call this constraint contiguity constraint. Therefore, we should infer a sequence of assignments of word tokens to bibliographic elements so that this constraint is satisfied. Many HMM-based methods solve this problem by prescribing fixed transition patterns among bibliographic elements. In this paper, we use generalized Mallows models (GMM) in a Bayesian multi-topic model, effectively applied to document structure learning by Chen et al. [4], and infer a permutation of latent topics each of which can be interpreted as one among the bibliographic elements. According to the inferred permutation, we arrange the order of the draws from a multinomial distribution defined over topics. In this manner, we can obtain an ordered sequence of topic assignments satisfying contiguity constraint. We do not need to prescribe any transition patterns among bibliographic elements. We only need to specify the number of bibliographic elements. However, the method proposed by Chen et al. works for our problem only after introducing modification. The main contribution of this paper is to propose strategies to make their method work also for our problem.

show abstract

Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices

Masada

Hamada

Shibata

et al. 2009

View full text Add to dashboard Cite

Abstract. In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We could obtain inference results comparable to sequential CVB inference.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.