
We present two designs (I and II) for IEEE 754 double-precision floating-point matrix multiplication, optimized for implementation on high-end FPGAs. Matrix multiplication forms the kernel of many important tile-based BLAS algorithms, making it an excellent candidate for acceleration. Both designs are based on the rank-1 update scheme, handle arbitrary matrix sizes, and sustain their peak performance except during an initial latency period. Through these designs, we demonstrate the trade-offs between local memory and bandwidth in an FPGA implementation and present an analysis for the optimal choice of design parameters. Implemented on a Virtex-5 SX240T FPGA, the designs scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs at 373 MHz, a sustained performance of 29.8 GFLOPS is possible with a bandwidth requirement of 750 MB/s for design II and 5.9 GB/s for design I. This compares favourably with both related work and general-purpose CPU implementations.
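The rank-1 update scheme mentioned in the abstract computes C = AB as a sum of outer products, one per column of A and matching row of B. The minimal sketch below illustrates that accumulation order in plain Python; it is not the authors' hardware design. The helper `peak_gflops` is hypothetical and assumes each PE completes one multiply-add (2 flops) per clock cycle, an assumption under which 40 PEs at 373 MHz give about 29.8 GFLOPS, consistent with the figure quoted above.

```python
def matmul_rank1(A, B):
    """Compute C = A @ B as a sum of rank-1 (outer-product) updates."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions must match")
    p = len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for k in range(m):
        # Rank-1 update: C += (column k of A) outer (row k of B)
        row = B[k]
        for i in range(n):
            a_ik = A[i][k]
            for j in range(p):
                C[i][j] += a_ik * row[j]
    return C


def peak_gflops(num_pes, freq_mhz, flops_per_pe_per_cycle=2):
    """Hypothetical peak-throughput estimate.

    Assumes each PE finishes one multiply-add (2 flops) every cycle.
    """
    return num_pes * flops_per_pe_per_cycle * freq_mhz * 1e6 / 1e9


if __name__ == "__main__":
    print(matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(peak_gflops(40, 373))
```

Each pass of the outer `k` loop touches every element of C once, which is why a rank-1 organisation lets many PEs work on disjoint parts of C while a single column of A and row of B stream through.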


Realization of set functions as cut functions of graphs and hypergraphs

Fujishige

Patkar

2001

Discrete Mathematics


FPGA Implementation of Particle Filter Based Object Tracking in Video

Agrawal

Velmurugan

Patkar

2012


FPGA Based High Performance Double-Precision Matrix Multiplication

Kumar

Joshi

Patkar

et al. 2009


Improving graph partitions using submodular functions

Patkar

Narayanan

2003

Discrete Applied Mathematics


The realization of finite state machines by decomposition and the principal lattice of partitions of a submodular function

Desai

Narayanan

Patkar

2003

Discrete Applied Mathematics


Optimal Folding of Data Flow Graphs Based on Finite Projective Geometry Using Vector Space Partitioning

Choudhary

Sharma

Patkar

2014

Discrete Math. Algorithm. Appl.


A number of computations, especially in the areas of error-control coding and matrix computations, have underlying data flow graphs based on finite projective-geometry (PG) based balanced bipartite graphs. Many of these applications of finite projective geometry are under active research, especially in coding theory. Almost all of them need large bipartite graphs whose nodes represent parallel computations. To reduce implementation cost, minimizing the system/hardware resources required is an important engineering objective. In this context, we present a scheme to reduce resource utilization when designing systems modeled using PG-based graphs. In such systems, the number of processing units equals the number of vertices, each performing an atomic computation. We present a novel way of partitioning the vertex set, whose vertices are assigned to various atomic computations, into blocks. Each block of the partition is then assigned to a processing unit. A processing unit performs the computations corresponding to the vertices in its block sequentially, thus folding the overall computation. The symmetric properties of projective space lattices enable us to develop a conflict-free communication schedule. We use coset decomposition of a finite field for the partitioning. The folding scheme achieves the best possible throughput because it incurs no overhead from shuffling data across memories when another computation is scheduled on the same processing unit. We first provide a scheme for a finite projective space of dimension five, together with the corresponding schedules. This scheme is then generalized to arbitrary finite projective spaces. Both folding schemes have been verified through simulation and hardware prototyping.
For example, a semi-parallel decoder architecture for a new class of expander codes was designed and implemented using this scheme, with potential deployment in DVD-R/CD-ROM drives.
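The partitioning idea in this abstract can be illustrated with a small sketch. It assumes, as is common for PG constructions, that the points of PG(m, q) are indexed by exponents 0..N-1 of a primitive field element, where N = (q^(m+1) - 1)/(q - 1), so that they form a cyclic group of order N. Cosets of a subgroup of order d (with d dividing N) then partition the points into N/d blocks, one per processing unit. The choice q = 2 (giving 63 points for dimension five) and the function names are hypothetical; this does not reproduce the paper's conflict-free communication schedule.

```python
def pg_num_points(q, m):
    """Number of points of the finite projective space PG(m, q)."""
    return (q ** (m + 1) - 1) // (q - 1)


def coset_partition(n_points, n_blocks):
    """Partition point indices 0..n_points-1 into cosets of the cyclic
    subgroup of order n_points // n_blocks (generated by n_blocks)."""
    if n_points % n_blocks != 0:
        raise ValueError("number of blocks must divide the number of points")
    # Coset r = {r, r + n_blocks, r + 2*n_blocks, ...} (mod n_points)
    return [list(range(r, n_points, n_blocks)) for r in range(n_blocks)]


def fold_schedule(blocks):
    """Processing unit r handles its block sequentially:
    at time step t it computes vertex blocks[r][t]."""
    steps = len(blocks[0])
    return [[blocks[r][t] for r in range(len(blocks))] for t in range(steps)]


if __name__ == "__main__":
    n = pg_num_points(2, 5)          # hypothetical q = 2 -> 63 points
    blocks = coset_partition(n, 7)   # 7 processing units, 9 vertices each
    for t, step in enumerate(fold_schedule(blocks)):
        print(t, step)
```

With 7 blocks of 9 vertices, the folded system needs 7 processing units instead of 63, each running 9 sequential steps.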


An efficient practical heuristic for good ratio-cut partitioning

Patkar

Narayanan


Sachin B. Patkar
