An FPGA-Based Singular Value Decomposition Processor

Ma, Weijiao; Kaye, Mary E.; Lüke, Dennis; Doraiswami, R.

doi:10.1109/ccece.2006.277355

Cited by 27 publications

(12 citation statements)

References 20 publications


“…Ma et al [19] proposed the implementation of two-sided rotation Jacobi SVD algorithm on a two million gate FPGA. They proposed a mesh connected array structure of simple 2 × 2 processors to compute SVD of a large matrix.…”

Section: Related Work (mentioning)

confidence: 99%

Singular value decomposition on GPU using CUDA

Lahabar

Narayanan

2009

2009 IEEE International Symposium on Parallel & Distributed Processing



“…Previous FPGA-based implementations have looked at SVD [Brent and Luk (1982)], QRD [Wang and Leeser (2009)] and sparse LUD [Kapre and DeHon (2009)]. However, those approaches all have some limitations in common: either restricted with the scalability of the adapted matrices due to the logic capacity of FPGAs [Brent and Luk (1982); Ahmedsaid et al (2003); Ma et al (2006); Ledesma-Carrillo et al (2011); Wang and Leeser (2009)] or required the input matrices of special property or irregular sparsity structure [Rafique et al (2012);Tai et al (2011); Vachranukunkiet (2007); Kapre and DeHon (2009); Wu et al (2012)].…”

Section: Contributions: FPGA-based Accelerators For Matrix Decomposit… (mentioning)

confidence: 99%

“…Previously, FPGAs were employed to demonstrate the highly parallel implementations of EVD and SVD based on two-sided Jacobi Rotations, by accelerating their independent 2 × 2 rotations, using a parallel architecture featuring a 2-dimensional systolic array. In this earlier work, the scalability of the applicable matrices had been severely restricted by the limited resources on FPGAs [Brent and Luk (1982); Brent et al (1985); Ahmedsaid et al (2003); Ma et al (2006)]. In [Brent and Luk (1982); Brent et al (1985)], the authors demonstrated the efficiency of the 2D systolic array designs for EVD and SVD with the time complexity of O(n log n) for an n-by-n square matrix, in which log n was proved as the number of iterations for reasonable convergence with certain threshold by applying parallel Jacobi rotation or cyclic Jacobi rotation methods; meanwhile, a number of n² processing units (PEs) are needed.…”

Section: Related Work (mentioning)

confidence: 99%
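
The "independent 2 × 2 rotations" referred to in the statement above can be illustrated with a small sketch. This is not taken from Ma et al. or from the citing paper; it is one textbook way to diagonalize a single 2 × 2 block with a left and a right plane rotation (symmetrize first, then apply a Jacobi rotation). The function names `rot`, `matmul2`, `transpose2`, and `two_sided_rotation_2x2` are hypothetical, and the diagonal entries it returns may be negative, so singular values would be their absolute values.

```python
import math


def rot(theta):
    """Plane rotation matrix [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(x):
    """Transpose of a 2x2 matrix."""
    return [[x[0][0], x[1][0]], [x[0][1], x[1][1]]]


def two_sided_rotation_2x2(m):
    """Return (left, diag, right) with left^T * m * right = diag (diagonal).

    Illustrative assumption: symmetrize-then-diagonalize variant, not the
    exact angle formulas of any particular hardware design.
    """
    (a, b), (c, d) = m
    # Step 1: rotation angle phi that makes rot(phi)^T * m symmetric.
    phi = math.atan2(c - b, a + d)
    sym = matmul2(transpose2(rot(phi)), m)
    p, q, r = sym[0][0], sym[0][1], sym[1][1]
    # Step 2: Jacobi angle theta that zeroes the off-diagonal of sym.
    theta = 0.5 * math.atan2(2.0 * q, p - r)
    left, right = rot(phi + theta), rot(theta)
    diag = matmul2(matmul2(transpose2(left), m), right)
    return left, diag, right


if __name__ == "__main__":
    left, diag, right = two_sided_rotation_2x2([[1.0, 2.0], [3.0, 4.0]])
    print(diag)
```

In an array of such blocks, each 2 × 2 processor would compute its own pair of angles, which is what makes the rotations independent of one another within a sweep.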

“…The emergence of reconfigurable fabrics such as FPGAs introduces low-cost solutions to parallelize the algorithm at the operand-level granularity. To perform SVD, 1-dimensional or 2-dimensional systolic arrays have been employed to parallelize the classic two-sided Jacobi rotation algorithm [Brent and Luk (1982); Brent et al (1985); Ahmedsaid et al (2003);Ma et al (2006)]. With the featured independent 2 × 2 rotations, a highly parallel 2-dimensional systolic array is employed to map the classic two-sided Jacobi rotation algorithm into FPGA devices with the computational complexity of O(n log n) for an n-by-n square matrix.…”

Section: Householder Transformation Implementations Have Been Demons… (mentioning)

confidence: 99%

“…using FPGAs [Brent et al (1985); Hestenes (1958)]. However, similar to other FPGA-based designs for matrix decomposition [Tai et al (2011);Wu et al (2012)], the logic capacity of FPGAs has typically limited the scalability of the adapted matrices [Brent and Luk (1982); Ahmedsaid et al (2003); Ma et al (2006); Ledesma-Carrillo et al (2011)], even though this previous work targeted applications in real-time signal processing using fixed-point arithmetic, for which hardware resource utilization is significantly less than for floating-point arithmetic.…”

Section: Introduction (mentioning)

confidence: 99%
