“…In conventional methods, the main problem concerns the matrix elements' access, typically stored in a memory where it is not possible to access all elements of P c i simultaneously. To overcome this problem, the circulant matrix structure is used [9]. The circulant form of an N × N matrix is a matrix whose elements in each row are rotated to right one element relative to the preceding row [14].…”
Section: Circulant Matrices and Matrix Multiplication
mentioning
confidence: 99%
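The row-rotation property described in the statement above can be illustrated with a minimal sketch. It only shows the circulant pattern (each row is the previous row rotated right by one element); the function name `circulant` is hypothetical, and the code does not reproduce the storage scheme or hardware architecture of the cited work.

```python
def circulant(first_row):
    """Build an N x N circulant matrix from its first row.

    Row i is the first row rotated right by i positions, so the entry
    at (i, j) is first_row[(j - i) mod N].
    """
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in circulant([1, 2, 3]):
        print(row)
```

Because every row is a rotation of the first, each column holds one element of every rotation, which is the property the quoted passage relies on to avoid simultaneous access to one memory.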
“…The architecture of the proposed matrix-computing unit has been designed to optimize both the performance and the data flow of the complete sequence of matrix operations. To do that, an extension of the circulant matrix multiplication architecture [9] is proposed. This architecture permits keeping the matrix multiplication computationally efficient and, at the same time, exploiting some relevant data flow benefits to improve the performance of the whole computation sequence:…”
Section: Matrix Computing Unit Proposal
mentioning
confidence: 99%
“…This work aims to design a matrix-computing unit taking advantage of the efficiency of the circulant matrix multiplication [9] while extending it to other operations and optimizing the data flow to compute a sequence of matrix operations. The main contributions of this paper are as follows:…”
mentioning
confidence: 99%
“…In addition, Table 4 shows the peak frequency at which the unit can repeatedly perform each operation (considering the required number of cycles and the maximum frequency of operations for each matrix size). From the obtained results, it can be determined that the proposed matrix computing architecture extended the circulant matrix multiplication architecture [9] to other operations while keeping the architecture benefits: exploiting local interconnections, data storage, and computations to reduce delays and communication overhead. This fact avoids typical communication bottlenecks in a classical fully-parallel system.…”
High-dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit based on circulant matrices. It optimizes the data flow for computing any sequence of matrix operations, removing the need to move intermediate results, and it also optimizes the performance of the individual matrix operations in direct or transposed form (the transpose operation only requires a change in data addressing). The supported matrix operations are matrix-by-matrix addition, subtraction, dot product, and multiplication; matrix-by-vector multiplication; and matrix-by-scalar multiplication. The proposed architecture is fully scalable, with the maximum matrix dimension limited only by the available resources. In addition, a design environment has been developed that assists the user, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N × N matrices, the architecture requires N ALU-RAM blocks and operates with O(N²) time complexity, requiring N² + 7 and N + 7 clock cycles for matrix-matrix and matrix-vector operations, respectively. On the tested Virtex-7 FPGA device, the computation for 500 × 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units.
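The cycle counts and throughput quoted in the abstract can be checked with a short calculation. This is a sketch under one stated assumption that is not spelled out in the abstract: each of the N ALU-RAM blocks completes one operation per clock cycle. Under that assumption the 173 GOPS figure follows from 500 blocks at 346 MHz. The function names are hypothetical.

```python
def matrix_matrix_cycles(n):
    """Clock cycles for a matrix-matrix operation, per the abstract: N^2 + 7."""
    return n * n + 7


def matrix_vector_cycles(n):
    """Clock cycles for a matrix-vector operation, per the abstract: N + 7."""
    return n + 7


def peak_gops(n, clock_hz):
    """Peak throughput in GOPS.

    ASSUMPTION: one operation per ALU-RAM block per cycle, with N blocks.
    """
    return n * clock_hz / 1e9


def latency_seconds(cycles, clock_hz):
    """Time needed to run the given number of cycles at the given clock."""
    return cycles / clock_hz


if __name__ == "__main__":
    n, clock_hz = 500, 346_000_000
    print("matrix-matrix cycles:", matrix_matrix_cycles(n))
    print("matrix-vector cycles:", matrix_vector_cycles(n))
    print("peak GOPS:", peak_gops(n, clock_hz))
    print("matrix-matrix latency (s):",
          latency_seconds(matrix_matrix_cycles(n), clock_hz))
```

For N = 500, the fixed 7-cycle overhead is negligible next to the N² term, which is why the reported throughput sits at the N-blocks-times-clock peak.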
“…In recent years, with the increase of network bandwidth and the development of video image compression technology, people can carry out transmission of image and video through a variety of ways. In particular, with the progress of wireless transmission technology, the communication technology is affecting people's production and life in an unprecedented scale and degree [6]. The transmission of video and image information by wireless channels has become an urgent demand of application and the rapid improvement of network transmission technology; especially the wireless transmission technology could bring convenience to us, but it also brings hidden danger that sensitive image information may be stolen easily and spread illegally.…”
Based on the actual demands of maritime security, this paper analyzes the specific requirements that a maritime monitoring system places on a video encryption algorithm. An intelligent monitoring system for unmanned surface vessels (USVs) is designed and implemented using Internet of Things (IoT) technology, including its security and networking techniques. The USVs monitor and collect information at sea, which is critical to maritime security. If the video data are captured by pirates or criminals during transmission, maritime security will be seriously compromised. Traditional algorithms have the following shortcomings: the encryption strength is low, the computing cost is high, and the video data are easily intercepted during transmission. To overcome these disadvantages, this paper proposes a novel encryption algorithm, the improved Hill encryption algorithm, to address the security problems of the unmanned video monitoring system. Specifically, the Hill algorithm of classical cryptography is applied to image encryption, using an invertible matrix as the key to encrypt the image matrix. The improved Hill encryption algorithm is combined with the video compression process and adjusts the encryption parameters according to the content of the video image. It overcomes the disadvantages of traditional encryption algorithms and reduces the computation time of the inverse matrix, so that the overall performance of the algorithm remains optimal across different image information. Experimental results validate the favorable performance of the proposed algorithm.
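The idea of using an invertible matrix as the key for an image matrix can be sketched with the classical Hill cipher applied to 8-bit pixel values. This is not the improved algorithm from the paper, whose parameter regulation and compression coupling are not described here. The 2 × 2 key, the modulus 256, and the function names are illustrative assumptions.

```python
MOD = 256  # ASSUMPTION: 8-bit pixel values, so arithmetic is modulo 256


def inverse_key_2x2(key):
    """Invert a 2x2 key matrix modulo 256.

    The key is invertible only if its determinant is odd (coprime with 256);
    otherwise pow() raises ValueError.
    """
    (a, b), (c, d) = key
    det = (a * d - b * c) % MOD
    det_inv = pow(det, -1, MOD)  # modular inverse, Python 3.8+
    return [[(d * det_inv) % MOD, (-b * det_inv) % MOD],
            [(-c * det_inv) % MOD, (a * det_inv) % MOD]]


def apply_key(key, pixels):
    """Multiply each consecutive pixel pair by the key matrix modulo 256."""
    if len(pixels) % 2 != 0:
        raise ValueError("pixel count must be even for a 2x2 key")
    out = []
    for i in range(0, len(pixels), 2):
        x, y = pixels[i], pixels[i + 1]
        out.append((key[0][0] * x + key[0][1] * y) % MOD)
        out.append((key[1][0] * x + key[1][1] * y) % MOD)
    return out


if __name__ == "__main__":
    key = [[3, 2], [1, 1]]          # hypothetical key, determinant 1
    pixels = [10, 200, 255, 0]      # a flattened row of pixel values
    cipher = apply_key(key, pixels)
    plain = apply_key(inverse_key_2x2(key), cipher)
    print(cipher, plain == pixels)
```

Decryption reuses `apply_key` with the inverse matrix, which is why the cost of computing that inverse matters for the overall running time mentioned in the abstract.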