SystemC implementation of mat-core: A matrix core extension for general-purpose processors

Soliman, Mostafa I.; Al-Junaid, Abdulmajid F.

doi:10.1109/dtis.2009.4938014

Cited by 3 publications

(2 citation statements)

References 11 publications

“…These units are communicated through architectural queues which are used to temporary keep the loaded/stored data. The SystemC implementation of the decoupled Mat-Core processor is described in detail in [13].…”

Section: Strps

mentioning

confidence: 99%

Codevelopment of Multi-level ISA and hardware for an efficient matrix processor

Soliman

Al-Junaid

2009

2009 International Conference on Computer Engineering & Systems

Self Cite

The instruction set architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. A multi-level ISA is proposed to communicate data parallelism to the hardware explicitly and compactly, instead of extracting it dynamically with complex hardware or statically with sophisticated compiler techniques. This paper presents the codevelopment of a multi-level ISA and hardware for an efficient matrix processor called Mat-Core. Mat-Core extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. As in vector architectures, the data computation unit is organized in parallel lanes. On these lanes, however, Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition to scalar-vector and vector-vector instructions. Mat-Core leads to a compiler model that is efficient in terms of both performance and executable code size. With four parallel lanes, Mat-Core achieves about 1.6, 2.1, 4.1, and 6.4 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
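The decoupling described in the abstract can be illustrated with a small software model. The following is only a sketch in plain C++ (not SystemC, and not the authors' implementation): the function names `address_unit_load` and `compute_unit_saxpy`, the flat `memory` vector, and the use of `std::queue` as the data queue are all assumptions made for illustration. The address-generation side only produces addresses and fills queues; the data-computation side only consumes queue entries and never touches memory.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical model of the address-generation side: walks memory with a
// given stride and pushes each loaded value into a load-data queue.
inline void address_unit_load(const std::vector<double>& memory,
                              std::size_t base, std::size_t stride,
                              std::size_t count,
                              std::queue<double>& load_q) {
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t addr = base + i * stride;
        assert(addr < memory.size());  // illustrative bounds check
        load_q.push(memory[addr]);
    }
}

// Hypothetical model of the data-computation side: performs SAXPY
// (a * x + y) using operands taken only from the two queues.
inline std::vector<double> compute_unit_saxpy(double a,
                                              std::queue<double>& x_q,
                                              std::queue<double>& y_q) {
    std::vector<double> out;
    while (!x_q.empty() && !y_q.empty()) {
        out.push_back(a * x_q.front() + y_q.front());
        x_q.pop();
        y_q.pop();
    }
    return out;
}
```

In this single-threaded sketch the queues are filled completely before computation starts; in a real decoupled design the two sides would run concurrently, with the address side running ahead of the compute side so that memory latency overlaps with computation.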

“…The elements of vector data are distributed across the lanes in a round-robin, interleaved fashion (see Figure 1b). SystemC has been used to simulate the Mat-Core processor (see [13] for more detail).…”
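The round-robin, interleaved distribution mentioned in this statement can be sketched as follows. This is an illustrative model in plain C++ under the assumption that element `i` is assigned to lane `i % lanes`; the helper names `lane_of` and `distribute` are hypothetical and do not come from the cited papers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the lane that holds element i when
// elements are interleaved round-robin across `lanes` parallel lanes.
inline std::size_t lane_of(std::size_t i, std::size_t lanes) {
    assert(lanes > 0);
    return i % lanes;
}

// Hypothetical helper: split a vector into per-lane slices, preserving
// the original element order within each lane.
inline std::vector<std::vector<double>>
distribute(const std::vector<double>& v, std::size_t lanes) {
    assert(lanes > 0);
    std::vector<std::vector<double>> per_lane(lanes);
    for (std::size_t i = 0; i < v.size(); ++i) {
        per_lane[lane_of(i, lanes)].push_back(v[i]);
    }
    return per_lane;
}
```

With four lanes and a ten-element vector, lane 0 would receive elements 0, 4, and 8, lane 1 would receive 1, 5, and 9, and lanes 2 and 3 would receive two elements each.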

mentioning

confidence: 99%