Towards Many-core Processors
communication mechanisms. Associating the network interface with on-chip memory allows it to flexibly handle transfers ranging from a few bytes to several kilobytes. It also allows the network interface to operate decoupled from (asynchronously to) the processor, overlapping bulk transfers with computation to hide latency inexpensively, without the need for non-blocking caches. A simple DMA engine can support bulk transfers to and from scratchpad memory without requiring the processor architecture to be adapted to data-transfer requirements, as is the case with vector and out-of-order processors.

One additional issue has to be addressed regarding scratchpad memories and network interfaces in the processor environment. Low-latency access is indispensable to their usefulness in the on-chip environment of general-purpose many-core processors, which makes any interaction with the operating system prohibitive in the common case. To support concurrent, protected access by multiple processes, scratchpads and their associated NI must be accessible at user level.

Protected, user-level access is achievable by memory-mapping these resources. In addition, the close coupling of the network interface with the processor can facilitate translation and protection mechanisms in the network interface. Such mechanisms allow communication operations to take application-space arguments (e.g. virtual addresses for communication endpoints) while bypassing the operating system in the common case. Conversely, receiving transferred data directly in user-level accessible scratchpad memory avoids copying between kernel and user memory space.

Caches and user-level accessible scratchpads, organized as in figure 1.1(b), share the advantage that computation occurs "in place", in the same memory into which data are fetched, without copying.
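To make the overlap of bulk transfers with computation more concrete, the following C sketch double-buffers a data stream through two scratchpad buffers: while the processor computes in place on one buffer, the DMA engine fills the other. The buffer size `CHUNK`, the descriptor type `dma_req`, and the functions `dma_start`/`dma_wait` are hypothetical stand-ins for a memory-mapped NI interface, not an existing API. The transfer itself is simulated with `memcpy`, so the sketch shows the control flow only, not real asynchrony.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scratchpad layout: two buffers of CHUNK ints each. */
#define CHUNK 64

/* Hypothetical DMA descriptor. Real hardware would perform the copy
 * asynchronously and set `done` on completion; here the copy is
 * simulated synchronously so that the sketch runs anywhere. */
typedef struct {
    volatile int done;
} dma_req;

static void dma_start(dma_req *r, int *dst, const int *src, size_t n) {
    r->done = 0;
    memcpy(dst, src, n * sizeof *src); /* simulated bulk transfer */
    r->done = 1;
}

static void dma_wait(const dma_req *r) {
    while (!r->done) { /* spin until the engine signals completion */ }
}

/* Sum n ints from "remote" memory, computing in place in the scratchpad
 * on chunk c while chunk c+1 is (conceptually) still in flight. */
long sum_double_buffered(const int *remote, size_t n) {
    static int spm[2][CHUNK]; /* the two scratchpad buffers */
    dma_req req[2];
    long total = 0;
    size_t nchunks = (n + CHUNK - 1) / CHUNK;

    assert(n == 0 || remote != NULL);
    if (nchunks == 0)
        return 0;

    /* Prefetch the first chunk into buffer 0. */
    dma_start(&req[0], spm[0], remote, n < CHUNK ? n : CHUNK);

    for (size_t c = 0; c < nchunks; c++) {
        size_t cur = c % 2;
        size_t off = c * CHUNK;
        size_t len = (n - off < CHUNK) ? n - off : CHUNK;

        /* Issue the next transfer into the other (already consumed)
         * buffer before computing on the current one. */
        if (c + 1 < nchunks) {
            size_t noff = off + CHUNK;
            size_t nlen = (n - noff < CHUNK) ? n - noff : CHUNK;
            dma_start(&req[1 - cur], spm[1 - cur], remote + noff, nlen);
        }

        dma_wait(&req[cur]);
        for (size_t i = 0; i < len; i++) /* compute in place */
            total += spm[cur][i];
    }
    return total;
}
```

For example, summing 200 integers processes four chunks of 64, 64, 64 and 8 elements. On real hardware, the latency of every transfer after the first would be hidden behind the computation on the previous chunk, with no non-blocking cache involved.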
This advantage arises naturally, because the memory used for computation is also the "communication memory" managed by the cache controller or the network interface. This thesis advocates a virtualized network interface, closely coupled to the processor, that supports fast local data access and communication initiation at user level, allows software-controlled data transfer and placement, and exploits NI memory for computation.
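The translation and protection checks that let such an NI accept application-space arguments can be sketched as follows, under heavily simplified assumptions. Each process is given its own NI context holding a few page mappings installed by the operating system, and a user-level command names its source and destination by virtual address. All types and names here (`ni_context`, `ni_cmd`, `ni_accept`, the 4 KiB page size, the single-page buffer restriction) are hypothetical. The sketch only illustrates how the NI could validate a command without an operating-system call on the common path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12                 /* assumed 4 KiB pages */
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define TLB_ENTRIES 4                  /* hypothetical per-context entries */

enum { PERM_R = 1, PERM_W = 2 };

/* Hypothetical translation entry, installed by the OS off the critical
 * path when it maps the NI into a process's address space. */
typedef struct {
    bool     valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t ppn;   /* physical page number */
    unsigned perm;  /* PERM_R and/or PERM_W */
} ni_tlb_entry;

typedef struct {
    ni_tlb_entry tlb[TLB_ENTRIES];
} ni_context;

/* Translate a user virtual address. Returns false on a miss or a
 * permission violation, in which case the NI would reject the command. */
static bool ni_translate(const ni_context *ctx, uint32_t va, unsigned need,
                         uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        const ni_tlb_entry *e = &ctx->tlb[i];
        if (e->valid && e->vpn == vpn) {
            if ((e->perm & need) != need)
                return false;
            *pa = (e->ppn << PAGE_SHIFT) | (va & (PAGE_SIZE - 1));
            return true;
        }
    }
    return false;
}

/* Hypothetical user-level command, written to a memory-mapped command
 * page; the NI infers the issuing context from which page was written. */
typedef struct {
    uint32_t src_va, dst_va, len;
} ni_cmd;

/* Validate a command as the NI would before starting the transfer. */
bool ni_accept(const ni_context *ctx, const ni_cmd *cmd,
               uint32_t *src_pa, uint32_t *dst_pa) {
    if (cmd->len == 0 || cmd->len > PAGE_SIZE)
        return false;
    /* Simplification: each buffer must lie within a single page. */
    uint32_t last_src = cmd->src_va + cmd->len - 1;
    uint32_t last_dst = cmd->dst_va + cmd->len - 1;
    if ((cmd->src_va >> PAGE_SHIFT) != (last_src >> PAGE_SHIFT) ||
        (cmd->dst_va >> PAGE_SHIFT) != (last_dst >> PAGE_SHIFT))
        return false;
    /* The source must be readable and the destination writable. */
    return ni_translate(ctx, cmd->src_va, PERM_R, src_pa) &&
           ni_translate(ctx, cmd->dst_va, PERM_W, dst_pa);
}
```

A hardware NI would more plausibly keep such entries in a small associative TLB and involve the operating system only on a miss; the linear search and single-page restriction above are there for clarity only.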