George Nikiforos scite author profile

George Nikiforos

5 Publications

21 Citation Statements Received

41 Citation Statements Given

Affiliations

Foundation for Research and Technology Hellas, FORTH Institute of Computer Science

Publications

FPGA implementation of a configurable cache/scratchpad memory with virtualized user-level RDMA capability

Kalokerinos, Papaefstathiou, Nikiforos et al. 2009

We report on the hardware implementation of a local memory system for individual processors inside future chip multiprocessors (CMP). It is intended to support both implicit communication, via caches, and explicit communication, via directly accessible local ("scratchpad") memories and remote DMA (RDMA). We provide run-time configurability of the SRAM blocks near each processor, so that part of them operates as a 2nd-level (local) cache, while the rest operates as scratchpad. We also strive to merge the communication subsystems required by the cache and the scratchpad into one integrated Network Interface (NI) and Cache Controller (CC), in order to economize on circuits. The processor communicates with the NI at user level, through virtualized command areas in the scratchpad; through a similar mechanism, the NI also provides efficient support for synchronization, using two hardware primitives: counters and queues. We describe the block diagram, the hardware cost, and the latencies of our FPGA-based prototype implementation, which integrates four MicroBlaze processors, each with 64 KBytes of local SRAM, a crossbar NoC, and a DRAM controller on a Xilinx Virtex-5 FPGA. One-way, end-to-end, user-level communication completes within about 30 clock cycles for short transfer sizes.

Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability

Kalokerinos, Papaefstathiou, Nikiforos et al. 2019

Low-latency explicit communication and synchronization in scalable multi-core clusters

Kachris, Nikiforos, Papaefstathiou et al. 2010

One of the main challenges in the multi-core area is the communication and synchronization of the cores and the design of an efficient interconnection network that scales to multiple cores. In this paper we present an efficient implementation of a scalable system that targets multi-core systems. Each cluster node consists of 4 processors that support both explicit and implicit communication. Each processor's cache is augmented with a scratchpad and is merged with the network interface (NI) for reduced communication latency. All nodes are connected through a novel layer-2 switch that can support up to 20 nodes. The proposed system is designed and implemented using multiple FPGA boards, and the performance evaluation presents the aggregate throughput of the system (with 16 processors) and the communication latency between the cluster nodes.

Fine-grain OpenMP runtime support with explicit communication hardware primitives

Tendulkar, Papaefstathiou, Nikiforos et al. 2011

We present a runtime system that uses the explicit on-chip communication mechanisms of the SARC multi-core architecture to efficiently implement the OpenMP programming model and enable the exploitation of fine-grain parallelism in OpenMP programs. We explore the design space for implementing OpenMP directives and runtime intrinsics using a family of hardware primitives (remote stores, remote DMAs, hardware counters, and hardware event queues with automatic responses) to support static and dynamic scheduling and data transfers in local memories. Using an FPGA prototype with four cores, we achieve OpenMP task creation latencies of 30-35 processor clock cycles, initiation of parallel contexts in 50 cycles, and synchronization primitives in 65-210 cycles.

Network Processing in Multi-core FPGAs with Integrated Cache-Network Interface

Kachris, Nikiforos, Kavadias et al. 2010
