Exploring Shared Virtual Memory for FPGA Accelerators with a Configurable IOMMU

Vogel, Pirmin; Marongiu, Andrea; Benini, Luca

doi:10.1109/tc.2018.2879080

Cited by 10 publications

(4 citation statements)

References 44 publications

“…This tactic allows for the accumulation of more things without exceeding some kind of data transfer cap. Client-facing de-duplication relies on a detailed history of users movements [28]. Although the record prompts the user to check if the server is live, this approach keeps the restriction position and movement speed same.…”

Section: Proposed Methods (mentioning)

confidence: 99%

Efficient Memory Handling Model with Consistent Video Frame Duplication Removal with Precise Compression

Sameerunnisa, Jabez. 2023. RIA.


Massive numbers of videos are being made and shared online as mobile devices and social networks have gained popularity in recent years. The enormous growth in the amount of video data has made storing and quickly searching it difficult. Because many videos are duplicates or near-duplicates in practice, recognizing these copies has become a critical strategy for reducing storage through duplicate removal models. Video compression is an important part of Internet video delivery and of efficient memory management. The growth of deep learning has sparked a revival in video compression, with many frameworks presented in recent years offering comparable or even higher performance than traditional video codecs. Despite the advances in rate-distortion performance, these models are substantially slower and need more memory, limiting their practical application. The exponentially increasing volume of video data has presented enormous problems to video deduplication technologies. People are interested in uploading and sharing information in photo and video formats in this digital era. This expansion has increased the storage capacity in use, which contains a large amount of redundant multimedia material. Many deduplication algorithms are being rapidly developed, although they are often slow and have rather imprecise identification processes. Deduplication is one of the emerging ways of coping with redundant data stored in several locations. When more than one copy of the same data is detected, a single copy is preserved and the other copies are replaced by pointers to the preserved copy; duplicate frames are also removed by segmenting the video for memory efficiency. The freed storage can then be used to store other data. While there are many other types of deduplication algorithms, picture and video deduplication strategies and implementations receive a lot of attention since they are difficult to implement.
In this research, a Consistent Video Frame Duplication Removal with Precise Compression (CVFDR-PC) model for efficient memory handling is proposed. This research provides a versatile and efficient video frame deduplication framework with a compression model that handles memory effectively. When contrasted with existing methods, the proposed model exhibits better performance levels.

“…The PMCAs consist of many minimal, domain-specific processing elements (PEs), potentially grouped in clusters, have a memory hierarchy of physically-addressed, software-managed scratchpad memories (SPMs), and include an input/output memory management unit (IOMMU) to share the virtual memory space with the host. many examples of such architectures in products ranging from high-performance computing (HPC) [24,38] over high-performance SoCs [16] to low-power SoCs [17,52] as well as in research [9,21,28,59].…”

Section: Target Architecture (mentioning)

confidence: 99%

Mixed-data-model heterogeneous compilation and OpenMP offloading

Kurth, Wolters, Forsberg, et al. 2020. Proceedings of the 29th International Conference on Compiler Construction.


Heterogeneous computers combine a general-purpose host processor with domain-specific programmable many-core accelerators, uniting high versatility with high performance and energy efficiency. While the host manages ever-more application memory, accelerators are designed to work mainly on their local memory. This difference in addressed memory leads to a discrepancy between the optimal address width of the host and the accelerator. Today 64-bit host processors are commonplace, but few accelerators exceed 32-bit addressable local memory, a difference expected to increase with 128-bit hosts in the exascale era. Managing this discrepancy requires support for multiple data models in heterogeneous compilers. So far, compiler support for multiple data models has not been explored, which hampers the programmability of such systems and inhibits their adoption. In this work, we perform the first exploration of the feasibility and performance of implementing a mixed-data-model heterogeneous system. To support this, we present and evaluate the first mixed-data-model compiler, supporting arbitrary address widths on host and accelerator. To hide the inherent complexity and to enable high programmer productivity, we implement transparent offloading on top of OpenMP. The proposed compiler techniques are implemented in LLVM


“…Another solution is to build a hybrid hardware architecture that allows the CPU and accelerator to share memory, mostly used in large computer systems, as shown in Figure 1. IOMMU [4][5][6] uses a separate MMU to map peripheral-accessible physical addresses to host physical addresses, allowing accelerators to directly access memory. But compared with directly sharing physical memory, independent MMU is not conducive to the mixed programming of CPU-accelerator programs.…”

Section: Introduction (mentioning)

confidence: 99%

CLMalloc: contiguous memory management mechanism for large-scale CPU-accelerator hybrid architectures

Zhang, Lü, Zhang. 2023. Third International Symposium on Computer Engineering and Intelligent Communications (ISCEIC 2022).


Heterogeneous accelerators play a crucial role in improving computer performance. General-purpose computers reduce the frequent communication between traditional accelerators with separate memory and the host computer through fast communication links. Some high-speed devices such as supercomputers integrate the accelerator and CPU on one chip, and the shared memory is managed by the operating system, which shifts the performance bottleneck from data acquisition to accelerator addressing. Existing memory management mechanisms typically reserve contiguous physical memory locally for peripherals for efficient direct memory access. However, in large computer systems with multiple memory nodes, the accelerator's memory access behavior is limited by the local memory capacity. The difficulty of addressing accelerators across nodes prevents computers from maximizing the benefits of massive memory. This paper proposes a contiguous memory management mechanism for a large-scale CPU-accelerator hybrid architecture (CLMalloc) to simultaneously support the different types of memory requirements of CPU and accelerator programs. In simulation experiments, CLMalloc achieves similar (or even better) performance to the system functions malloc/free. Compared with the DMA-based baseline, the space utilization of CLMalloc is increased by 2×, and the latency is reduced by 80% to 90%.
