2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca51647.2021.00040
QEI: Query Acceleration Can be Generic and Efficient in the Cloud

Cited by 5 publications (3 citation statements) · References 51 publications
“…These works clearly show that RoCE reduces the CPU load (related to network communication) significantly compared to TCP and, at the same time, offers lower communication latency. RoCE communication has been traditionally established over RDMA-capable NICs, i.e., the main focus has been to build such hardware, e.g., [28]. Although Soft-RoCE has been developed to enable hardware-independent RDMA communication, it has not received much attention for industrial use.…”
Section: Distributed Automotive Application
confidence: 99%
“…First, a (de)serializer can be optionally used, if the application uses an RPC protocol for inter-machine communications [90]. Then, to process requests, we typically need a data structure walker [52,86,105,173,176] to find the location of the target data of the request. To maximize the memory-level parallelism and hide the memory access latency, multiple outstanding requests and out-of-order execution should be supported.…”
Section: Orca CC-accelerator Architecture
confidence: 99%
“…In CPU and Smart NIC, batching means processing requests in a batch to improve the memory access efficiency [99]. In ORCA, since the APU can already exploit the memory-level parallelism across requests [86,105,173,176], there is no need for request batching. Hence, we batch the doorbell signals to the RNIC [77] when posting RDMA operations for response.…”
Section: B. In-memory Key-value Store
confidence: 99%