Arthur Stoutchinin scite author profile

Arthur Stoutchinin

5 Publications

63 Citation Statements Received

105 Citation Statements Given


Affiliations

STMicroelectronics (France), STMicroelectronics (Czechia), University of Delaware

Publications


Speculative Prefetching of Induction Pointers

Stoutchinin

Amaral

Gao

et al. 2001


We present an automatic approach for prefetching data for linked list data structures. The main idea is based on the observation that linked list elements are frequently allocated at a constant distance from one another in the heap. When linked lists are traversed, a regular pattern of memory accesses with constant stride emerges. This regularity in the memory footprint of linked lists enables the development of a prefetching framework where the address of the element accessed in one of the future iterations of the loop is dynamically predicted based on its previous regular behavior. We automatically identify pointer-chasing recurrences in loops that access linked lists. This identification uses a surprisingly simple method that looks for induction pointers: pointers that are updated in each loop iteration by a load with a constant offset. We integrate induction pointer prefetching with loop scheduling. A key intuition incorporated in our framework is to insert prefetches only if there are processor resources and memory bandwidth available. To estimate the available memory bandwidth, we calculate the number of potential cache misses in one loop iteration. Our estimation algorithm is based on an application of graph coloring on a memory access interference graph derived from the control flow graph. We implemented the prefetching framework in an industry-strength production compiler, and performed experiments on ten benchmark programs with linked lists. We observed performance improvements between 15% and 35% in three of them.
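The stride-prediction idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's compiler implementation: the heap is modeled as a Python dictionary keyed by integer addresses, and the names `build_list`, `traverse_with_prefetch`, `useful_fraction`, and the `distance` parameter are hypothetical. A real compiler would emit a hardware prefetch instruction where this code merely records the predicted address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    value: int
    next_addr: Optional[int]  # simulated address of the next element, None at the tail


def build_list(base: int, stride: int, count: int) -> Dict[int, Node]:
    """Simulated heap in which list elements sit at a constant distance apart."""
    heap: Dict[int, Node] = {}
    for i in range(count):
        addr = base + i * stride
        nxt = addr + stride if i + 1 < count else None
        heap[addr] = Node(value=i, next_addr=nxt)
    return heap


def traverse_with_prefetch(
    heap: Dict[int, Node], head: int, distance: int = 4
) -> Tuple[int, List[int]]:
    """Sum the list while predicting the address `distance` iterations ahead."""
    predictions: List[int] = []
    total = 0
    prev: Optional[int] = None
    p: Optional[int] = head
    while p is not None:
        if prev is not None:
            stride = p - prev  # stride observed at run time
            # Address that would be prefetched (hypothetical stand-in for a prefetch).
            predictions.append(p + stride * distance)
        total += heap[p].value
        prev = p
        p = heap[p].next_addr  # induction pointer: updated by a load at a fixed offset
    return total, predictions


def useful_fraction(heap: Dict[int, Node], predictions: List[int]) -> float:
    """Fraction of predicted addresses that really hold a list element."""
    if not predictions:
        return 0.0
    return sum(1 for addr in predictions if addr in heap) / len(predictions)
```

With a perfectly regular allocation, every prediction except those running past the tail lands on a real element. An irregular allocation would lower `useful_fraction`, which is one reason to issue such prefetches only when spare resources are available.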


Efficient static single assignment form for predication

Stoutchinin

Ferrière


We present a framework that allows translation of predicated code into the static single assignment (SSA) form, and simplifies application of the SSA-based optimizations to predicated code. In particular, we represent predicate join points in the program by the ψ-functions similar to the φ-functions of the basic SSA. The SSA-based optimizations (such as constant propagation) can be applied to predicated code by simply specifying additional rules for processing the ψ-functions. We present efficient algorithms for constructing, and then for removing the ψ-functions at the end of SSA processing. Our algorithm for translating out of the ψ-SSA splits predicated live ranges into smaller live ranges active under disjoint predicates. The experimental evaluation on a set of predicated benchmarks demonstrates the efficiency of our approach.
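As a rough illustration of what an "additional rule" for constant propagation over a predicate join function (written ψ here) might look like, the sketch below evaluates a join over a three-level constant lattice. It rests on an assumption not stated in the abstract: that the join selects the rightmost argument whose predicate is true. The names `TOP`, `BOTTOM`, `meet`, and `eval_psi` are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Tuple, Union

TOP = "top"        # value not yet known
BOTTOM = "bottom"  # value known not to be a single constant

Pred = Optional[bool]    # True/False if the predicate is known, None if unknown
Value = Union[int, str]  # an integer constant, TOP, or BOTTOM


def meet(a: Value, b: Value) -> Value:
    """Standard constant-propagation meet: equal constants survive, others collapse."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM


def eval_psi(args: List[Tuple[Pred, Value]]) -> Value:
    """Lattice value of a predicate join, assuming the rightmost true predicate wins."""
    result: Value = TOP
    for pred, val in args:
        if pred is False:
            continue               # this argument can never be selected
        if pred is True:
            result = val           # overrides every argument to its left
        else:
            result = meet(result, val)  # may or may not be selected
    return result
```

For example, `eval_psi([(True, 1), (None, 1)])` yields `1` because both candidate values agree, while `eval_psi([(True, 1), (None, 2)])` yields `BOTTOM` because the second argument may or may not override the first.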


Code generator optimizations for the ST120 DSP-MCU core

Dinechin

Ferri

Guillon

et al. 2000


Optimally Scheduling CNN Convolutions for Efficient Memory Access

Stoutchinin

Conti

Benini

2019

Preprint


StreamDrive: a Dynamic Dataflow Framework for Clustered Embedded Architectures

Stoutchinin

Benini

2018

J Sign Process Syst


In this paper, we present StreamDrive, a dynamic dataflow framework for programming clustered embedded multicore architectures. StreamDrive simplifies development of dynamic dataflow applications starting from sequential reference C code and allows seamless handling of heterogeneous and application-specific processing elements by applications. We address issues of efficient implementation of the dynamic dataflow runtime system in the context of constrained embedded environments, which have not been sufficiently addressed by previous research. We conducted a detailed performance evaluation of the StreamDrive implementation on our Application Specific MultiProcessor (ASMP) cluster using the Oriented FAST and Rotated BRIEF (ORB) algorithm typical of the image processing domain. We have used the proposed incremental development flow for the transformation of the original ORB reference C code into an optimized dynamic dataflow implementation. Our implementation has less than 10% parallelization overhead, near-linear speedup when the number of processors increases from 1 to 8, and achieves the performance of 15 VGA frames per


