AutomataZoo: A Modern Automata Processing Benchmark Suite

Wadden, Jack; Tracy, Tommy; Sadredini, Elaheh; Wu, Lingxi; Bo, Chunkun; Du, Jesse; Wei, Yizhou; Udall, Jeffrey; Wallace, Marianne; Stan, Mircea R.; Skadron, Kevin

doi:10.1109/iiswc.2018.8573482

Cited by 20 publications (14 citation statements); references 28 publications.


“…Application Configurations. We evaluate 18 applications from three different benchmark suites: AutomataZoo [53], Regex [10] and ANMLZoo [51]. Table 2 […] [67], we modify the NFAs so that the outgoing edges of each state is 4 or less using an iterative algorithm.…”

Section: Evaluation Methodology (mentioning, confidence: 99%)

Why GPUs are Slow at Executing NFAs and How to Make them Faster

Liu, Pai, Jog (2020)

Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems


Non-deterministic Finite Automata (NFA) are space-efficient finite state machines that have significant applications in domains such as pattern matching and data analytics. In this paper, we investigate why the Graphics Processing Unit (GPU), a massively parallel computational device with the highest memory bandwidth available on general-purpose processors, cannot efficiently execute NFAs. First, we identify excessive data movement in the GPU memory hierarchy and describe how to privatize reads effectively using GPU's on-chip memory hierarchy to reduce this excessive data movement. We also show that in several cases, indirect table lookups in NFAs can be eliminated by converting memory reads into computation, to further reduce the number of memory reads. Although our optimization techniques significantly alleviate these memory-related bottlenecks, a side effect of these techniques is the static assignment of work to cores. This leads to poor compute utilization, where GPU cores are wasted on idle NFA states. Therefore, we propose a new dynamic scheme that effectively balances compute utilization with reduced memory usage. Our combined optimizations provide a significant improvement over the previous state-of-the-art GPU implementations of NFAs. Moreover, they enable current GPUs to outperform the domain-specific accelerator for NFAs (i.e., the Automata Processor) across several applications while performing within an order of magnitude for the rest of the applications.

CCS Concepts: • Computing methodologies → Parallel algorithms; • Computer systems organization → Single instruction, multiple data; • Theory of computation → Formal languages and automata theory.
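For orientation, the per-symbol work that GPU NFA engines parallelize can be shown with a small sequential simulator. This is a minimal sketch, assuming the homogeneous (ANML-style) automaton model executed by the Automata Processor, in which every state (STE) carries a character class and activates when it is enabled and matches the current symbol. The names `STE` and `run` are hypothetical illustrations, not code from Liu et al.; a GPU implementation would distribute the match and transition steps across threads, which is where the idle-state and memory-traffic issues described above arise.

```python
from dataclasses import dataclass, field


@dataclass
class STE:
    """Hypothetical homogeneous-NFA state (State Transition Element)."""
    symbols: frozenset                              # characters this state matches
    successors: list = field(default_factory=list)  # states enabled after a match
    start: bool = False                             # all-input start: enabled on every symbol
    report: bool = False                            # emit a report event when it matches


def run(stes, text):
    """Simulate the NFA over text; return (offset, state index) report events."""
    always_enabled = {i for i, s in enumerate(stes) if s.start}
    enabled = set(always_enabled)
    reports = []
    for pos, ch in enumerate(text):
        # Match phase: an enabled state becomes active if it accepts the symbol.
        active = {i for i in enabled if ch in stes[i].symbols}
        for i in sorted(active):
            if stes[i].report:
                reports.append((pos, i))
        # Transition phase: successors of active states are enabled for the
        # next symbol; all-input start states are re-enabled every cycle.
        enabled = {j for i in active for j in stes[i].successors}
        enabled |= always_enabled
    return reports


# Usage: report every occurrence of "ab" anywhere in the input.
stes = [STE(frozenset("a"), [1], start=True), STE(frozenset("b"), report=True)]
print(run(stes, "xabab"))  # [(2, 1), (4, 1)]
```

The sketch keeps the enabled set sparse, so work scales with the number of live states; a statically assigned GPU kernel instead visits every state each cycle, which is the utilization problem the abstract's dynamic scheme targets.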


“…• the CICERO architecture and the associated architectural optimizations to exploit the intrinsic parallelism of non-deterministic REs (Section 5); • a comprehensive validation of the CICERO overall solution with embedded FPGA prototypes and a comparison with embedded (ARM) and mainstream (Intel) processors (Section 6). We evaluated our single- and multi-engine FPGA prototypes using real benchmarks from the open-source AutomataZoo benchmark suite [36]. We obtained excellent results both in terms of performance and energy efficiency: our CICERO architecture is 28.6× and 20.8× more energy-efficient than ARM and Intel processors, respectively.…”

Section: Introduction (mentioning, confidence: 95%)

“…In all the experiments, we used Protomata [28] and Brill [42] benchmarks from the AutomataZoo suite [36], which represent proteomics and natural language processing applications, respectively. We considered Protomata and Brill since they both belong to the family of "Regex" benchmarks of the original ANMLZoo suite [35].…”

Section: Experimental Validation (mentioning, confidence: 99%)

“…We considered Protomata and Brill since they both belong to the family of "Regex" benchmarks of the original ANMLZoo suite [35]. Therefore, their RE representation is ready to use [36], and they target novel compelling research fields, i.e., bioinformatics and natural language processing. Moreover, we believe that these two benchmarks represent two opposite use cases: one more suitable to CICERO features, i.e., with a high number of alternatives (Protomata), against an unsuitable one with a wide variety of sequential REs (Brill).…”

Section: Experimental Validation (mentioning, confidence: 99%)

“…RE matching is an essential kernel [2] for traditional computer security [32,41] and database queries [19,31] but also for novel domains such as natural language processing [35,42], and genome-protein matching [8,36]. The literature contains different algorithms to tackle RE matching.…”

Section: Introduction (mentioning, confidence: 99%)


CICERO: A Domain-Specific Architecture for Efficient Regular Expression Matching

Parravicini, Conficconi, Sozzo, et al. (2021)

ACM Trans. Embed. Comput. Syst.


Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is gaining attention also for this problem. Existing approaches have limited flexibility as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations like non-deterministic finite-state automata onto software-programmable engines or parallel architectures. In this work, we present CICERO, an end-to-end framework composed of a domain-specific architecture and a companion compilation framework for RE matching. Our solution is suitable for many applications, such as genomics/proteomics and natural language processing. CICERO aims at exploiting the intrinsic parallelism of non-deterministic representations of the REs. CICERO can trade off accelerators' efficiency and processors' flexibility thanks to its programmable architecture and the compilation framework. We implemented CICERO prototypes on embedded FPGA achieving up to 28.6× and 20.8× more energy efficiency than embedded and mainstream processors, respectively. Since it is a programmable architecture, it can be implemented as a custom ASIC that is orders of magnitude more energy-efficient than mainstream processors.
