Saeed Fathollahzadeh scite author profile

We describe an approach for a custom complex event processing engine using the Message Passing Interface (MPI) in the C++ programming language. Our approach utilizes a multi-processor infrastructure and distributes its load across multiple processes, expecting each process to run on one processor. A dispatching process receives events and distributes them among several query processes, which are responsible for updating the actual queries. Query processes forward any updates to a presentation process that outputs the results in an appropriate format. The distribution of roles among processes allows better scalability, since further query processes can be added dynamically to handle more queries. In our evaluation, we measured an event-processing throughput of up to 12k events/sec using 4 processor cores.
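The three roles named in this abstract (dispatcher, query processes, presentation process) can be pictured with a small single-process simulation. This is only a sketch of the message flow, not the authors' MPI/C++ engine: the class names, the event fields, the counting queries, and the policy of handing every event to every query process are all assumptions made for illustration.

```python
from collections import defaultdict


class QueryProcess:
    """Stands in for one query process; owns a subset of the queries."""

    def __init__(self, queries):
        # queries: query name -> predicate over an event (hypothetical shape)
        self.queries = queries
        self.counts = defaultdict(int)

    def on_event(self, event):
        """Update every owned query and return the updates to forward."""
        updates = []
        for name, predicate in self.queries.items():
            if predicate(event):
                self.counts[name] += 1
                updates.append((name, self.counts[name]))
        return updates


class PresentationProcess:
    """Stands in for the presentation process; formats incoming updates."""

    def __init__(self):
        self.lines = []

    def on_update(self, name, value):
        self.lines.append(f"{name}={value}")


def dispatch(events, query_processes, presentation):
    """Dispatcher role: hand each event to every query process (assumed policy)."""
    for event in events:
        for qp in query_processes:
            for name, value in qp.on_event(event):
                presentation.on_update(name, value)


def run_demo():
    # Two query processes, each owning one hypothetical query.
    workers = [
        QueryProcess({"high_temp": lambda e: e["temp"] > 30}),
        QueryProcess({"sensor_a": lambda e: e["sensor"] == "a"}),
    ]
    presentation = PresentationProcess()
    events = [
        {"sensor": "a", "temp": 25},
        {"sensor": "b", "temp": 35},
        {"sensor": "a", "temp": 40},
    ]
    dispatch(events, workers, presentation)
    return presentation.lines
```

In this sketch, supporting more queries means appending another `QueryProcess` to `workers`; the dispatcher and presentation code stay unchanged, which is the scalability property the abstract attributes to the role split.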


GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example

Fathollahzadeh

Boehm

2023

Proc. ACM Manag. Data


Data scientists deal with a wide variety of file formats and data representations. Probably the most difficult to handle are custom data formats that liberally define their own particular flat or nested structure with multiple custom delimiters, multi-line records, or undocumented semantics of attribute sequences, co-appearances, and repetitions. As a prerequisite for exploratory ML model training, data scientists need to map these data representations into regular frames or matrices. Unfortunately, existing tools and frameworks provide only limited support for this process, which causes redundant manual effort and unnecessary data quality issues. In this paper, we initiate work on automatic matrix and frame reader generation by example. A user provides a sample of raw text data and its mapped matrix or frame representation. Our GIO framework then first identifies the mapping rules from raw to structured data, and subsequently generates source code of an efficient, multi-threaded reader for reading full raw datasets of this format. To facilitate manual improvements, both the mapping rules and the generated reader can be modified as needed. Our experiments show that GIO correctly identifies the mapping rules for basic text formats like CSV, LibSVM, and MatrixMarket; custom text formats from publishing, automotive, and health care; as well as various nested formats such as JSON and XML. Additionally, the automatically generated readers yield competitive performance compared to hand-coded readers and tuned libraries like RapidJSON.
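To make the "by example" idea concrete, here is a deliberately tiny sketch: given one raw sample line and the row it should map to, it infers a single delimiter and builds a reader from it. This is not GIO's rule-identification algorithm, which the abstract does not detail and which handles nested and multi-line formats; it also returns a plain function rather than generating multi-threaded reader source code. The function names `infer_delimiter` and `make_reader` are hypothetical.

```python
def infer_delimiter(sample_line, mapped_row):
    """Infer the single delimiter that splits sample_line into mapped_row.

    mapped_row is a list of strings exactly as they appear in the raw line.
    """
    if len(mapped_row) < 2:
        raise ValueError("need at least two mapped values to infer a delimiter")
    start = sample_line.find(mapped_row[0])
    if start < 0:
        raise ValueError("first mapped value not found in sample line")
    pos = start + len(mapped_row[0])
    nxt = sample_line.find(mapped_row[1], pos)
    if nxt <= pos:
        raise ValueError("no delimiter found between the first two values")
    # Candidate rule: the text between the first two mapped values.
    delimiter = sample_line[pos:nxt]
    # Accept the rule only if it explains the whole example.
    if sample_line.split(delimiter) != list(mapped_row):
        raise ValueError("sample is not explained by a single delimiter")
    return delimiter


def make_reader(delimiter):
    """Build a reader that applies the inferred rule to a full text."""
    def read(text):
        return [line.split(delimiter) for line in text.splitlines() if line]
    return read


# Usage: learn the rule from one example, then read new data.
delim = infer_delimiter("1.5|2|abc", ["1.5", "2", "abc"])
reader = make_reader(delim)
rows = reader("4|5|x\n6|7|y\n")
```

The verification step mirrors the workflow described in the abstract in miniature: a candidate mapping rule is kept only if it reproduces the user-provided structured sample from the raw sample.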



Saeed Fathollahzadeh

Stateful complex event detection on event streams using parallelization of event stream aggregations and detection tasks

Real-Time Object Recognition from Streaming LiDAR Point Cloud Data

Parallel event processing on unbound streams with multi-step windowing
