Yi Pan scite author profile

Samza

Distributed stream processing systems need to support stateful processing, recover quickly from failures to resume such processing, and reprocess an entire data stream quickly. We present Apache Samza, a distributed system for stateful and fault-tolerant stream processing. Samza utilizes a partitioned local state along with a low-overhead background changelog mechanism, allowing it to scale to massive state sizes (hundreds of TB) per application. Recovery from failures is sped up by re-scheduling based on Host Affinity. In addition to processing infinite streams of events, Samza supports processing a finite dataset as a stream, from either a streaming source (e.g., Kafka), a database snapshot (e.g., Databus), or a file system (e.g., HDFS), without having to change the application code (unlike the popular Lambda-based architectures, which necessitate maintenance of separate code bases for batch and stream path processing). Samza is currently in use at LinkedIn by hundreds of production applications with more than 10,000 containers. Samza is an open-source Apache project adopted by many top-tier companies (e.g., LinkedIn, Uber, Netflix, TripAdvisor). Our experiments show that Samza: a) handles state efficiently, improving latency and throughput by more than 100X compared to using remote storage; b) provides recovery time independent of state size; c) scales performance linearly with the number of containers; and d) supports reprocessing of the data stream quickly and with minimal interference on real-time traffic.

2D/2D NiCo-MOFs/GO hybrid nanosheets for high-performance asymmetrical supercapacitor

Shi

Pan

et al. 2021

Diamond and Related Materials

Fishbone-like Ni3S2/Co3S4 integrated with nickel MOF nanosheets for hybrid supercapacitors

Pan

Shi

Chen

et al. 2021

Applied Surface Science

Porous yolk-shell structured Na3(VO)2(PO4)2F microspheres with enhanced Na-ion storage properties

Yin

Pei

Xiong

et al. 2021

Journal of Materials Science & Technology

Numerical Simulation of Muzzle Blast Overpressure in Antiaircraft Gun Muzzle Brake

Guo

Pan

Zhang

et al. 2013

J. Inf. Comput. Sci.

SamzaSQL: Scalable Fast Data Management with Streaming SQL

Pathirage

Hyde

Pan

et al. 2016

To stay competitive in today's data-driven economy, enterprises large and small are turning to stream processing platforms to process high-volume, high-velocity, and diverse streams of data (fast data) as they arrive. Low-level programming models provided by the popular systems of today suffer from a lack of responsiveness to change: enhancements require code changes with attendant large turn-around times. Even though distributed SQL query engines have been available for Big Data, we still lack support for SQL-based stream querying capabilities in distributed stream processing systems. In this white paper, we identify a set of requirements and propose a standard SQL-based streaming query model for management of what has been referred to as Fast Data.

Significantly enhanced electrochemical performance of 2D Ni-MOF by carbon quantum dot for high-performance supercapacitors

Pan

Yan

Liu

et al. 2022

Electrochimica Acta

Benzoic acid-modified 2D Ni-MOF for high-performance supercapacitors
