2015
DOI: 10.14778/2824032.2824063

Building a replicated logging system with Apache Kafka

Abstract: Apache Kafka is a scalable publish-subscribe messaging system whose core architecture is a distributed commit log. It was originally built at LinkedIn as its centralized event-pipelining platform for online data integration tasks. Over the past years of developing and operating Kafka, we have extended its log-structured architecture into a replicated logging backbone for a much wider range of applications in distributed environments. In this abstract, we talk about our design and engineering experience to replicate …
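The "distributed commit log" the abstract centers on can be illustrated with a minimal sketch. The names here (`CommitLog`, `append`, `read_from`) are illustrative only, not Kafka's actual API; the point is the append-only, offset-addressed structure that consumers and follower replicas read sequentially.

```python
class CommitLog:
    """An append-only sequence of records addressed by integer offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Each append is assigned the next monotonically increasing offset.
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        # Consumers (or follower replicas) read sequentially from an offset,
        # so replication reduces to replaying the log from a known position.
        return self._records[offset:]


log = CommitLog()
assert log.append("event-a") == 0
assert log.append("event-b") == 1
assert log.read_from(1) == ["event-b"]
```

Because every record has a stable offset, a replica that falls behind can always catch up by re-reading from its last applied position, which is what makes the log a natural replication backbone.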

Cited by 125 publications (51 citation statements)
References 2 publications
“…We communicate with the engine using the Pub/Sub messaging service. 9 Specifically, we deploy a topology composed of four operators: (1) a Pub/Sub subscriber that reads elements from an input topic; (2) a window operator; (3) a reducer that concatenates the content of each window into an output string; (4) the Pub/Sub publisher that writes the results of the reducer on an output topic. We submit elements by publishing them on the input topic and we read the results from the output topic.…”
Section: Google Cloud Dataflow
confidence: 99%
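The four-operator topology described in this citation statement can be simulated in plain Python. Here topics are modelled as lists and windows are fixed-size; a real deployment would use the Pub/Sub client library and a streaming engine, so this is a sketch of the dataflow only.

```python
def run_topology(input_topic, window_size):
    """Simulate: (1) subscriber -> (2) window -> (3) reducer -> (4) publisher."""
    output_topic = []
    window = []
    for element in input_topic:          # (1) subscriber reads each element
        window.append(element)
        if len(window) == window_size:   # (2) window operator closes a window
            result = "".join(window)     # (3) reducer concatenates the window
            output_topic.append(result)  # (4) publisher writes to the output topic
            window = []
    return output_topic


assert run_topology(["a", "b", "c", "d"], 2) == ["ab", "cd"]
```

Elements are "published" by appending to `input_topic`, and results are "read" from `output_topic`, mirroring the experiment's setup.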
“…In the presence of out-of-order elements that alter the values of some results produced in the past, the engine retracts the previous output from the mutable dataset and substitutes it with the newly computed values. This is the case for the Kafka Streams system [9].…”
Section: Management Of Out-of-order Elements
confidence: 99%
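The retraction semantics described above can be sketched with a per-window running sum: when a late element changes a window's already-emitted result, the engine withdraws the old value and emits the corrected one. The `("+", …)` / `("-", …)` change tags are illustrative, not any engine's actual wire format.

```python
from collections import defaultdict


class RetractingSum:
    """Per-window sum that retracts stale results when late elements arrive."""

    def __init__(self):
        self.totals = defaultdict(int)

    def process(self, window, value):
        changes = []
        if self.totals[window]:
            # Retract the result previously emitted for this window.
            changes.append(("-", window, self.totals[window]))
        self.totals[window] += value
        changes.append(("+", window, self.totals[window]))
        return changes


engine = RetractingSum()
assert engine.process(0, 5) == [("+", 0, 5)]
# A late element for window 0 arrives: the old result is retracted,
# and the corrected result is emitted in its place.
assert engine.process(0, 3) == [("-", 0, 5), ("+", 0, 8)]
```

Downstream consumers apply the "-" change to remove the stale value from their mutable dataset before applying the "+" replacement, exactly the substitution the statement describes.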
“…32 Work similar to our approach is a replicated logging system built on publish-subscribe messaging middleware. 33 Although replicated logging is similar to our CSMs, the difference is that our approach is based on stream processing rather than event-… The elements in CSM are explained as follows.…”
Section: Stream-based Data Replication
confidence: 99%
“…The real-time analysis system uses Flume [4] to monitor /usr/local/data/flume_sources/data-1 for newly generated data; each log record is collected in real time and saved in the Kafka message system, from which it is consumed by the Storm system. Meanwhile, the consumption position is recorded in the ZooKeeper cluster, which means that even if Kafka goes down, the last recorded position can be found after a restart and consumption can resume from the Kafka broker [5]. Because consuming and recording the position are not atomic (whether the order is consume-before-record or record-before-consume), some data loss or repeated consumption can occur when Kafka goes down, or when a similar failure happens after a message is consumed but before its position is recorded in ZooKeeper.…”
Section: Real-time Data Writing To Linux
confidence: 99%
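The non-atomicity problem this statement raises can be made concrete: the order in which a consumer processes a message and records its offset (e.g. in ZooKeeper) determines the failure mode. A sketch, with the crash simulated at a fixed record and recovery restarting from the last committed offset:

```python
def consume_with_crash(records, commit_first):
    """Consume records, crashing once between the two non-atomic steps at
    index 1, then restarting from the last committed offset."""
    processed, committed = [], 0
    crashed = False
    while True:
        try:
            for i in range(committed, len(records)):
                if commit_first:
                    committed = i + 1          # record offset BEFORE consuming
                    if i == 1 and not crashed:
                        crashed = True
                        raise RuntimeError()   # crash: offset saved, message never processed
                processed.append(records[i])
                if not commit_first:
                    if i == 1 and not crashed:
                        crashed = True
                        raise RuntimeError()   # crash: message processed, offset never saved
                    committed = i + 1          # record offset AFTER consuming
            return processed
        except RuntimeError:
            continue                           # restart from last committed offset


msgs = ["m0", "m1", "m2"]
# Record-before-consume: the in-flight message is lost (at-most-once).
assert consume_with_crash(msgs, commit_first=True) == ["m0", "m2"]
# Consume-before-record: nothing is lost, but the message is reprocessed
# after the restart (at-least-once).
assert consume_with_crash(msgs, commit_first=False) == ["m0", "m1", "m1", "m2"]
```

Neither ordering alone gives exactly-once delivery, which is why the statement observes that "few data loss or repeat consumption problems will occur" under crashes; removing both failure modes requires making processing and offset commit atomic or idempotent.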