Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data 2013
DOI: 10.1145/2463676.2463707

The big data ecosystem at LinkedIn

citations
Cited by 115 publications
(57 citation statements)
references
References 24 publications
“…They lack parallelism model support for processing but are often combined with other platforms to deliver full real-time data analytics and processing [28,29]. We found that in our case, both Kafka and Flume can be used to load the CDRs to the processing nodes but as they are optimized for streams, this approach may introduce overhead for splitting larger sizes of batched CDRs to smaller chunks.…”
Section: Related Work (mentioning)
confidence: 98%
“…Consequently, distributed and streaming applications of machine learning algorithms should be explored to effectively model the large corpus of online reviews which exist in the real world [38]. Mahout has been used for large-scale recommendation systems [39], which would be useful to apply to review spam detection, as reviewers may be related to each other on different review websites. MLlib and SAMOA can perform large-scale online learning, where machine learning models are trained and tuned as new data flows in.…”
Section: Comparative Analysis and Suggestions (mentioning)
confidence: 99%
“…There has also been a huge interest and opportunity of big data in the health industry. Facebook and LinkedIn collect from both traditional database and streaming data from users whereas Twitter mostly deals with streaming data [26,27,28,29,32,33]. The collected data are then handled on a batch or streaming processing with each own defined processing functionalities.…”
Section: A Big Data Use Case (mentioning)
confidence: 99%