2013 IEEE 13th International Conference on Data Mining
DOI: 10.1109/icdm.2013.158

MLI: An API for Distributed Machine Learning

Abstract: MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implementations of a wide variety of common Machine Learning algorithms with minimal complexity and highly competitive perform…

Cited by 133 publications (90 citation statements)
References 7 publications
“…They scale well to tens of nodes, but at large scale this synchrony creates challenges, as the chance of a node operating slowly increases. Mahout [4], based on Hadoop [18], and MLI [44], based on Spark [50], both adopt the iterative MapReduce [14] framework. A key insight of Spark and MLI is preserving state between iterations, which is a core goal of the parameter server.…”
Section: Related Work
confidence: 99%
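The statement above credits Spark and MLI with preserving state between iterations. The following is a minimal sketch of that idea in plain Python (no Spark dependency; the partition count, learning rate, and function names are illustrative, not taken from either system): the dataset is partitioned and held in memory once, then reused across every map/reduce pass, rather than being re-read from storage each iteration.

```python
# Sketch of iterative MapReduce with state preserved between iterations:
# partition the data once (analogous to caching an RDD), then run repeated
# map (partial gradients per partition) and reduce (aggregate) steps over it.
from functools import reduce

def iterative_gradient_descent(points, iterations=10, lr=0.1):
    # "Cache" the partitioned data once, outside the loop.
    partitions = [points[i::4] for i in range(4)]
    w = 0.0
    for _ in range(iterations):
        # Map step: each partition computes a partial gradient for y ≈ w * x.
        partials = [
            sum(2 * (w * x - y) * x for x, y in part) for part in partitions
        ]
        # Reduce step: aggregate partial gradients and update the model.
        grad = reduce(lambda a, b: a + b, partials) / len(points)
        w -= lr * grad
    return w

# Data drawn from y = 3x; the loop should recover w close to 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
```

Keeping `partitions` outside the loop is the whole point: without it, each iteration would repeat the load-and-partition work, which is the per-pass overhead the parameter-server line of work also aims to avoid.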
“…Similar techniques, including batching data within Spark records, indexing it, and optimizing partitioning, have been used in GraphX [112], MLlib [96], MLI [98] and other projects. Together, these techniques have allowed RDD-based systems to achieve similar performance to specialized systems in each domain, while providing much higher performance in applications that combine processing types, and fault tolerance across these types of computations.…”
Section: Discussion
confidence: 99%
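Among the techniques the statement above lists is optimized partitioning. A hedged sketch of the underlying mechanism (plain Python; the function name and partition count are illustrative, not from GraphX, MLlib, or MLI): hash-partitioning records by key, so all records sharing a key land in the same partition and can be processed without shuffling.

```python
# Hash-partition (key, value) records: records with equal keys are routed
# to the same partition, enabling local per-key aggregation later.
def hash_partition(records, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions
```

The invariant to note is that partition choice depends only on the key, so a downstream per-key operation (join, group-by, aggregate) never needs to look outside a single partition.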
“…Some of these works include performing analytics over Twitter [12], computing k-means clustering over big data in the cloud [13], providing recommendations [14] [15], studying the behavior of tourists [16], performing sentiment analysis [17], minimizing product escapes in aerospace test environments [18], improving a predictive model in a healthcare domain [19], detecting astrophysical objects [20], discovering communities in social networks [21] and many more. Also, recent works have provided detailed studies of technologies for batch processing techniques over big data [22], as well as current applications and systems for this purpose [23] and proposed APIs for distributed machine learning [24].…”
Section: State of the Art
confidence: 99%