Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2010
DOI: 10.1145/1835804.1835910

Large linear classification when data cannot fit in memory

Abstract: Recent advances in linear classification have shown that for applications such as document classification, the training process can be extremely efficient. However, most existing training methods are designed under the assumption that the data can be stored in computer memory. These methods cannot easily be applied to data larger than the memory capacity because they require random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data i…
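As a rough illustration of the framework sketched in the abstract, the Python sketch below splits a data set into compressed blocks on disk and then streams them back one at a time. The block count, the zlib/pickle compression choice, and the inner solver `train_on_block` are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the block minimization idea from the abstract, not the
# authors' implementation: block count, zlib compression, and the inner
# solver `train_on_block` are illustrative assumptions.
import os
import pickle
import zlib

def split_into_blocks(X, y, n_blocks, out_dir="blocks"):
    """Split the data into n_blocks pieces and store each piece compressed on
    disk, so that a single piece (rather than the whole set) fits in memory."""
    os.makedirs(out_dir, exist_ok=True)
    n = len(y)
    for b in range(n_blocks):
        idx = list(range(b, n, n_blocks))      # simple round-robin split
        blob = zlib.compress(pickle.dumps(([X[i] for i in idx],
                                           [y[i] for i in idx])))
        with open(os.path.join(out_dir, f"block_{b}.bin"), "wb") as f:
            f.write(blob)

def block_minimization(model, n_blocks, n_outer_iters, train_on_block,
                       in_dir="blocks"):
    """Outer loop: repeatedly stream the compressed blocks from disk and let
    the inner solver update the model on one block at a time."""
    for _ in range(n_outer_iters):
        for b in range(n_blocks):
            with open(os.path.join(in_dir, f"block_{b}.bin"), "rb") as f:
                Xb, yb = pickle.loads(zlib.decompress(f.read()))
            model = train_on_block(model, Xb, yb)   # e.g. a few dual CD sweeps
    return model
```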

Cited by 72 publications (116 citation statements, publication years 2011–2019); references 13 publications (17 reference statements).

“…In an award-winning paper, [38] revisited the problem of training linear SVMs when the data does not fit into memory [29,27,19]. In a nutshell, the key idea is to split the data into manageable blocks, compress and store each block on disk, and perform dual coordinate descent by loading each block sequentially.…”
Section: Solvers for Training SVMs
confidence: 99%
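To make the inner step in the statement above concrete, here is a hedged sketch of one dual coordinate descent sweep for an L1-loss linear SVM over a block that has already been loaded into memory; the exact update order, shrinking heuristics, and stopping rules of the paper's solver may differ.

```python
# Sketch of one dual coordinate descent sweep over a loaded block for an
# L1-loss linear SVM; the update order and stopping rules are simplified.
import numpy as np

def dual_cd_sweep(w, alpha, Xb, yb, C=1.0):
    """w: primal weight vector kept consistent with the duals; alpha: dual
    variables of this block's examples; Xb: (n_b, d) array; yb in {-1, +1}."""
    for i in np.random.permutation(len(yb)):
        xi, yi = Xb[i], yb[i]
        G = yi * w.dot(xi) - 1.0              # partial gradient in alpha_i
        Qii = xi.dot(xi)
        if Qii == 0.0:
            continue
        new_alpha = min(max(alpha[i] - G / Qii, 0.0), C)   # project onto [0, C]
        w += (new_alpha - alpha[i]) * yi * xi  # maintain w = sum_i alpha_i y_i x_i
        alpha[i] = new_alpha
    return w, alpha
```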
“…In a nutshell, the key idea is to split the data into manageable blocks, compress and store each block on disk, and perform dual coordinate descent by loading each block sequentially. This basic idea was improved upon by [11], who observed that the block minimization (BM) algorithm of [38] does not retain important points before discarding each block. They therefore propose to retain some important points from the previous blocks in RAM.…”
Section: Solvers for Training SVMs
confidence: 99%
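A rough sketch of the retention idea attributed to [11] above: before a block is evicted, keep its apparently informative examples, here taken to be those with nonzero dual variables, in a small in-memory cache and train on them together with the next block. The cache size, the selection rule, and the `train_on_block` interface returning dual variables are assumptions, not the cited method's exact design.

```python
# Hypothetical illustration of retaining informative points across blocks;
# the cache size and the "alpha > 0" selection rule are assumptions.
def select_informative(Xb, yb, alpha, cache_size=1000):
    """Keep examples whose dual variables are active (candidate support
    vectors), capped at cache_size, so they can be revisited with later blocks."""
    keep = [i for i in range(len(yb)) if alpha[i] > 0][:cache_size]
    return [Xb[i] for i in keep], [yb[i] for i in keep]

def train_with_cache(model, blocks, train_on_block):
    """Train on each new block together with points cached from earlier ones.
    train_on_block is assumed to return the updated model and the dual
    variables of the examples it was given (a hypothetical interface)."""
    cache_X, cache_y = [], []
    for Xb, yb in blocks:                      # blocks: iterable of in-memory blocks
        Xext, yext = list(Xb) + cache_X, list(yb) + cache_y
        model, alpha = train_on_block(model, Xext, yext)
        cache_X, cache_y = select_informative(Xext, yext, alpha)
    return model
```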
“…Linear classification models have been shown to handle large amounts of data well, and several optimization techniques (e.g., [3,4,5]) have been applied to efficiently train linear models. However, when the data cannot fit into memory, batch learners, which load the entire data during the training process, suffer severely due to disk swapping [1]. In these cases, training techniques that deal well with memory limitations become crucial.…”
Section: Introduction
confidence: 99%
“…7 shows an example where the online learner MIRA [9] takes more than one hour loading data but spends less than one minute updating the model. As discussed in [1], training time consists of (1) the time used to update the model's parameters given the data in memory and (2) the time needed to load data from disk. The machine learning literature focuses on the first issue and neglects the second.…”
Section: Introduction
confidence: 99%
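The split between loading cost and update cost that the statement above describes can be measured directly. The sketch below times the two components over one pass, with `read_block` and `update_model` standing in for whatever I/O and learning routines are being profiled; both names are placeholders.

```python
# Times one pass over the data, separating disk-loading time from model-update
# time; read_block and update_model are placeholders for the system under test.
import time

def profile_pass(model, block_paths, read_block, update_model):
    load_t = update_t = 0.0
    for path in block_paths:
        t0 = time.perf_counter()
        Xb, yb = read_block(path)             # disk I/O (and decompression)
        load_t += time.perf_counter() - t0
        t0 = time.perf_counter()
        model = update_model(model, Xb, yb)   # in-memory parameter updates
        update_t += time.perf_counter() - t0
    return model, load_t, update_t
```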
“…Nowadays, as the amount of data in our world has been exploding and data is usually stored in distributed environments, conventional linear classification algorithms, which run on a single computer, have become infeasible for directly handling large-scale datasets in practice. Therefore, distributed classification algorithms have been developed to solve the large-scale classification problem [8], [9].…”
confidence: 99%