Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2010
DOI: 10.1145/1835804.1835910

Large linear classification when data cannot fit in memory

Abstract: Recent advances in linear classification have shown that for applications such as document classification, the training process can be extremely efficient. However, most existing training methods are designed under the assumption that the data can be stored in computer memory. These methods cannot easily be applied to data larger than the memory capacity because they require random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data i…
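As a rough illustration of the framework sketched in the abstract, the Python sketch below splits a data set into compressed blocks on disk and then streams them back one at a time. The block count, the zlib/pickle compression choice, and the inner solver `train_on_block` are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the block minimization idea from the abstract, not the
# authors' implementation: block count, zlib compression, and the inner
# solver `train_on_block` are illustrative assumptions.
import os
import pickle
import zlib

def split_into_blocks(X, y, n_blocks, out_dir="blocks"):
    """Split the data into n_blocks pieces and store each piece compressed on
    disk, so that a single piece (rather than the whole set) fits in memory."""
    os.makedirs(out_dir, exist_ok=True)
    n = len(y)
    for b in range(n_blocks):
        idx = list(range(b, n, n_blocks))      # simple round-robin split
        blob = zlib.compress(pickle.dumps(([X[i] for i in idx],
                                           [y[i] for i in idx])))
        with open(os.path.join(out_dir, f"block_{b}.bin"), "wb") as f:
            f.write(blob)

def block_minimization(model, n_blocks, n_outer_iters, train_on_block,
                       in_dir="blocks"):
    """Outer loop: repeatedly stream the compressed blocks from disk and let
    the inner solver update the model on one block at a time."""
    for _ in range(n_outer_iters):
        for b in range(n_blocks):
            with open(os.path.join(in_dir, f"block_{b}.bin"), "rb") as f:
                Xb, yb = pickle.loads(zlib.decompress(f.read()))
            model = train_on_block(model, Xb, yb)   # e.g. a few dual CD sweeps
    return model
```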

Cited by 72 publications (116 citation statements, publication years 2011–2019); references 13 publications (17 reference statements).

“…In an award-winning paper, [38] revisited the problem of training linear SVMs when the data does not fit into memory [29,27,19]. In a nutshell, the key idea is to split the data into manageable blocks, compress and store each block on disk, and perform dual coordinate descent by loading each block sequentially.…”
Section: Solvers for Training SVMs
confidence: 99%
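To make the inner step in the statement above concrete, here is a hedged sketch of one dual coordinate descent sweep for an L1-loss linear SVM over a block that has already been loaded into memory; the exact update order, shrinking heuristics, and stopping rules of the paper's solver may differ.

```python
# Sketch of one dual coordinate descent sweep over a loaded block for an
# L1-loss linear SVM; the update order and stopping rules are simplified.
import numpy as np

def dual_cd_sweep(w, alpha, Xb, yb, C=1.0):
    """w: primal weight vector kept consistent with the duals; alpha: dual
    variables of this block's examples; Xb: (n_b, d) array; yb in {-1, +1}."""
    for i in np.random.permutation(len(yb)):
        xi, yi = Xb[i], yb[i]
        G = yi * w.dot(xi) - 1.0              # partial gradient in alpha_i
        Qii = xi.dot(xi)
        if Qii == 0.0:
            continue
        new_alpha = min(max(alpha[i] - G / Qii, 0.0), C)   # project onto [0, C]
        w += (new_alpha - alpha[i]) * yi * xi  # maintain w = sum_i alpha_i y_i x_i
        alpha[i] = new_alpha
    return w, alpha
```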
“…In a nutshell, the key idea is to split the data into manageable blocks, compress and store each block on disk, and perform dual coordinate descent by loading each block sequentially. This basic idea was improved upon by [11], who observed that the block minimization (BM) algorithm of [38] does not retain important points before discarding each block. They therefore propose to retain some important points from the previous blocks in RAM.…”
Section: Solvers for Training SVMs
confidence: 99%
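A rough sketch of the retention idea attributed to [11] above: before a block is evicted, keep its apparently informative examples, here taken to be those with nonzero dual variables, in a small in-memory cache and train on them together with the next block. The cache size, the selection rule, and the `train_on_block` interface returning dual variables are assumptions, not the cited method's exact design.

```python
# Hypothetical illustration of retaining informative points across blocks;
# the cache size and the "alpha > 0" selection rule are assumptions.
def select_informative(Xb, yb, alpha, cache_size=1000):
    """Keep examples whose dual variables are active (candidate support
    vectors), capped at cache_size, so they can be revisited with later blocks."""
    keep = [i for i in range(len(yb)) if alpha[i] > 0][:cache_size]
    return [Xb[i] for i in keep], [yb[i] for i in keep]

def train_with_cache(model, blocks, train_on_block):
    """Train on each new block together with points cached from earlier ones.
    train_on_block is assumed to return the updated model and the dual
    variables of the examples it was given (a hypothetical interface)."""
    cache_X, cache_y = [], []
    for Xb, yb in blocks:                      # blocks: iterable of in-memory blocks
        Xext, yext = list(Xb) + cache_X, list(yb) + cache_y
        model, alpha = train_on_block(model, Xext, yext)
        cache_X, cache_y = select_informative(Xext, yext, alpha)
    return model
```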
“…Linear classification models have been shown to handle large amounts of data well, and several optimization techniques (e.g., [3,4,5]) have been applied to efficiently train linear models. However, when the data cannot fit into memory, batch learners, which load the entire data during the training process, suffer severely due to disk swapping [1]. In these cases, training techniques that deal well with memory limitations become crucial.…”
Section: Introduction
confidence: 99%
“…7 shows an example where the online learner MIRA [9] takes more than one hour loading data but spends less than one minute updating the model. As discussed in [1], training time consists of (1) the time used to update the model's parameters given the data in memory and (2) the time needed to load data from disk. The machine learning literature focuses on the first issue and neglects the second.…”
Section: Introduction
confidence: 99%
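The split between loading cost and update cost that the statement above describes can be measured directly. The sketch below times the two components over one pass, with `read_block` and `update_model` standing in for whatever I/O and learning routines are being profiled; both names are placeholders.

```python
# Times one pass over the data, separating disk-loading time from model-update
# time; read_block and update_model are placeholders for the system under test.
import time

def profile_pass(model, block_paths, read_block, update_model):
    load_t = update_t = 0.0
    for path in block_paths:
        t0 = time.perf_counter()
        Xb, yb = read_block(path)             # disk I/O (and decompression)
        load_t += time.perf_counter() - t0
        t0 = time.perf_counter()
        model = update_model(model, Xb, yb)   # in-memory parameter updates
        update_t += time.perf_counter() - t0
    return model, load_t, update_t
```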
“…Nowadays, as the amount of data in our world has been exploding and data is usually stored in distributed environments, conventional linear classification algorithms, which run on a single computer, have become infeasible for directly handling large-scale datasets in practice. Therefore, distributed classification algorithms have been developed to solve the large-scale classification problem [8], [9].…”
confidence: 99%