2015 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2015.7363760
Machine learning at the limit

Cited by 22 publications (24 citation statements)
References 4 publications
“…In the case of word2vec, however, not all word vectors are updated at the same frequency: update frequency is proportional to word unigram frequency, so the vectors associated with popular words are updated more often than those of rare words. We therefore strive to match model-update frequency to word frequency, and a sub-model (instead of full-model) synchronization scheme, similar to the one exploited in BIDMach [10], is used.…”
Section: E. Distributed Memory Parallelization
Citation type: mentioning
confidence: 99%
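The sub-model synchronization idea in this quote can be illustrated with a minimal sketch. The class, its method names, and the row-averaging merge rule below are illustrative assumptions, not the actual scheme used in BIDMach [10] or by the citing paper; the point is only that each worker exchanges the rows (word vectors) it touched since the last synchronization rather than the full embedding table, so popular words are synchronized more often than rare ones.

```python
import numpy as np

# Hypothetical sketch of sub-model synchronization for a distributed
# word2vec-style embedding table: only rows updated since the last
# sync are exchanged, so sync frequency tracks word frequency.
class SubModelSync:
    def __init__(self, vocab_size, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.emb = rng.normal(scale=0.1, size=(vocab_size, dim))
        self.touched = set()            # rows updated since the last sync

    def local_update(self, word_id, grad, lr=0.025):
        # Plain SGD step on one word vector; record that the row changed.
        self.emb[word_id] -= lr * grad
        self.touched.add(word_id)

    def pack_delta(self):
        # Sub-model to send: only the touched rows, not the full table.
        rows = sorted(self.touched)
        return rows, self.emb[rows].copy()

    def merge(self, rows, peer_rows):
        # Merge a peer's sub-model by averaging the shared rows
        # (averaging is an assumed merge rule, chosen for simplicity).
        self.emb[rows] = 0.5 * (self.emb[rows] + peer_rows)
        self.touched.clear()
```

In use, two workers would each call `local_update` on the words in their data shard, exchange the results of `pack_delta()`, and call `merge()`; rows never touched on either worker are never transmitted.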
“…For the purpose of comparison, we also include in Fig. 4 BIDMach's performance on N = 1, 4 NVidia Titan-X GPUs as reported by [10], which represents the state-of-the-art performance achieved on multi-GPU systems. Again, good scalability is only meaningful when similar or better accuracy is achieved.…”
Section: Distributed Multi-node Systems
Citation type: mentioning
confidence: 99%
“…[20] further accelerates CCD++ on GPUs using loop fusion and tiling. The resulting algorithm is shown to be faster than CCD++ on CPUs [36] as well as GPU-ALS [31], which is implemented without memory optimization or approximate computing.…”
Taxonomy of parallel matrix-factorization methods, recovered from the figure interleaved with the quote:
SGD: [22]; multi-node: FactorBird [30], Petuum [5]; blocking (workers pick non-overlapping blocks), with blockDim = #workers: DSGD [9]; blockDim > #workers: LIBMF [39], NOMAD [37], DSGD++ [32]; nested blocking: dcMF [21], MLGF-MF [27]; single and multiple GPUs: GPU-SGD, i.e. SGD with lock-free and blocking updates [35].
ALS: replicate all features: PALS [38], DALS [32]; partially replicate features: SparkALS [18], GraphLab [17], Sparkler [16]; rotate features: Facebook [13]; approximate ALS: [29]; single GPU: BIDMach [2], HPC-ALS [8]; single and multiple GPUs: GPU-ALS [31] and cuMF_ALS.
CCD: multi-core and multi-node: CCD++ [36]; single GPU: parallel CCD++ [20].
Section: A. Parallel SGD
Citation type: mentioning
confidence: 99%
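The blocking scheme in the taxonomy above (DSGD [9], where workers pick non-overlapping blocks) can be sketched briefly. The function name, rank, learning rate, and regularization below are illustrative assumptions, and the B independent workers are only simulated sequentially; this is not the implementation of any of the cited systems.

```python
import numpy as np

# Minimal sketch of DSGD-style blocked SGD for matrix factorization:
# rows and columns are split into B stripes, and in sub-epoch s worker k
# processes block (k, (k + s) % B), so blocks updated in parallel never
# share row or column factors.
def blocked_sgd(R, rank=8, B=4, epochs=10, lr=0.01, reg=0.05, seed=0):
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, rank))   # row factors
    Q = rng.normal(scale=0.1, size=(n, rank))   # column factors
    row_parts = np.array_split(np.arange(m), B)
    col_parts = np.array_split(np.arange(n), B)

    obs = np.argwhere(R != 0)                   # observed (i, j) entries
    for _ in range(epochs):
        for s in range(B):                      # one sub-epoch per shift
            for k in range(B):                  # the B blocks of a shift are
                rows = set(row_parts[k])        # independent -> could run on
                cols = set(col_parts[(k + s) % B])  # B workers concurrently
                for i, j in obs:
                    if i in rows and j in cols:
                        p_i = P[i].copy()
                        err = R[i, j] - p_i @ Q[j]
                        P[i] += lr * (err * Q[j] - reg * p_i)
                        Q[j] += lr * (err * p_i - reg * Q[j])
    return P, Q
```

Because each of the B blocks in a shift touches disjoint slices of P and Q, the inner loop over k can be distributed across workers without locks, which is the property the blocking-based systems above exploit.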