2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps53621.2022.00109
Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning

Abstract: Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural Networks (DNN). SGD iterates over the input data set in each training epoch, processing data samples in a random-access fashion. Because this puts enormous pressure on the I/O subsystem, the most common approach to distributed SGD in HPC environments is to replicate the entire dataset to node-local SSDs. However, due to rapidly growing data set sizes, this approach has become increasingly infeasible. Surprisingly, the questions…
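For context, the access pattern the abstract refers to can be sketched as below. This is a minimal illustrative sketch, not the paper's method; the `train`, `update`, and `dataset` names are hypothetical. The point is that every epoch visits the whole dataset in a fresh random order, which is what stresses the I/O subsystem.

```python
import random

def train(model, dataset, num_epochs, update):
    """Plain SGD loop with the global re-shuffle the abstract describes:
    every epoch visits each sample exactly once, in a fresh random order."""
    indices = list(range(len(dataset)))
    for epoch in range(num_epochs):
        random.shuffle(indices)   # global re-shuffle over the entire data set
        for i in indices:
            x, y = dataset[i]     # random-access read: the I/O pressure point
            update(model, x, y)   # one SGD step on a single sample
```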

Cited by 15 publications (6 citation statements)
References 30 publications

Citation statements:
“…Many countermeasures, including masking [13][14][15], shuffling [16][17][18], randomized clock [19,20], random delay insertion [21][22][23], constant-weight encoding [24], and code polymorphism [25,26], are used to mitigate side-channel attacks. By preventing information from leaking through physically measurable channels such as time [27,28], power consumption [29,30], or electromagnetic radiation [31,32], these countermeasures seek to safeguard cryptographic systems.…”
Section: Of 13
Mentioning, confidence: 99%
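This citing work uses "shuffling" in the side-channel sense: executing independent sub-operations in a randomized order so that power or timing traces do not align across runs. A minimal illustrative sketch of the idea follows; the S-box table and 16-byte state are hypothetical stand-ins, not from any cited implementation, and `random` is used only for illustration (a hardened implementation would draw its permutation from a secure RNG).

```python
import random

# Hypothetical 256-entry substitution table; a real cipher would use, e.g., the AES S-box.
SBOX = list(range(256))
random.Random(0).shuffle(SBOX)

def substitute_shuffled(state):
    """Apply the S-box to all state bytes in a random order.

    The result equals processing bytes 0..15 in sequence, but per-byte
    leakage (power, EM) no longer lines up with a fixed byte index,
    which is what the shuffling countermeasure exploits."""
    order = list(range(len(state)))
    random.shuffle(order)   # fresh random permutation on every call
    for i in order:
        state[i] = SBOX[state[i]]
    return state
```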
“…The dataset consists of images displaying various monuments, organized in a directory format where each subdirectory represents a different monument class or category. We utilized TensorFlow's tf.keras.preprocessing.image_dataset_from_directory utility to load the dataset, applying specifications for shuffling [8], resizing [9], and batching. This process ensures that the dataset is formatted appropriately and ready for subsequent analysis.…”
Section: Data Collection and Preparation
Mentioning, confidence: 99%
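For reference, a call of the kind this statement describes might look as follows. The directory path, image size, batch size, and seed are illustrative assumptions, not values from the citing paper.

```python
import tensorflow as tf

# Hypothetical values: the citing paper does not state its path or parameters.
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "monuments/",            # one subdirectory per monument class; labels are inferred
    image_size=(224, 224),   # resize every image on load
    batch_size=32,           # group samples into batches
    shuffle=True,            # randomize sample order
    seed=42,                 # make the shuffle reproducible
)
```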
“…This means randomly changing the positions of samples while keeping each sample's feature values in their original order. Shuffling is essential to eliminate any sort order in the dataset, ensuring the classifier does not overfit to a particular class due to sort order [45]. Dataset Distribution: This process involves dividing the dataset into training and testing (validation) datasets.…”
Section: DoH Identification Architecture
Mentioning, confidence: 99%
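The two steps this statement describes, a row-wise shuffle followed by a train/test split, can be sketched as below. The array shapes and the 80/20 split ratio are assumptions for illustration, not values from the citing paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# X: (n_samples, n_features), y: (n_samples,) -- hypothetical arrays for illustration.
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

# Shuffle rows (samples) only; columns (feature order) are untouched.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# Split into training and testing sets; 80/20 is an assumed ratio.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```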