Randomness In Neural Network Training: Characterizing The Impact of Tooling
2021 · Preprint
DOI: 10.48550/arxiv.2106.11872

Cited by 8 publications (13 citation statements)
References 36 publications

“…Generally, machine learning experiments are not precisely predictable - complex models trained on complex data typically yield noisy or variable results [79,17]. Though individual experiments may be unpredictable, the general performance of large generative models tends to exhibit smooth and predictable growth as a function of scale - larger systems tend to do increasingly better on a broad range of tasks.…”
Section: Smooth General Capability Scaling (mentioning)
confidence: 99%
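
To make the quoted scaling point concrete, here is a minimal, illustrative sketch: it assumes a pure power-law relationship between model size and loss and fits it on made-up numbers. The functional form, sizes, and losses are assumptions for illustration, not data from the cited works.

```python
# Illustrative only: a pure power law loss ~ a * N^(-b) is assumed, and the
# (size, loss) pairs below are synthetic, not taken from any cited paper.
import numpy as np

rng = np.random.default_rng(0)
sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])  # parameter counts
losses = 50.0 * sizes ** -0.095 * np.exp(rng.normal(0, 0.01, sizes.shape))

# In log-log space a power law is a straight line, so ordinary least squares
# recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
b, a = -slope, np.exp(intercept)
print(f"fitted loss ~ {a:.1f} * N^(-{b:.3f})")
print(f"extrapolated loss at 1e11 params: {a * 1e11 ** -b:.3f}")
```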
“…Reproducibility: Many factors contribute to irreproducibility in deep models [13,18,19,44,48,49,56]. The highly non-convex objective [18], combined with nondeterminism in training [49] and underspecification [13] of over-parameterized deep networks, can lead trained models to optima at different locations in a manifold, or to different sets of optima.…”
Section: Related Work and Productionalization (mentioning)
confidence: 99%
“…The highly non-convex objective [18], combined with nondeterminism in training [49] and underspecification [13] of over-parameterized deep networks, can lead trained models to optima at different locations in a manifold, or to different sets of optima. Nondeterminism can emerge from the highly parallelized, highly distributed training pipelines, quantization errors, hardware types [56], and more. Slight deviations early in training due to these factors can lead to very different models [1].…”
Section: Related Work and Productionalization (mentioning)
confidence: 99%
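
As a concrete, hedged illustration of taming these tooling-level sources of randomness, the sketch below shows the kind of seeding and determinism flags one would typically set in PyTorch. It is a generic example, not code from the cited papers, and even with these settings, variation across hardware types can remain.

```python
# Generic PyTorch sketch for pinning down framework/GPU nondeterminism.
# Not from the cited papers; even with these settings, differences across
# hardware types or library versions can still change results.
import os
import random

import numpy as np
import torch


def make_deterministic(seed: int = 0) -> None:
    # Seed every RNG the training loop might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Prefer deterministic kernels; ops without a deterministic
    # implementation will raise an error instead of silently varying.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)

    # Required for deterministic cuBLAS matmuls (CUDA >= 10.2); should be
    # set before the CUDA context is created.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


make_deterministic(seed=42)
```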
“…In most experiments, there is inherent randomness in the scores obtained from different runs. This randomness can arise from stochasticity in the task, exploratory choices made during learning, randomized initial parameters, but also software and hardware considerations such as non-determinism in GPUs and in machine learning frameworks [113]. Thus, we model the algorithm's normalized score on the m-th task as a real-valued random variable X_m.…”
Section: Formalism (mentioning)
confidence: 99%
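
To make that formalism concrete, here is a minimal sketch (with made-up scores) of treating the per-task score as a random variable X_m: repeat the run under different seeds and summarize the empirical distribution, for example with a bootstrap confidence interval on the mean.

```python
# Minimal sketch: the normalized scores from repeated runs of one task are
# treated as draws of X_m. The numbers are placeholders, not real results.
import numpy as np

rng = np.random.default_rng(0)
scores = np.array([0.71, 0.74, 0.69, 0.73, 0.75, 0.70, 0.72, 0.68, 0.74, 0.71])

# Bootstrap the mean of X_m to quantify run-to-run uncertainty.
boot_means = np.array([
    rng.choice(scores, size=scores.size, replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean score {scores.mean():.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```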