Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models 2022
DOI: 10.18653/v1/2022.bigscience-1.11
Emergent Structures and Training Dynamics in Large Language Models

Cited by 6 publications (15 citation statements). References 0 publications.
“…After fitting equation (9.1), Hoffmann et al. [273] find $\mathcal{L}_{\min}(N, D) = \frac{406.4}{N^{0.34}} + \frac{410.7}{D^{0.28}} + 1.69$. If we then plug in N and D for a selection of real foundation models we arrive at figure 26. We can see in figure 26 that the model-size term for real foundation models is far lower than the dataset-size term.…”
Section: Foundation Models: A Fourth Astroconnectionist Wave? (mentioning, confidence: 99%)
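The fitted law above is easy to evaluate directly. The sketch below plugs illustrative values into the Hoffmann et al. [273] fit; the 70B-parameter / 1.4T-token example is roughly Chinchilla-scale and is used here only as an assumed illustration, not a figure from the cited paper.

```python
# Evaluate the fitted Hoffmann et al. scaling law
#   L_min(N, D) = 406.4 / N^0.34 + 410.7 / D^0.28 + 1.69
# for an illustrative (N, D) pair (assumed values, not sourced from the paper).

A, ALPHA = 406.4, 0.34   # model-size term coefficients
B, BETA = 410.7, 0.28    # dataset-size term coefficients
E = 1.69                 # irreducible loss

def l_min(n_params: float, n_tokens: float) -> float:
    """Predicted minimum loss for n_params parameters and n_tokens tokens."""
    return A / n_params**ALPHA + B / n_tokens**BETA + E

# Example: a 70B-parameter model trained on 1.4T tokens.
n, d = 70e9, 1.4e12
model_term = A / n**ALPHA   # contribution of model size
data_term = B / d**BETA     # contribution of dataset size
print(f"model-size term:       {model_term:.4f}")
print(f"dataset-size term:     {data_term:.4f}")
print(f"predicted minimum loss: {l_min(n, d):.4f}")
```

Even at this scale the model-size term comes out smaller than the dataset-size term, matching the observation quoted above.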
“…The table above shows the number of parameters in a model (N), the number of tokens within that model’s training set (D), and their corresponding calculated emergent terms from equation (9.1). Here we use Hoffmann et al. [273] to source values for A, α, B and β. The minimum loss for each model according to Hoffmann et al. [273] is shown as $\mathcal{L}_{\min}$.…”
Section: Foundation Models: A Fourth Astroconnectionist Wave? (mentioning, confidence: 99%)
“…A more dangerous scenario is if the model chooses a prediction to manipulate the outcome of that prediction (or other predictions), 32 e.g. if the model directly tries to find a self-consistent prediction by solving for a fixed point where the world is consistent with the model's own prediction [Tre22]. 33 Even if the model is myopically maximizing predictive accuracy, it has an incentive to find a fixed point that is both stable and likely, that is, a situation where the world state that results from the model outputting its prediction is highly overdetermined, since that is what makes for the best individual prediction.…”
Section: Major Challenge: Self-Fulfilling Prophecies (mentioning, confidence: 99%)
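The fixed-point framing above can be made concrete with a toy sketch. Everything below is an assumed illustration, not the cited work's construction: `world(p)` is a hypothetical linear response of an outcome probability to a published prediction p, and a self-consistent prediction is a p with p = world(p).

```python
# Toy illustration of a "self-fulfilling" prediction as a fixed point.
# world(p) is a HYPOTHETICAL response function: the probability an event
# occurs, given that the model publishes prediction p (e.g. a bank-run
# forecast that itself drives withdrawals). Assumed for illustration only.

def world(p: float) -> float:
    """Hypothetical outcome probability after publishing prediction p."""
    return 0.1 + 0.8 * p  # assumed linear response

def fixed_point(f, p0: float, iters: int = 100) -> float:
    """Solve p = f(p) by simple fixed-point iteration."""
    p = p0
    for _ in range(iters):
        p = f(p)
    return p

p_star = fixed_point(world, p0=0.9)
print(f"self-consistent prediction: {p_star:.3f}")  # 0.1 / (1 - 0.8) = 0.5
```

Because |world'(p)| = 0.8 < 1, this fixed point is stable under iteration, which is exactly the "stable and likely" property the excerpt argues a myopic predictor is incentivized to seek.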
“…As our state-of-the-art learning methods approach asymptotically optimal performance on static training sets and benchmarks, the primary lever for improving our models lies in improving the training data itself. Several results support this data-centric view of ML: for large neural models, continually scaling up the quantity of training data empirically improves generalization performance [17,56,136]. Generating additional supervised samples through methods like data programming, which generates new labels from human-provided, approximate labelling functions, results in significant performance gains in several domains [137,138].…”
Section: A Unified View Of Exploration (mentioning, confidence: 99%)
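The data-programming idea mentioned above can be sketched in a few lines. The following is a minimal toy in the spirit of Snorkel-style labelling functions, not that library's actual API: the labelling functions and the spam/ham task are invented for illustration, and votes are aggregated by simple majority, whereas real systems model labelling-function accuracies and correlations.

```python
# Toy data-programming sketch (hypothetical labelling functions; real systems
# such as Snorkel learn a generative model over LF accuracies instead of
# taking a plain majority vote).

ABSTAIN = None

def lf_contains_free(text: str):
    """LF: emails mentioning 'free' look like spam (label 1)."""
    return 1 if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text: str):
    """LF: emails mentioning 'meeting' look legitimate (label 0)."""
    return 0 if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text: str):
    """LF: three or more '!' characters suggest spam."""
    return 1 if text.count("!") >= 3 else ABSTAIN

LFS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def majority_label(text: str):
    """Aggregate non-abstaining LF votes by majority; None on no votes or tie."""
    votes = [lf(text) for lf in LFS if lf(text) is not ABSTAIN]
    if not votes:
        return None
    ones = sum(votes)
    if ones * 2 == len(votes):
        return None  # tie
    return 1 if ones * 2 > len(votes) else 0

print(majority_label("FREE prizes!!! Click now!!!"))     # two spam LFs fire -> 1
print(majority_label("Agenda for tomorrow's meeting"))   # one ham LF fires -> 0
```

The aggregated labels can then be used as (noisy) supervision for training a downstream model, which is where the performance gains cited in [137,138] come from.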