2023
DOI: 10.48550/arxiv.2301.05217
Preprint

Progress measures for grokking via mechanistic interpretability

Abstract: Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous progress measures that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: reverse-engineering learned behaviors into their individual components. As a case study, we investigate the recently-discovered…
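
The case study the abstract refers to is small-scale modular arithmetic. As a rough illustration of that kind of setup, the sketch below generates a dataset for such an experiment; the modulus p = 113 and the 30% train split are assumptions made for the sketch, not the paper's exact configuration.

```python
import numpy as np

# Grokking-style modular addition task: predict (a + b) mod p from the pair (a, b).
# p = 113 and the 30% train fraction are illustrative choices, not the paper's exact setup.
p = 113
pairs = np.array([(a, b) for a in range(p) for b in range(p)])
labels = (pairs[:, 0] + pairs[:, 1]) % p

rng = np.random.default_rng(0)
perm = rng.permutation(len(pairs))
n_train = int(0.3 * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]
X_train, y_train = pairs[train_idx], labels[train_idx]
X_test, y_test = pairs[test_idx], labels[test_idx]

# Trained on X_train, a small model typically memorizes first (train accuracy ~100%,
# test accuracy near chance) and only much later generalizes to X_test; that delayed
# jump in test accuracy is the "grokking" behavior the paper reverse-engineers.
```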

Cited by 14 publications (17 citation statements). References 12 publications (22 reference statements).
“…This implies that there are certain prompts that can modify the processing in unexpected ways based on the procedure of how the AI is trained. This is still poorly understood, since to date there is no clear understanding of how these emergent properties awaken from the mathematical operations within the artificial neural networks, which is currently the object of research in a discipline called Mechanistic Interpretability (Conmy et al., 2023; Nanda et al., 2023; Zimmermann et al., 2023).…”
Section: Tot
Mentioning, confidence: 99%
“…Relationship with Circuits: A common theme in mechanistic interpretability, especially when it comes to explaining the grokking phenomenon, is the idea of 'circuit' formation during training (Nanda et al., 2023; Varma et al., 2023; Olah et al., 2020). […] of the network in a region-wise fashion, i.e., for all input vectors {x : x ∈ ω}, the network performs the same affine operation using parameters (A_ω, b_ω) while mapping x to the output. The affine parameters for any given region are a function of the active neurons in the network, as was shown by Humayun et al. (2023a) (Lemma 1).…”
Section: Measuring Local Complexity Using the Deep…
Mentioning, confidence: 99%
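
The region-wise affine claim in the quoted passage can be checked numerically: for a ReLU network, within the linear region ω that contains a point x, the map is exactly f(x') = A_ω x' + b_ω, and A_ω is recoverable as the Jacobian at x. The sketch below is my own illustration under that assumption, using a small toy MLP; it is not code from either cited paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy ReLU MLP; any continuous piecewise-affine network behaves the same way.
net = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 3),
)

x = torch.randn(4)
# Within the region omega containing x, the network is exactly affine:
# f(x') = A_omega @ x' + b_omega. The Jacobian at x gives A_omega.
A_omega = torch.autograd.functional.jacobian(net, x)      # shape (3, 4)
b_omega = net(x) - A_omega @ x

# A tiny perturbation (very likely) keeps the same ReLU activation pattern,
# i.e. stays inside omega, so the affine map reproduces the network output exactly.
x_near = x + 1e-4 * torch.randn(4)
print(torch.max(torch.abs(net(x_near) - (A_omega @ x_near + b_omega))))   # ~0
```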
“…Our novel measure does not rely on the dataset, labels, or loss function that is used during training. It behaves as a progress measure (Barak et al., 2022; Nanda et al., 2023). We summarize the contributions as follows:…”
Section: Introduction
Mentioning, confidence: 99%
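
For context on what a dataset- and label-free progress measure can look like in practice, here is a simple illustrative scalar computed from parameters alone, in the spirit of the Fourier-sparse embeddings reported by Nanda et al. (2023). It is my own example, not the measure proposed in the citing paper; the embedding shape and matrix name below are hypothetical.

```python
import numpy as np

def fourier_sparsity(embedding: np.ndarray) -> float:
    """Label-free progress measure (illustrative): how concentrated the
    embedding's power is across Fourier frequencies of the token axis."""
    power = np.abs(np.fft.rfft(embedding, axis=0)) ** 2   # (num_freqs, d_model)
    freq_power = power.sum(axis=1)
    freq_power /= freq_power.sum()
    # Inverse participation ratio: ~1/num_freqs when diffuse, approaching 1 when
    # a few frequencies dominate (a Fourier-sparse embedding).
    return float((freq_power ** 2).sum())

# Usage: evaluate at each checkpoint and plot against training steps; such a curve
# can rise smoothly well before the sudden jump in test accuracy.
W_E = np.random.randn(113, 128)   # hypothetical token-embedding matrix (p = 113 tokens)
print(fourier_sparsity(W_E))
```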
“…These kinds of models are hard to interpret due to their complexity, irrespective of the soundness of the statistical foundations on which they are built. For instance, proving that a neural network is a universal function approximator (Hornik et al. 1989) is scant consolation for the fact that humans can only make sense of the inner workings of a trained neural network model through laborious analysis that resembles experimental biology more than mathematics (this reverse-engineering work constitutes the newborn field of "mechanistic interpretability;" see, e.g., Olah et al. 2017; Carter et al. 2019; Nanda et al. 2023). Second, the model would have to be trained on simulations.…”
Section: Introduction
Mentioning, confidence: 99%