2022
DOI: 10.1007/s10915-022-02044-x
Solving Non-linear Kolmogorov Equations in Large Dimensions by Using Deep Learning: A Numerical Comparison of Discretization Schemes

Abstract: Non-linear Kolmogorov partial differential equations are successfully used to describe a wide range of time-dependent phenomena in the natural sciences, engineering, and finance. For example, in physical systems the Allen-Cahn equation describes pattern formation associated with phase transitions, while in finance the Black-Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in…

Cited by 4 publications (4 citation statements)
References 49 publications (145 reference statements)
“…A deep neural network is a type of ML model, and when a deep network is fitted to data, this is referred to as deep learning [31]. Deep learning (DL) has shown very powerful empirical performance in solving complex real-world problems in areas such as computer vision [32], natural language processing [33,34], speech recognition [35], recommendation systems [36], drug discovery [37], differential equations [38,39], and much more [40-42]. In simple words, DL can be seen as a neural network [43], composed of many layers, that takes a data set D of inputs and targets and learns the rules for forecasting on new input data.…”
Section: Introduction
confidence: 99%
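The statement above describes a deep network as a stack of many layers mapping inputs from a data set D to forecasts. A minimal sketch of such a stack, using random untrained weights and hypothetical layer sizes purely for illustration (this is not the cited paper's model):

```python
import math
import random

random.seed(0)

def dense(x, w, b):
    """One fully connected layer followed by a tanh non-linearity."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

# A toy network: 2 inputs -> two hidden layers of 4 units -> 1 output.
sizes = [2, 4, 4, 1]
layers = [([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]

x = [0.5, -0.2]        # one input example
for w, b in layers:    # "many layers": the network is just a stack of dense maps
    x = dense(x, w, b)
print(len(x))  # prints 1: a single forecast value
```

Training would then adjust the weights so the final output matches the targets in D; the untrained forward pass above only shows the layered structure.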
“…The MIS is important for applications in computer science, operations research, and engineering, with uses such as graph coloring, assigning channels to radio stations, register allocation in a compiler, and artificial intelligence [5-8].…”
Section: Introduction
confidence: 99%
“…If one wants to use the MC method for computing the global minimum, one can either run the algorithm at T = 0 or slowly change the temperature from an initial value down to T = 0: this is the so-called Simulated Annealing algorithm [14]. A key property that allows for a solid theory of MC is the so-called detailed balance condition, which ensures the algorithm admits a limiting distribution at large times [15,16]. SGD [17-19] is a popular optimization algorithm used in the development of state-of-the-art machine learning [20] and deep learning models [21,22], which have shown tremendous success in numerous fields, becoming indispensable tools for many advanced applications [23-31]. It is an extension of the gradient descent algorithm [32] that uses a subset of the training data to compute the gradient of the objective function at each iteration.…”
confidence: 99%
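The excerpt describes SGD as gradient descent in which each step uses only a random subset (mini-batch) of the training data to estimate the gradient. A minimal self-contained sketch of that idea on a toy linear-regression loss (illustrative only; the data, learning rate, and batch size are arbitrary choices, not taken from the cited works):

```python
import random

random.seed(0)

# Synthetic data for y = 3*x, so the true parameter is w = 3.
data = [(i / 10, 3.0 * (i / 10)) for i in range(100)]

w = 0.0           # initial parameter
lr = 0.01         # learning rate
batch_size = 8    # size of the random subset used per iteration

for step in range(500):
    batch = random.sample(data, batch_size)  # subset of the training data
    # Gradient of the mean squared error 0.5*(w*x - y)**2 w.r.t. w,
    # averaged over the mini-batch only (this is the "stochastic" part).
    grad = sum((w * x - y) * x for x, y in batch) / batch_size
    w -= lr * grad                           # ordinary gradient step

print(round(w, 2))  # prints 3.0: converges to the true slope
```

Full-batch gradient descent would replace `batch` with the whole of `data`; sampling a subset trades a noisier gradient estimate for a much cheaper per-iteration cost, which is what makes the method practical at machine-learning scale.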