In the following, we introduce the main aspects behind the ML techniques used throughout this thesis. Specifically, we first introduce neural networks, which are one of the most popular examples of ML algorithms for classification and regression problems, and then introduce the main ideas behind the concept of dimensionality reduction. Finally, we define the correlation coefficients used to quantify the relationship between pairs of variables.

Neural networks

Given an input vector $\mathbf{x} = (x_1, \dots, x_D)$, a feed-forward neural network first constructs $M$ linear combinations of the input variables,

$$a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)},$$

where $j = 1, \dots, M$, and the superscript $(1)$ indicates that the corresponding parameters are in the first layer of the network. Each quantity $a_j$ is then transformed using a differentiable, nonlinear activation function $h$ to give

$$z_j = h(a_j). \tag{1.4}$$

These quantities correspond to the outputs of the units in the hidden layer. Common choices for the activation function $h$ are sigmoidal functions such as the logistic sigmoid,

$$\sigma(a) = \frac{1}{1 + e^{-a}}. \tag{1.5}$$

The parameters of the network are determined by minimizing an error function $E(\mathbf{w})$ over a training set. The simplest approach is gradient descent, in which the parameter vector is repeatedly updated by taking a small step in the direction of the negative gradient,

$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \, \nabla E\big(\mathbf{w}^{(\tau)}\big),$$

where $\eta > 0$ is a small positive number called the learning rate. After each update, the gradient is re-evaluated for the new vector of parameters and the process is repeated until convergence. Note that the error function is defined with respect to a training set, meaning that at each step the entire training set has to be processed in order to evaluate $\nabla E$. Techniques that use the whole training set at once in this way are known as batch methods, in contrast with stochastic (or online) methods, which update the parameters using one data point, or a small subset of the data, at a time.
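To make the forward pass of Eqs. 1.4 and 1.5 and the batch gradient descent update concrete, the following NumPy sketch trains a network with a single hidden layer of logistic sigmoid units on toy regression data. The architecture (one linear output unit), the sum-of-squares error, and all data and hyperparameter choices are illustrative assumptions, not the setup used in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    # Logistic sigmoid, Eq. 1.5: sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

# Toy regression data (assumed for illustration): N samples, D inputs, scalar target
N, D, M = 100, 3, 5                      # samples, input dimension, hidden units
X = rng.normal(size=(N, D))
t = np.sin(X.sum(axis=1))                # arbitrary smooth target

# First-layer weights/biases and a linear output unit (weights w2, bias b2)
W1 = rng.normal(scale=0.5, size=(M, D))
b1 = np.zeros(M)
w2 = rng.normal(scale=0.5, size=M)
b2 = 0.0

eta = 0.1                                # learning rate eta > 0

for step in range(500):
    # Forward pass: a_j = sum_i w_ji^(1) x_i + w_j0^(1), then z_j = h(a_j) (Eq. 1.4)
    A = X @ W1.T + b1                    # (N, M) pre-activations
    Z = sigmoid(A)                       # (N, M) hidden-unit outputs
    y = Z @ w2 + b2                      # (N,)  network outputs

    # Sum-of-squares error E(w) = 1/2 sum_n (y_n - t_n)^2; backpropagate its gradient
    err = y - t                          # dE/dy_n
    grad_w2 = Z.T @ err
    grad_b2 = err.sum()
    delta1 = (err[:, None] * w2) * Z * (1.0 - Z)   # uses sigma'(a) = sigma(a)(1 - sigma(a))
    grad_W1 = delta1.T @ X
    grad_b1 = delta1.sum(axis=0)

    # Batch gradient descent step: w <- w - eta * grad E (gradient averaged over the set)
    W1 -= eta / N * grad_W1
    b1 -= eta / N * grad_b1
    w2 -= eta / N * grad_w2
    b2 -= eta / N * grad_b2

y_final = sigmoid(X @ W1.T + b1) @ w2 + b2
print("final sum-of-squares error:", 0.5 * np.sum((y_final - t) ** 2))
```

Each iteration processes the entire training set before updating the parameters, which is exactly the batch behaviour described above; replacing `X` and `t` with a random subsample at every step would turn this into a stochastic variant.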
Dimensionality reduction

Dimensionality reduction is a typical unsupervised learning task, which is particularly useful when dealing with high-dimensional data. The goal of dimensionality reduction is to find a lower-dimensional representation of the original data while preserving the most relevant information or, equivalently, while minimizing the loss of information. Effectively, dimensionality reduction techniques allow us to extract the most relevant information in a given dataset by removing redundancy, which is present whenever two or more variables are strongly correlated.

A standard measure of the linear correlation between two variables $x$ and $y$, observed over $n$ samples, is the Pearson correlation coefficient,

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{1.19}$$

where $\bar{x}$ and $\bar{y}$ are the means of the corresponding variables. By definition, the quantity in Eq. 1.19 is always between $-1$ and $1$, with $|r_{xy}| = 1$ indicating that the two variables are perfectly related by a linear transformation, and $r_{xy} = 0$ indicating that the two variables are not at all linearly related. For instance, the two variables $x_1$ and $x_2$ in Fig. 1.2 have a Pearson correlation coefficient of $r_{x_1 x_2} = 0.95$, meaning that they are almost perfectly linearly related.

Another common measure of the (possibly nonlinear) correlation between two variables is Spearman's rank correlation coefficient, which quantifies how strongly the relationship between the two variables can be described by a monotonic function (whether linear or not). Specifically, the Spearman's rank correlation between two variables is defined as the Pearson correlation between the rank values of those two variables.
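As a quick illustration of Eq. 1.19 and of the rank-based definition of Spearman's coefficient, the sketch below computes both coefficients for a pair of toy variables related by a noisy monotonic (but nonlinear) transformation. The data are an illustrative assumption; scipy.stats is used only to cross-check the manual computation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two toy variables related by a noisy monotonic, nonlinear transformation
x = rng.normal(size=200)
y = np.exp(x) + 0.1 * rng.normal(size=200)

def pearson(x, y):
    # Eq. 1.19: covariance of the centered variables over the product of their norms
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))

# Spearman's coefficient: the Pearson correlation between the rank values
r_spearman = pearson(stats.rankdata(x), stats.rankdata(y))

print("Pearson: ", pearson(x, y))      # below 1: the relationship is not linear
print("Spearman:", r_spearman)         # close to 1: the relationship is monotonic
print("SciPy check:", stats.pearsonr(x, y)[0], stats.spearmanr(x, y)[0])
```

Because the relationship is monotonic but not linear, the Spearman coefficient is close to 1 while the Pearson coefficient is noticeably lower, which is exactly the distinction drawn above.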