“…To do this, we train the first layer on raw inputs to obtain parameters $W^{(1,1)}$, $W^{(1,2)}$, $b^{(1,1)}$, $b^{(1,2)}$, then use the first layer to transform the raw input into a vector consisting of activation of the hidden units. The second layer is trained on this vector to obtain parameters $W^{(2,1)}$, $W^{(2,2)}$, $b^{(2,1)}$, $b^{(2,2)}$.…”
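
The greedy layer-wise procedure described in the excerpt can be sketched in NumPy as follows. This is a minimal illustration, not the quoted authors' implementation. It assumes sigmoid units, a squared-error reconstruction loss, and plain batch gradient descent, and it reads the superscript (k,1) as the encoder parameters and (k,2) as the decoder parameters of the k-th autoencoder. The helper `train_autoencoder`, the layer sizes, the learning rate, and the random toy data are all hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one sigmoid autoencoder on X of shape (n_samples, n_features).

    Hypothetical helper: returns (W_enc, b_enc), (W_dec, b_dec) and the
    per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode to hidden activations, then reconstruct.
        H = sigmoid(X @ W_enc + b_enc)
        X_hat = sigmoid(H @ W_dec + b_dec)
        err = X_hat - X
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass for the mean squared-error loss.
        d_out = err * X_hat * (1.0 - X_hat) / n
        d_hid = (d_out @ W_dec.T) * H * (1.0 - H)
        # Gradient-descent updates.
        W_dec -= lr * (H.T @ d_out)
        b_dec -= lr * d_out.sum(axis=0)
        W_enc -= lr * (X.T @ d_hid)
        b_enc -= lr * d_hid.sum(axis=0)
    return (W_enc, b_enc), (W_dec, b_dec), losses


# Toy data standing in for the raw inputs (assumption: values in [0, 1]).
X = np.random.default_rng(42).random((100, 8))

# Layer 1: train on raw inputs; roughly W^(1,1), b^(1,1), W^(1,2), b^(1,2).
(W11, b11), (W12, b12), loss1 = train_autoencoder(X, n_hidden=5, seed=0)

# Transform the raw input into the vector of hidden-unit activations.
A1 = sigmoid(X @ W11 + b11)

# Layer 2: train on those activations; roughly W^(2,1), b^(2,1), W^(2,2), b^(2,2).
(W21, b21), (W22, b22), loss2 = train_autoencoder(A1, n_hidden=3, seed=1)

# Features produced by the stacked encoders.
A2 = sigmoid(A1 @ W21 + b21)
```

Each layer is trained in isolation while the layers below it stay fixed, so the second call to `train_autoencoder` never touches `W11` or `b11`. The decoder parameters `W12`, `b12`, `W22`, `b22` are only needed during this pretraining stage; only the encoders are used to compute `A1` and `A2`.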