“…The second approach is to improve the structure of the model: for example, combining multiple "good enough" models into a nested ensemble to achieve stronger predictive power, or replacing simple neuron units with more complex LSTM cells, e.g., using LSTM models to exploit the structure revealed by grammatical analysis [3][4]. The third approach is to improve performance by tuning the parameters: for example, choosing an improved initialization [5] that keeps the early gradients sparse, or applying principles of linear algebra [6] to the initialization, and setting the learning rate, batch size, regularization coefficient, and dropout coefficient.…”
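A minimal sketch of the two initialization ideas mentioned above, assuming NumPy. The sparse initializer zeroes out most weights so that only a few units contribute at first; the second initializer uses QR decomposition, a common linear-algebra-based scheme shown here for illustration and not necessarily the exact method of [6]:

```python
import numpy as np

def sparse_init(shape, sparsity=0.9, scale=0.01, rng=None):
    """Weight matrix in which most entries are zero.

    Keeping early weights sparse limits how many units respond at
    first, which can help stabilize the early gradients.
    """
    rng = rng or np.random.default_rng(0)
    w = rng.normal(0.0, scale, size=shape)
    mask = rng.random(shape) < sparsity  # True -> zero this entry out
    w[mask] = 0.0
    return w

def orthogonal_init(n, rng=None):
    """Square orthogonal weight matrix via QR decomposition."""
    rng = rng or np.random.default_rng(0)
    a = rng.normal(size=(n, n))
    q, r = np.linalg.qr(a)
    # Sign correction so the distribution over orthogonal matrices is uniform
    q *= np.sign(np.diag(r))
    return q

w_sparse = sparse_init((64, 64), sparsity=0.9)
w_orth = orthogonal_init(64)
```

An orthogonal weight matrix preserves the norm of the signal it multiplies, which is one way such initializations keep gradients from vanishing or exploding early in training.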