“…In this method, three MLNs are used instead of the two MLNs of [5]. The first MLN, MLN_LF-DPF, outputs DPFs [11,12] for the input acoustic features, LFs [15]; the second MLN, MLN_cntxt, reduces misclassification at phoneme boundaries by taking a seven-frame context (from t-3 to t+3) as input; and the third MLN, MLN_Dyn, restricts the DPF dynamics by incorporating dynamic parameters (ΔDPF and ΔΔDPF) into its input. Here, MLN_LF-DPF, which is trained using the standard back-propagation learning algorithm, has two hidden layers of 256 and 96 units, respectively, and takes three input vectors (t-3, t, t+3) of 25-dimensional LFs each.…”
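The shape of the first network, MLN_LF-DPF, can be sketched as a plain feed-forward pass: three 25-dimensional LF frames (t-3, t, t+3) are concatenated into a 75-dimensional input, followed by hidden layers of 256 and 96 units. This is a minimal NumPy sketch, not the authors' implementation; the sigmoid activations, the random weights standing in for trained parameters, and the 15-dimensional DPF output size are assumptions not stated in the excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the text: three 25-dim LF vectors -> 75-dim input,
# hidden layers of 256 and 96 units. OUT_DIM (number of DPF elements)
# is an assumed value for illustration only.
IN_DIM, H1, H2, OUT_DIM = 3 * 25, 256, 96, 15

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised weights stand in for back-propagation-trained ones.
W1 = rng.normal(0.0, 0.1, (IN_DIM, H1)); b1 = np.zeros(H1)
W2 = rng.normal(0.0, 0.1, (H1, H2));     b2 = np.zeros(H2)
W3 = rng.normal(0.0, 0.1, (H2, OUT_DIM)); b3 = np.zeros(OUT_DIM)

def mln_lf_dpf(lf_tm3, lf_t, lf_tp3):
    """Forward pass: three 25-dim LF frames -> one DPF vector."""
    x = np.concatenate([lf_tm3, lf_t, lf_tp3])  # (75,)
    h1 = sigmoid(x @ W1 + b1)                   # first hidden layer, (256,)
    h2 = sigmoid(h1 @ W2 + b2)                  # second hidden layer, (96,)
    return sigmoid(h2 @ W3 + b3)                # DPF estimate, (15,)

dpf = mln_lf_dpf(*(rng.normal(size=25) for _ in range(3)))
```

In a full pipeline, this output would then be passed through the context (MLN_cntxt) and dynamics (MLN_Dyn) networks described above.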