We describe a general formalism for training neural predictive systems. We then introduce discrimination at the frame level and show how it relates to maximum mutual information training. Finally, we propose an approach for performing discrimination in predictive systems at the sequence level; it makes use of N-best sequence selection. Performance for acoustic-phonetic decoding reaches 77.4% phone accuracy on the 1988 version of TIMIT.