Articulation training that combines several kinds of stimuli and feedback, such as visual, voice, and articulatory information, can teach users to pronounce correctly and improve their articulatory ability. In this paper, an articulation training system with an intelligent interface and multimodal feedback is proposed to improve the performance of articulation training. A dependent network is designed to model the clinical knowledge that speech-language pathologists use in speech evaluation. Automatic speech recognition with the dependent network is then applied to identify pronunciation errors. In addition, a hierarchical Bayesian network is proposed to recognize the user's emotion from speech. Given the detected pronunciation errors and the user's emotion, the articulation training sentences can be selected dynamically. Finally, a 3D facial animation teaches users to pronounce a sentence through speech, lip motion, and tongue motion. Experimental results demonstrate the usefulness of the proposed method and system.