Using automatic speech recognition (ASR) and text-to-speech (TTS) systems, which have become available for Mongolian in the last few years, this research set out to implement a new version of the Mongolian Virtual Education Environment (VEE), which until now has not included a speech interface. The spoken language system aims to provide a natural interface between trainees and the environment, using simple, natural dialogues that let the user access the multimedia knowledge base of the VEE. Our work focuses on the response generation part of the system. This paper describes a TTS system for the VEE, intended for university courses taught in Mongolian. A concatenative speech synthesizer for Mongolian provides the TTS in response generation. The Festvox framework for unit selection speech synthesis was used to build the Mongolian voice. We discuss aspects of the voice development process and the results of a perceptual test of the synthesized voice.