In this chapter, we examine the capabilities of VoiceXML 2.1 for expressing flexible dialogs in pervasive environments. Missing information about the environment and the inability to react to external events lead to rigid and verbose dialogs. Building upon the recently defined W3C MMI architecture, we present an approach that lets dialog authors adapt their dialogs' behavior to the users' surroundings and incorporate available information and devices from the pervasive environment. These features extend the expressiveness of VoiceXML 2.1 and allow for integration into multimodal and mobile interaction as anticipated in the currently dormant VoiceXML 3.0 standard.