Recent research progress in speech recognition, text-to-speech, natural language understanding, and dialog management is improving the way humans interact with advanced robots. However, the problem is far from solved: we are only beginning to create meaningful multimodal platforms that allow operators to use and control industrial robots through spoken dialogue. This paper describes our ongoing efforts to create a modular platform that combines different technologies to cover typical requirements in an industrial setting, i.e., robust speech recognition, low-level skill functions to operate the robot, recommendation and validation procedures for setting up parameters, combination of audio-visual information for challenging environments, integration of domain knowledge by means of an ontology, a flexible definition of the dialog model and natural language rules, and a test and control interface to quickly check the functionality of each module during development and operation. All platform modules communicate through the Robot Operating System (ROS), which allows external plugins and modules to be integrated easily. Finally, a preliminary user study with IT experts simulating a welding task has been conducted, giving us clues about what the focus of our next developments should be.