Humans have a remarkable ability to bootstrap new knowledge. Structural bootstrapping refers to mechanisms that rely on prior knowledge, sensorimotor experience, and inference, and that can be implemented in robotic systems to speed up learning and problem solving in new environments. In this context, the interplay between the symbolic encoding of sensorimotor information, prior knowledge, planning, and natural language understanding plays a significant role. In this paper, we show how symbolic descriptions of the world can be generated on the fly from the robot's continuous memory. We also introduce a multi-purpose natural language understanding framework that processes spoken human utterances and generates planner goals as well as symbolic descriptions of the world and of human actions. Both components were tested on the humanoid robot ARMAR-III in a scenario requiring planning and plan recognition based on human-robot communication.