Developing interactive behaviors for social robots presents a number of challenges. It is difficult to interpret the fine details of people's behavior, particularly non-verbal behavior such as body positioning, yet a social robot needs to respond contingently to such subtle behaviors. It must also generate utterances and non-verbal behavior with appropriate timing and coordination. The rules for such behavior are often based on implicit knowledge and are thus difficult for a designer to describe or program explicitly. We propose to teach such behaviors to a robot through learning by demonstration, using recorded human-human interaction data to identify both the behaviors the robot should perform and the social cues it should respond to. In this study, we present a fully unsupervised approach that uses abstraction and clustering to identify behavior elements and joint interaction states, which are then used by a variable-length Markov model predictor to generate socially appropriate behavior commands for a robot. The proposed technique produces encouraging results despite high levels of sensor noise, especially in speech recognition. We demonstrate our system with a robot in a shopping scenario.
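The abstract names a variable-length Markov model (VMM) predictor without giving its details. As a rough illustration only, the Python sketch below shows one common form of such a predictor. It counts which symbol followed each context of length 0 up to `max_order`, and at prediction time it backs off from the longest previously observed context to shorter ones. The class name, the `max_order` parameter, and the string labels standing in for clustered joint interaction states are all hypothetical. This is not the authors' implementation; a real system would operate on states derived from noisy sensor data and would map predictions to robot behavior commands.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


class VariableLengthMarkovPredictor:
    """Hypothetical sketch: predict the next symbol from the longest seen context."""

    def __init__(self, max_order: int = 3) -> None:
        self.max_order = max_order
        # Maps a context (tuple of up to max_order preceding symbols, possibly empty)
        # to counts of the symbols observed immediately after it.
        self.counts: Dict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)

    def train(self, sequence: Sequence[Hashable]) -> None:
        for i, symbol in enumerate(sequence):
            # Record this symbol after every context length from 0 to max_order.
            for k in range(min(self.max_order, i) + 1):
                context = tuple(sequence[i - k:i])
                self.counts[context][symbol] += 1

    def predict(self, history: Sequence[Hashable]) -> Optional[Hashable]:
        # Back off from the longest usable context to the empty context.
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            if context in self.counts:  # membership test does not insert a key
                return self.counts[context].most_common(1)[0][0]
        return None  # nothing has been learned yet


if __name__ == "__main__":
    # Hypothetical labels standing in for clustered joint interaction states.
    demo = ["greet", "ask", "answer", "thank", "ask", "answer", "confused", "ask", "explain"]
    predictor = VariableLengthMarkovPredictor(max_order=2)
    predictor.train(demo)
    print(predictor.predict(["confused", "ask"]))  # explain (order-2 context)
    print(predictor.predict(["hello", "ask"]))     # answer (backs off to order 1)
```

The variable-length back-off is what lets such a predictor use longer interaction history when it has been observed (here, "confused" followed by "ask" leads to "explain"), while still producing a prediction from shorter contexts when a longer one has never been seen.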