Deciphering how different types of behavior and ultrasonic vocalizations (USVs) of rats interact can yield insights into the neural basis of social interaction. However, the behavior-vocalization interplay of rats remains elusive because of the challenges of relating the two communication media in complex social contexts. Here, we propose a machine learning-based analysis system called ARBUR. ARBUR clusters both non-step (continuous) and step USVs without bias via a three-step procedure, hierarchically detects eight types of behavior of two freely behaving rats with high accuracy, and localizes the vocalizing rat in 3-D space. By simultaneously recording the video and ultrasonic streams of two freely behaving rats, we show that ARBUR not only automatically reveals the well-understood behavior-associated vocalizations previously established by behavioral researchers, but also holds promise for uncovering novel findings that can hardly be obtained by manual analysis, especially regarding step USVs and the vocalizations associated with the active/passive rat during aggressive/intimate social behaviors. Using ARBUR, we further confirm that rats communicate via distinct USVs when engaging in different types of social behavior. This work could help mechanistically elucidate the interactive influence between the behavior and USVs of rats.