Conveyor belts are key equipment for production and transportation and are widely used in the coal mining industry. A longitudinal tear in a conveyor belt seriously disrupts production and can even cause personal injury, so detecting longitudinal tears is extremely important. This paper introduces sound detection into conveyor belt longitudinal tear detection and proposes an audio-visual detection method. A camera and a microphone array collect image and sound signals of the conveyor belt, and the belt is analyzed separately from the image and from the sound. The image and sound analysis results are then combined to judge the status of the conveyor belt comprehensively. Experimental results show that the audio-visual detection method can accurately distinguish the normal, abnormal, and longitudinal tear states of the conveyor belt. The detection accuracy exceeds 86.72%, and the sensitivity of longitudinal tear detection exceeds 92.59%. These results indicate that the proposed audio-visual detection method meets the requirements of longitudinal tear detection for conveyor belts in the coal mining industry.
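
The two reported metrics can be read under their conventional definitions, given here as an assumption because the abstract does not state them. The symbols are illustrative and not taken from the paper: N is the number of test samples, N_correct is the number assigned the correct state (normal, abnormal, or longitudinal tear), TP_tear is the number of longitudinal tear samples detected as tears, and FN_tear is the number of longitudinal tear samples assigned to another state.

\[
\text{Accuracy} = \frac{N_{\text{correct}}}{N},
\qquad
\text{Sensitivity}_{\text{tear}} = \frac{TP_{\text{tear}}}{TP_{\text{tear}} + FN_{\text{tear}}}
\]

Under these definitions, a tear sensitivity above the overall accuracy would mean that misclassifications fall mostly on the normal and abnormal states rather than on missed tears.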