Traditional target detection methods are inefficient and inaccurate when detecting dense, small-target organisms with indistinct features against complex backgrounds. Multi-sensor fusion oriented human-robot interaction (HRI) systems help biologists process and analyse such data. To this end, several deep learning models based on convolutional neural networks (CNNs) are improved and compared in order to study the species and density of dense organisms at deep-sea hydrothermal vents. The detection results are fused with related environmental information from position sensors and conductivity-temperature-depth (CTD) sensors, so as to improve the multi-sensor fusion oriented HRI system. Firstly, the authors combined different meta-architectures with different feature extractors to obtain five CNN-based object identification algorithms. Then, they compared the computational cost of the feature extractors and weighed the pros and cons of each algorithm in terms of mean detection speed, correlation coefficient and mean class-specific confidence score, confirming that Faster Region-based CNN (R-CNN)_InceptionNet is the algorithm best suited to the hydrothermal vent biological dataset. Finally, to analyse the performance of Faster R-CNN_InceptionNet, they calculated its cognitive accuracy for Rimicaris exoculata in dense and sparse areas, which was 88.3% and 95.9%, respectively. The results show that the proposed method can be used in the multi-sensor fusion oriented HRI system to compile statistics on dense organisms in complex environments.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
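
The three comparison criteria named above (mean detection speed, correlation coefficient and mean class-specific confidence score) can be illustrated with a minimal Python sketch. The abstract does not define these quantities, so the sketch assumes that the correlation coefficient is Pearson's r between per-image detected counts and manual counts, that detection speed is summarised as mean time per image, and that detections arrive as (class label, score) pairs. All function names, the "other" class label and the numbers are hypothetical and are not taken from the authors' implementation or dataset.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences (assumed metric)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_class_confidence(detections):
    """Average the confidence scores separately for each class label."""
    by_class = {}
    for label, score in detections:
        by_class.setdefault(label, []).append(score)
    return {label: mean(scores) for label, scores in by_class.items()}


def mean_detection_time(seconds_per_image):
    """Mean time spent per image, in seconds."""
    return mean(seconds_per_image)


# Hypothetical per-image values, for illustration only.
manual_counts = [12, 30, 45, 7]
detected_counts = [11, 27, 40, 7]
detections = [
    ("Rimicaris exoculata", 0.97),
    ("Rimicaris exoculata", 0.91),
    ("other", 0.88),  # placeholder label; the real classes are not given
]
timings = [0.21, 0.19, 0.23, 0.20]

print("correlation:", pearson_r(manual_counts, detected_counts))
print("mean confidence per class:", mean_class_confidence(detections))
print("mean time per image (s):", mean_detection_time(timings))
```

Computing the same three numbers for each candidate detector on the same images would allow a side-by-side comparison in the spirit of the one the abstract describes, under the assumptions stated above.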