There is a need for automatic systems that can reliably detect, track and classify fish and other marine species in underwater videos without human intervention. Conventional computer vision techniques do not perform well in underwater conditions, where the background is complex and the shape and textural features of fish are subtle. Data-driven classification models such as neural networks require large amounts of labelled data; otherwise they tend to overfit the training data and fail on unseen test data. We present a state-of-the-art computer vision method for fine-grained fish species classification based on deep learning techniques. A cross-layer pooling algorithm that uses a pre-trained Convolutional Neural Network as a generalized feature detector is proposed, thus avoiding the need for a large amount of training data. Classification on test data is performed by an SVM on the features computed through the proposed method, resulting in a classification accuracy of 94.3% for fish species from typical underwater video imagery captured off the coast of Western Australia. This research indicates that automated classification systems which can identify fish from underwater video imagery are feasible and offer a cost-effective alternative to manual identification by humans.
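The cross-layer pooling step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common formulation of cross-layer pooling, in which the activations of one convolutional layer act as local features and each channel of the next layer acts as a spatial weighting map; whether this matches the exact variant used in the paper is an assumption. The function name, the array shapes, and the square-root and L2 normalisation step are illustrative choices, and random or constant arrays stand in for real CNN activations.

```python
import numpy as np

def cross_layer_pool(feat_t, feat_t1):
    """Pool layer-t local features using layer-(t+1) activations as weights.

    feat_t  : array of shape (H, W, D), local features from one conv layer
    feat_t1 : array of shape (H, W, K), activations of the next conv layer,
              assumed to be aligned to the same H x W spatial grid
    returns : 1-D descriptor of length K * D
    """
    H, W, D = feat_t.shape
    H1, W1, K = feat_t1.shape
    if (H, W) != (H1, W1):
        raise ValueError("feature maps must share the same spatial grid")

    x = feat_t.reshape(H * W, D)    # one D-dim local feature per position
    a = feat_t1.reshape(H * W, K)   # one weight per position and channel k

    # For each channel k, sum the local features weighted by that channel's
    # activation at the same position; the result has shape (K, D).
    pooled = a.T @ x
    vec = pooled.reshape(-1)

    # Signed square root followed by L2 normalisation (an assumed,
    # commonly used post-processing step, not taken from the abstract).
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Hypothetical stand-in activations for a single image
rng = np.random.default_rng(0)
descriptor = cross_layer_pool(rng.random((6, 6, 8)), rng.random((6, 6, 4)))
print(descriptor.shape)
```

In a full pipeline, one such descriptor per image would be passed to an SVM classifier (for example scikit-learn's `LinearSVC`); that stage is omitted here to keep the sketch dependency-light.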
Underwater visual census of reef fish by scuba divers is a widely used and useful technique for assessing the composition and abundance of reef fish assemblages, but it suffers from several biases and errors. We compare the accuracy of underwater visual estimates of distance made by novice and experienced scientific divers and by an underwater stereo-video system. We demonstrate the potential implications that distance errors may have for underwater visual census assessments of reef fish abundance. We also investigate how the accuracy and precision of scuba divers' length estimates of fish are affected as distance increases. Distance was underestimated by both experienced (mean relative error = −11.7%, s.d. = 21.4%) and novice scientific divers (mean relative error = −5.0%, s.d. = 17.9%). For experienced scientific divers this error may potentially result in an 82% underestimate or 194% overestimate of the actual area censused, which will affect estimates of fish density. The stereo-video system also underestimated distance, but to a much lesser degree (mean relative error = −0.9%, s.d. = 2.6%) and with less variability than the divers. There was no correlation between the relative error of length estimates and the distance of the fish from the observer.
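To make the link between distance error and censused area concrete, the following is a rough sketch of how a mean relative distance error could propagate to area. It assumes a simple model that is not taken from the paper: the observer places the census boundary where the estimated distance equals the nominal distance, so the true boundary lies at nominal / (1 + error). The function name and the two survey shapes are hypothetical, and the sketch does not attempt to reproduce the 82% and 194% figures quoted above, which presumably also reflect the variability of the estimates.

```python
def actual_area_ratio(rel_error, shape="belt"):
    """Ratio of the area actually censused to the nominal area.

    rel_error : mean relative error of distance estimates, e.g. -0.117
                for an 11.7% underestimate
    shape     : "belt"   -> area scales linearly with transect width
                "circle" -> area scales with the square of the radius
    """
    if rel_error <= -1.0:
        raise ValueError("relative error must be greater than -1")

    # True distance covered per unit of perceived distance.
    scale = 1.0 / (1.0 + rel_error)

    if shape == "belt":
        return scale
    if shape == "circle":
        return scale ** 2
    raise ValueError("shape must be 'belt' or 'circle'")

# Mean relative errors reported in the abstract
for label, err in [("experienced", -0.117), ("novice", -0.050), ("stereo-video", -0.009)]:
    print(label, round(actual_area_ratio(err, "belt"), 3),
          round(actual_area_ratio(err, "circle"), 3))
```

Under this simplified model, an observer who underestimates distance surveys a larger area than intended, so a density computed from the nominal area would be biased upward; the bias is roughly squared for circular point counts.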
Underwater video and digital still cameras are rapidly being adopted by marine scientists and managers as a tool for non‐destructively quantifying and measuring the relative abundance, cover and size of marine fauna and flora. Recorded imagery of fish can be time consuming and costly to process and analyze manually. For this reason, there is great interest in automatic classification, counting, and measurement of fish. Unconstrained underwater scenes are highly variable due to changes in light intensity, changes in fish orientation due to movement, a variety of background habitats which sometimes also move, and, most importantly, similarity in shape and patterns among fish of different species. This poses a great challenge for image and video processing techniques that must accurately differentiate between classes or species of fish to perform automatic classification. We present a machine learning approach that is suitable for solving this challenge. We demonstrate the use of a convolutional neural network model in a hierarchical feature combination setup to learn species‐dependent visual features of fish that are unique, yet abstract and robust against environmental and intra‐ and inter‐species variability. This approach avoids the need for explicitly extracting features from raw images of the fish using several fragmented image processing techniques. As a result, we achieve a single, generic trained architecture with favorable performance even for sample images of fish species that have not been used in training. Using the LifeCLEF14 and LifeCLEF15 benchmark fish datasets, we have demonstrated results with a correct classification rate of more than 90%.
Calibration of a camera system is essential to ensure that image measurements result in accurate estimates of locations and dimensions within the object space. In the underwater environment, the calibration must implicitly or explicitly model and compensate for the refractive effects of waterproof housings and the water medium. This paper reviews the different approaches to the calibration of underwater camera systems in theoretical and practical terms. The accuracy, reliability, validation and stability of underwater camera system calibration are also discussed. Samples of results from published reports are provided to demonstrate the range of possible accuracies for the measurements produced by underwater camera systems.
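The refractive effect that underwater calibration has to account for can be illustrated with Snell's law at a flat port. The sketch below is a simplified illustration rather than any of the calibration approaches reviewed in the paper: it assumes an ideal flat interface, ignores the thickness of the port glass, and uses nominal refractive indices (1.0 for air, 1.33 for water) as assumed values. The function name is hypothetical.

```python
import math

def in_water_angle(theta_air_deg, n_air=1.0, n_water=1.33):
    """Angle of a ray in water, given its angle inside the housing (air).

    Applies Snell's law, n_air * sin(theta_air) = n_water * sin(theta_water),
    at a flat interface. Angles are measured from the port normal, in degrees.
    """
    s = n_air * math.sin(math.radians(theta_air_deg)) / n_water
    return math.degrees(math.asin(s))

# A ray leaving the lens at 30 degrees off-axis is bent toward the normal
# in water, so the effective field of view through a flat port narrows.
half_fov_air = 30.0
half_fov_water = in_water_angle(half_fov_air)
print(round(half_fov_water, 2))
```

Because the bending depends on the off-axis angle, the effect is not a uniform scale change, which is one reason the refraction has to be modelled or absorbed into the calibration parameters rather than ignored.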