Biodiversity image repositories are crucial sources of training data for machine learning approaches to biological research. Metadata, specifically metadata about object quality, is putatively an important prerequisite to selecting sample subsets for these experiments. This study demonstrates the importance of image quality metadata to a species classification experiment involving a corpus of 1935 fish specimen images which were annotated with 22 metadata quality properties. A small subset of high quality images produced an F1 accuracy of 0.41 compared to 0.35 for a taxonomically matched subset of low quality images when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found the visibility of all anatomical features was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support future machine learning projects.
Physics-guided Neural Networks (PGNNs) represent an emerging class of neural networks that are trained using physics-guided (PG) loss functions (capturing violations in network outputs with known physics), along with the supervision contained in data. Existing work in PGNNs has demonstrated the efficacy of adding single PG loss functions in the neural network objectives, using constant trade-off parameters, to ensure better generalizability. However, in the presence of multiple PG functions with competing gradient directions, there is a need to
adaptively
tune the contribution of different PG loss functions during the course of training to arrive at generalizable solutions. We demonstrate the presence of competing PG losses in the generic neural network problem of solving for the lowest (or highest) eigenvector of a physics-based eigenvalue equation, which is commonly encountered in many scientific problems. We present a novel approach to handle competing PG losses and demonstrate its efficacy in learning generalizable solutions in two motivating applications of quantum mechanics and electromagnetic propagation. All the code and data used in this work are available at https://github.com/jayroxis/Cophy-PGNN.
Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g. image) quality is a putatively important prerequisite to selecting samples for these experiments. This paper reports on a study demonstrating the importance of
image quality metadata
for a species classification experiment involving a corpus of 1935 fish specimen images which were annotated with 22 metadata quality properties. A small subset of high quality images produced an F1 accuracy of 0.41 compared to 0.35 for a taxonomically matched subset low quality images when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found anatomical feature visibility was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support machine learning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.