The quality grading of mangoes is a crucial task for mango growers as it vastly affects their profit. However, until today, this process still relies on laborious efforts of humans, who are prone to fatigue and errors. To remedy this, the paper approaches the grading task with various convolutional neural networks (CNN), a tried-and-tested deep learning technology in computer vision. The models involved include Mask R-CNN (for background removal), the numerous past winners of the ImageNet challenge, namely AlexNet, VGGs, and ResNets; and, a family of self-defined convolutional autoencoder-classifiers (ConvAE-Clfs) inspired by the claimed benefit of multi-task learning in classification tasks. Transfer learning is also adopted in this work via utilizing the ImageNet pretrained weights. Besides elaborating on the preprocessing techniques, training details, and the resulting performance, we go one step further to provide explainable insights into the model's working with the help of saliency maps and principal component analysis (PCA). These insights provide a succinct, meaningful glimpse into the intricate deep learning black box, fostering trust, and can also be presented to humans in real-world use cases for reviewing the grading results.
Attention-based Transformer models have been increasingly employed for automatic music generation. To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation. However, this prompt-based conditioning cannot guarantee that the conditioning sequence would develop or even simply repeat itself in the generated continuation. In this paper, we propose an alternative conditioning approach, called theme-based conditioning, that explicitly trains the Transformer to treat the conditioning sequence as a thematic material that has to manifest itself multiple times in its generation result. This is achieved with two main technical contributions. First, we propose a deep learning-based approach that uses contrastive representation learning and clustering to automatically retrieve thematic materials from music pieces in the training data. Second, we propose a novel gated parallel attention module to be used in a sequence-to-sequence (seq2seq) encoder/decoder architecture to more effectively account for a given conditioning thematic material in the generation process of the Transformer decoder. We report on objective and subjective evaluations of variants of the proposed Theme Transformer and the conventional promptbased baseline, showing that our best model can generate, to some extent, polyphonic pop piano music with repetition and plausible variations of a given condition.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.