Birds have been widely considered crucial indicators of biodiversity. It is essential to identify bird species precisely for biodiversity surveys. With the rapid development of artificial intelligence, bird species identification has been facilitated by deep learning using audio samples. Prior studies mainly focused on identifying several bird species using deep learning or machine learning based on acoustic features. In this paper, we proposed a novel deep learning method to better identify a large number of bird species based on their call. The proposed method was made of LSTM (Long Short−Term Memory) with coordinate attention. More than 70,000 bird−call audio clips, including 264 bird species, were collected from Xeno−Canto. An evaluation experiment showed that our proposed network achieved 77.43% mean average precision (mAP), which indicates that our proposed network is valuable for automatically identifying a massive number of bird species based on acoustic features and avian biodiversity monitoring.
Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on the misclassified training samples as the target for augmentation to better improve the model's capability. Meanwhile, to generate meaningful augmented samples, it adopts a self-attention mechanism to understand the importance of the subsentences in a text, and cut and mix the subsentences between the misclassified and correctly classified samples wisely. Furthermore, it employs a novel dynamic augmented data selection framework based on the loss function gradient to dynamically optimize the augmented samples for model training. In the end, we develop a new model explainability evaluation method based on subsentence attention and conduct extensive evaluations over multiple real-world text datasets. The results indicate that ATMIX is more effective with higher explainability than the typical classification models, hidden-level, and input-level mixup models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.