The Bai nationality has a long history and its own language. Because fewer and fewer people know the Bai language, the literature and culture of the Bai nationality are being lost rapidly. To enable people who do not understand Bai characters to read the ancient books of the Bai nationality, this paper studies a high-precision single-character recognition model for Bai characters. First, with the help of Bai culture enthusiasts and related scholars, we constructed a data set of Bai characters; because annotation requires expert knowledge, the data set is limited in size. As a result, data-hungry deep learning models cannot reach ideal accuracy on it. To solve this issue, we propose to use a Chinese character data set, since Chinese also belongs to the Sino-Tibetan language family, to improve the recognition accuracy of Bai characters through transfer learning. In addition, we propose four transfer learning approaches: Direct Knowledge Transfer (DKT), Indirect Knowledge Transfer (IKT), Self-coding Knowledge Transfer (SCKT), and Self-supervised Knowledge Transfer (SSKT). Experiments show that our approaches greatly improve the recognition accuracy of Bai characters.
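The general idea of transfer learning from a large source data set to a small target data set can be illustrated with a deliberately tiny sketch. It is not a character recognizer and does not reproduce DKT, IKT, SCKT, or SSKT, whose details the abstract does not give. The one-parameter model, the synthetic `source` and `target` data, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mse(w: float, data: List[Sample]) -> float:
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w: float, data: List[Sample], steps: int, lr: float = 0.05) -> float:
    """Plain gradient descent on the MSE, starting from the given weight."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical data: a large source task and a small, related target task.
source = [(i / 50, 2.0 * i / 50) for i in range(1, 101)]  # 100 samples, y = 2.0x
target = [(1.0, 2.2), (2.0, 4.4), (3.0, 6.6)]             # 3 samples, y = 2.2x

# Pretrain on the source task, then fine-tune briefly on the target task.
w_pre = train(0.0, source, steps=200)
w_transfer = train(w_pre, target, steps=3)

# Baseline: the same short training budget on the target task, from scratch.
w_scratch = train(0.0, target, steps=3)

print("transfer loss:", mse(w_transfer, target))
print("scratch loss: ", mse(w_scratch, target))
```

With the same three fine-tuning steps, the pretrained start ends closer to the target optimum than the start from zero, because the source solution is already near the target solution. That closeness is what the related-language assumption is meant to provide.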
Multimodal semantic segmentation is a pivotal component of computer vision and typically surpasses unimodal methods by utilizing rich information from various sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets, verifying its efficacy in enhancing the robustness and versatility of semantic segmentation in diverse settings. Our code is available at U3M-multimodal-semantic-segmentation.
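The two ideas named in this abstract, treating modalities symmetrically and fusing at several scales, can be sketched in a few lines. This is not the U3M architecture: the element-wise mean as the fusion rule, the average pooling, the scale set, and all names below are assumptions chosen only to make the ideas concrete.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]  # a single-channel feature map


def avg_pool(grid: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (height and width divisible by k)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(grid[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def fuse_unbiased(maps: Sequence[Grid]) -> Grid:
    """Element-wise mean over modalities: no modality is treated as primary."""
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(w)] for i in range(h)]


def multiscale_fusion(maps: Sequence[Grid], scales: Sequence[int] = (1, 2, 4)) -> Dict[int, Grid]:
    """Fuse the modalities at each scale; small k keeps local detail, large k is global."""
    return {k: fuse_unbiased([avg_pool(m, k) for m in maps]) for k in scales}


# Hypothetical 4 x 4 feature maps for two modalities.
rgb = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
depth = [[1.0] * 4 for _ in range(4)]

fused = multiscale_fusion([rgb, depth])
swapped = multiscale_fusion([depth, rgb])
print(fused[4])  # → [[4.25]]
```

Swapping the order of the inputs leaves the result unchanged, which is the sense in which this toy fusion rule is unbiased. A learned fusion would replace the mean, but the symmetry check stays the same.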
The Bai people have left behind a wealth of ancient texts that record their splendid civilization. Unfortunately, fewer and fewer people can read these texts today. Therefore, it is of great practical value to design a model that can automatically recognize ancient (offset) Bai texts. However, because annotating ancient (offset) texts requires expert knowledge and the available data are limited in scale, we consider using handwritten Bai texts to help identify ancient (offset) Bai texts, since handwritten Bai texts can be easily obtained and annotated. Essentially, this is a domain adaptation problem, and we applied several existing domain adaptation methods to ancient (offset) Bai text recognition. Unfortunately, none of them achieved high performance, because they do not solve the problem of how to separate the style and content information of an image. To address this, we propose an information separation network (ISN) that effectively separates content and style information and classifies with content features only. Specifically, our network first uses a separator to divide the visual features into a style feature and a content feature. It then uses cross-domain cross-reconstruction to ensure that the style feature contains only style and the content feature contains only content. Finally, it uses only the content feature for classification. This greatly reduces the impact of the domain shift. The proposed method achieves leading results on five public datasets and a private one.
Conventional zero-shot learning aims to train a classifier on a training set (seen classes) to recognize instances of novel classes (unseen classes) by means of class-level semantic attributes. In generalized zero-shot learning (GZSL), the classifier needs to recognize both seen and unseen classes, which is a problem of extreme data imbalance. To solve this problem, feature generative methods have been proposed to make up for the lack of unseen-class samples. Current generative methods use class semantic attributes as the cues for synthesizing visual features, which can be considered a mapping from semantic attributes to visual features. However, this mapping cannot effectively transfer knowledge learned from seen classes to unseen classes because the information in the semantic attributes and the information in the visual features are asymmetric: semantic attributes contain key category description information, while visual features also contain visual information that cannot be represented by semantics. To this end, we propose a residual-prototype-generating network (RPGN) for GZSL that extracts the residual visual features from original visual features by an encoder–decoder and synthesizes the prototype visual features associated with semantic attributes by a disentangle regressor. Experimental results show that the proposed method achieves competitive results on four GZSL benchmark datasets with significant gains.
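The split into a prototype and a residual can be illustrated with a minimal sketch. It does not implement RPGN. The class mean stands in for the prototype, plain subtraction stands in for the encoder–decoder that extracts residuals, and an attribute-similarity weighted average of seen prototypes stands in for the disentangle regressor. The class names, attributes, and features are hypothetical.

```python
from typing import Dict, List

Vec = List[float]


def mean_vec(vectors: List[Vec]) -> Vec:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def seen_prototypes(features: Dict[str, List[Vec]]) -> Dict[str, Vec]:
    """Stand-in prototype: the mean visual feature of each seen class."""
    return {c: mean_vec(fs) for c, fs in features.items()}


def residual_pool(features: Dict[str, List[Vec]], protos: Dict[str, Vec]) -> List[Vec]:
    """Stand-in residual: what remains of a feature after removing its class prototype."""
    return [[x - p for x, p in zip(f, protos[c])] for c, fs in features.items() for f in fs]


def unseen_prototype(attr: Vec, seen_attrs: Dict[str, Vec], protos: Dict[str, Vec]) -> Vec:
    """Stand-in regressor: seen prototypes weighted by attribute dot-product similarity."""
    sims = {c: sum(a * b for a, b in zip(attr, s)) for c, s in seen_attrs.items()}
    total = sum(sims.values())
    if total <= 0:
        raise ValueError("unseen attributes share nothing with the seen classes")
    dim = len(next(iter(protos.values())))
    return [sum(sims[c] / total * protos[c][d] for c in protos) for d in range(dim)]


def synthesize(proto: Vec, residuals: List[Vec]) -> List[Vec]:
    """Synthetic unseen-class features: predicted prototype plus each seen residual."""
    return [[p + r for p, r in zip(proto, res)] for res in residuals]


# Hypothetical seen-class features and attributes.
features = {"horse": [[1.0, 0.0], [3.0, 0.0]], "tiger": [[0.0, 4.0], [0.0, 6.0]]}
seen_attrs = {"horse": [1.0, 0.0], "tiger": [0.0, 1.0]}

protos = seen_prototypes(features)
pool = residual_pool(features, protos)
zebra_proto = unseen_prototype([1.0, 1.0], seen_attrs, protos)
zebra_fake = synthesize(zebra_proto, pool)
print(zebra_proto)  # → [1.0, 2.5]
```

The semantic attributes only determine where the class sits (the prototype), while the variation that attributes cannot describe is borrowed from seen classes as residuals. A classifier for both seen and unseen classes could then be trained on the real and synthetic features together.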