The introduction and application of the Vision Transformer (ViT) have promoted the development of fine-grained visual categorization (FGVC). However, directly applying ViT to FGVC tasks raises problems: ViT classifies using only the class token of the last layer, ignoring the local and low-level features necessary for FGVC. We propose a ViT-based multilevel feature fusion transformer (MFVT) for FGVC tasks. In this framework, following ViT, the backbone adopts 12 Transformer blocks, divides them into four stages, and adds multilevel feature fusion (MFF) between the Transformer layers. We also design RAMix, a CutMix-based data augmentation strategy that resizes, rather than crops, the pasted image and assigns labels based on attention. Experiments on the CUB-200-2011, Stanford Dogs, and iNaturalist 2017 datasets gave competitive results, especially on the challenging iNaturalist 2017, with an accuracy of 72.6%.
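The RAMix idea described above, resizing the whole auxiliary image into the pasted region (instead of CutMix's crop-paste) and weighting the mixed label by attention rather than by patch area, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the nearest-neighbour resize, and the assumption of a precomputed per-pixel attention map for the base image are all illustrative choices.

```python
import numpy as np

def ramix(img_a, img_b, label_a, label_b, attn_a, lam=0.5, rng=None):
    """RAMix-style mixing (sketch): resize image B into a random patch of
    image A, then weight the labels by the attention mass the patch covers.
    `attn_a` is an assumed per-pixel attention map for image A."""
    rng = rng or np.random.default_rng(0)
    H, W = img_a.shape[:2]
    # Patch size derived from the mixing ratio lam (CutMix-style box).
    ph, pw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    y = rng.integers(0, H - ph + 1)
    x = rng.integers(0, W - pw + 1)
    # "Resize strategy": shrink the *whole* of image B to the patch size
    # instead of cropping a sub-region (nearest-neighbour for simplicity).
    ys = np.arange(ph) * img_b.shape[0] // ph
    xs = np.arange(pw) * img_b.shape[1] // pw
    patch = img_b[ys][:, xs]
    mixed = img_a.copy()
    mixed[y:y + ph, x:x + pw] = patch
    # Attention-based label assignment: the weight of label B equals the
    # fraction of A's attention mass that the pasted patch replaced.
    w_b = attn_a[y:y + ph, x:x + pw].sum() / attn_a.sum()
    mixed_label = (1 - w_b) * label_a + w_b * label_b
    return mixed, mixed_label
```

With a uniform attention map this reduces to CutMix's area-based label weighting; with a peaked map, pasting over a highly attended (foreground) region shifts more label mass to image B, which is the motivation for attention-based assignment.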
With pedestrian detection algorithms, balancing the trade-off between accuracy and speed remains challenging. Following the central-point-based one-stage object detection paradigm, a pedestrian detection algorithm based on multi-scale attention feature aggregation (MAFA), referred to as MAFA-Net, is proposed to improve accuracy while preserving real-time performance. Deep dilate blocks extract deeper features. Pedestrian attention blocks mine the relationships between features along the spatial and channel-wise dimensions, enhancing pedestrian features. Feature aggregation modules fuse features at different scales, combining rich high-level semantics with the accurate localization of low-level features. Experiments were conducted on two challenging pedestrian detection datasets, CityPersons and Caltech, using the log-average miss rate (MR⁻²) as the evaluation metric. On Caltech, MR⁻² is 4.58% under the reasonable setting. On CityPersons, MR⁻² is 11.47% and 10.05% under the reasonable and partial-occlusion settings, 0.43% and 1.35% better than the second-best comparison detector. These results demonstrate the effectiveness and feasibility of the algorithm.
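The feature-aggregation step described above, fusing coarse high-level semantic maps with fine low-level localization maps, can be sketched in its simplest form: upsample the coarse map to the fine resolution and fuse element-wise. The function name, nearest-neighbour upsampling, and additive fusion are illustrative assumptions; the paper's modules are learned and more elaborate.

```python
import numpy as np

def aggregate_features(low, high):
    """Feature-aggregation sketch: upsample the coarse, semantically rich
    `high` map (C, h, w) to the resolution of the `low` map (C, H, W) by
    nearest-neighbour repetition, then fuse by element-wise sum, combining
    high-level semantics with low-level localization detail."""
    C, H, W = low.shape
    _, h, w = high.shape
    ys = np.arange(H) * h // H   # nearest-neighbour row indices
    xs = np.arange(W) * w // W   # nearest-neighbour column indices
    upsampled = high[:, ys][:, :, xs]
    return low + upsampled
```

In a real detector the sum would typically be replaced by a learned fusion (e.g. concatenation followed by a convolution), but the resolution-matching step is the same.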