Mingkang Tang scite author profile

Mingkang Tang

3 Publications

18 Citation Statements Received

67 Citation Statements Given


Affiliations

Tsinghua University, Jiangxi University of Science and Technology

Publications


CLIP4Caption: CLIP for Video Caption

Tang, Wang, Liu et al. 2021


Video captioning is a challenging task since it requires generating sentences that describe diverse and complex videos. Existing video captioning models lack adequate visual representation because they neglect the gap between videos and texts. To bridge this gap, we propose CLIP4Caption, a framework that improves video captioning based on a CLIP-enhanced video-text matching network (VTM). The framework takes full advantage of information from both vision and language and forces the model to learn strongly text-correlated video features for text generation. Moreover, unlike most existing models, which use an LSTM or GRU as the sentence decoder, we adopt a Transformer-structured decoder network to effectively learn long-range visual and language dependencies. Additionally, we introduce a novel ensemble strategy for captioning tasks. Experimental results demonstrate the effectiveness of our method on two datasets: 1) on the MSR-VTT dataset, our method achieved a new state-of-the-art result with a significant gain of up to 10% in CIDEr; 2) on the private test data, our method ranked 2nd in the ACM MM Multimedia Grand Challenge 2021: Pre-training for Video Understanding Challenge. Note that our model is trained only on the MSR-VTT dataset.

CCS Concepts: Computing methodologies → Neural networks.


Stay in Grid: Improving Video Captioning via Fully Grid-Level Representation

Tang, Wang, Zeng et al. 2023

IEEE Trans. Circuits Syst. Video Technol.


Towards Better Dermoscopic Image Feature Representation Learning for Melanoma Classification

Tang, Yang et al. 2021


Deep learning-based melanoma classification with dermoscopic images has recently shown great potential in automatic early-stage melanoma diagnosis. However, limited by significant data imbalance and obvious extraneous artifacts, i.e., hair and ruler markings, discriminative feature extraction from dermoscopic images is very challenging. In this study, we seek to resolve each of these problems to achieve better representation learning for lesion features. Specifically, a GAN-based data augmentation (GDA) strategy is adapted to generate synthetic melanoma-positive images, in conjunction with the proposed implicit hair denoising (IHD) strategy, in which hair-related representations are implicitly disentangled via an auxiliary classifier network and reversely sent to the melanoma-feature extraction backbone for better melanoma-specific representation learning. Furthermore, to train the IHD module, hair noise is additionally labeled on the ISIC2020 dataset, making it the first large-scale dermoscopic dataset with annotation of hair-like artifacts. Extensive experiments demonstrate the superiority of the proposed framework as well as the effectiveness of each component. The improved dataset is publicly available at https://github.com/kirtsy/DermoscopicDataset.
