Image captioning aims to automatically generate a natural language description of a given image, and most state-of-the-art models adopt an encoder-decoder framework. The framework consists of a convolutional neural network (CNN)-based image encoder that extracts region-based visual features from the input image, and a recurrent neural network (RNN)-based caption decoder that generates the output caption words from the visual features via an attention mechanism. Despite the success of existing studies, current methods model only the co-attention that characterizes inter-modal interactions while neglecting the self-attention that characterizes intra-modal interactions. Inspired by the success of the Transformer model in machine translation, we extend it here to a Multimodal Transformer (MT) model for image captioning. Compared to existing image captioning approaches, the MT model simultaneously captures intra- and inter-modal interactions in a unified attention block. Owing to the in-depth modular composition of such attention blocks, the MT model can perform complex multimodal reasoning and output accurate captions. Moreover, to further improve image captioning performance, multi-view visual features are seamlessly introduced into the MT model. We quantitatively and qualitatively evaluate our approach on the benchmark MSCOCO image captioning dataset and conduct extensive ablation studies to investigate the reasons behind its effectiveness. The experimental results show that our method significantly outperforms previous state-of-the-art methods. With an ensemble of seven models, our solution ranks first on the real-time leaderboard of the MSCOCO image captioning challenge at the time of writing.
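The core idea of a unified attention block is that one attention operation runs over the concatenation of image-region features and word features, so intra-modal (region-region, word-word) and inter-modal (region-word) interactions fall out of a single joint attention map. A minimal numpy sketch follows; the feature dimensions, random projection weights, and toy inputs are illustrative assumptions, not the authors' exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unified_attention(visual, text, d_k=64, seed=0):
    """Single attention over concatenated visual-region and word
    features: one softmax covers intra-modal (image-image, word-word)
    and inter-modal (image-word) interactions at once.
    Projection weights are random placeholders (hypothetical)."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([visual, text], axis=0)          # (R + T, d)
    d = x.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d)
                  for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_k))              # (R+T, R+T) joint map
    return attn @ v

# toy example: 3 image regions and 4 caption words, feature dim 16
regions = np.random.default_rng(1).standard_normal((3, 16))
words = np.random.default_rng(2).standard_normal((4, 16))
out = unified_attention(regions, words)
print(out.shape)  # (7, 64)
```

In a full model such blocks would be stacked in depth with multi-head attention, residual connections, and layer normalization, but the single joint attention map above is what lets one block capture both interaction types simultaneously.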
A key issue in drug development is understanding the hidden relationships among drugs and targets. Computational methods for novel drug-target prediction can greatly reduce time and cost compared with experimental methods. In this paper, we propose a network-based computational approach for predicting novel drug-target associations. More specifically, we first construct a heterogeneous drug-target graph that incorporates known drug-target interactions as well as drug-drug and target-target similarities. Based on this graph, a novel graph-based inference method is introduced. Large-scale cross-validation results indicate that, compared with two state-of-the-art methods, the proposed method greatly improves novel target prediction.
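The intuition behind inference on such a heterogeneous graph is that a candidate (drug, target) pair scores highly when similar drugs are known to interact with similar targets. The sketch below propagates the known interaction matrix through row-normalized similarity matrices; this is an illustrative propagation scheme under assumed toy data, not the paper's exact inference rule.

```python
import numpy as np

def predict_scores(A, Sd, St):
    """Propagate known interactions through drug-drug (Sd) and
    target-target (St) similarities: a candidate pair (i, j) scores
    highly when drugs similar to i hit targets similar to j.
    Illustrative scheme, not the paper's exact method."""
    # row-normalize similarities so each score is a weighted average
    Sd = Sd / Sd.sum(axis=1, keepdims=True)
    St = St / St.sum(axis=1, keepdims=True)
    return Sd @ A @ St.T

# hypothetical toy graph: 3 drugs x 2 targets; drugs 0 and 1 are similar
A = np.array([[1, 0],
              [0, 0],
              [0, 1]], dtype=float)       # known drug-target interactions
Sd = np.array([[1.0, 0.9, 0.1],
               [0.9, 1.0, 0.1],
               [0.1, 0.1, 1.0]])          # drug-drug similarity
St = np.eye(2)                            # target-target similarity
S = predict_scores(A, Sd, St)
# drug 1 has no known targets but inherits drug 0's link to target 0
print(S[1, 0] > S[1, 1])  # True
```

In cross-validation, a fraction of known interactions is masked in `A`, scores are computed for all unknown pairs, and the ranking of the held-out interactions measures predictive performance.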
The aim of this study was to develop a time-efficient sequence protocol for a 1.0 T dedicated MR system to be used for whole-organ scoring of osteoarthritis (OA). Thirty-four knees were examined using a protocol that included fat-suppressed fast spin echo proton density-weighted sequences (PDFS) in three planes plus a coronal STIR sequence. Two radiologists scored each knee by consensus for five OA features. In separate sessions, all knees were scored using three different combinations of sequences: (1) all four sequences (reference protocol, 16 min 31 s scanning time), (2) three PDFS sequences without STIR ("No STIR", 12 min 25 s scanning time), and (3) sagittal and axial PDFS sequences plus a coronal STIR sequence ("No PDFS", 11 min 49 s scanning time). Agreement of the readings using each subset of sequences compared to the reference protocol was evaluated using weighted kappa statistics. Kappa coefficients showed good or excellent agreement for both sequence subsets in comparison to the reference protocol for all assessed features. Kappa coefficients for No PDFS/No STIR were: bone marrow abnormalities (0.74/0.67), subarticular cysts (0.84/0.63), marginal osteophytes (0.77/0.71), menisci (0.75/0.79), and tibial cartilage (0.71/0.78). Optimized protocols consisting of three sequences yield time savings and cost efficiency in imaging of knee OA without loss of information compared with a more time-consuming protocol.
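Weighted kappa measures agreement between two sets of ordinal scores (here, subset-protocol readings versus reference-protocol readings) while penalizing larger disagreements more heavily and correcting for chance agreement. A minimal sketch with linear weights follows; the study's exact weighting scheme and category counts are not specified in the abstract, so both are assumptions here.

```python
import numpy as np

def weighted_kappa(a, b, n_cat):
    """Linearly weighted kappa between two ordinal score lists.
    1.0 = perfect agreement, 0 = chance-level, negative = worse
    than chance. Linear weights assumed (illustrative)."""
    a, b = np.asarray(a), np.asarray(b)
    O = np.zeros((n_cat, n_cat))
    for i, j in zip(a, b):
        O[i, j] += 1                          # observed joint frequencies
    O /= O.sum()
    E = np.outer(O.sum(axis=1), O.sum(axis=0))  # expected under independence
    i, j = np.indices((n_cat, n_cat))
    W = np.abs(i - j) / (n_cat - 1)           # disagreement weights
    return 1 - (W * O).sum() / (W * E).sum()

# perfect agreement between two readings gives kappa = 1.0
print(weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], 3))  # 1.0
```

By convention, values around 0.61-0.80 are often labeled "good" (substantial) agreement, which is the range the reported coefficients fall into.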