Zhihong Yu scite author profile

DualAttn-GAN: Text to Image Synthesis With Dual Attentional Generative Adversarial Network

Recent generative adversarial network (GAN) based methods have shown promising results on the appealing but challenging task of synthesizing images from text descriptions. These approaches can generate images with the correct general shape and color, but they often produce distorted global structures and unnatural local semantic details. This is due to the ineffectiveness of convolutional neural networks (CNNs) in capturing high-level semantic information for pixel-level image synthesis. In this paper, we propose a Dual Attentional Generative Adversarial Network (DualAttn-GAN), in which dual attention modules are introduced to enhance local details and global structures by attending to related features from relevant words and different visual regions. As one of the dual modules, the textual attention module is designed to explore the fine-grained interaction between vision and language. The visual attention module, in turn, models internal representations of vision along the channel and spatial axes, which better captures global structures. Meanwhile, we apply an attention embedding module to merge multi-path features. Furthermore, we present an inverted residual structure to boost the representation power of CNNs and apply spectral normalization to stabilize GAN training. With extensive experimental validation on two benchmark datasets, our method significantly improves on state-of-the-art models in terms of inception score and Fréchet inception distance.

Index Terms: Generative adversarial network, textual attention, visual attention, inverted residual structure, spectral normalization.
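The abstract above mentions spectral normalization as a way to stabilize GAN training. As a rough illustration of the general technique (not the authors' implementation), the sketch below estimates the largest singular value of a weight matrix by power iteration and divides the weights by it. The function names, the iteration count, and the plain list-of-lists matrix representation are all assumptions made for this example.

```python
import math
import random


def matvec(W, v):
    """Return W @ v for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Return W.T @ u for a list-of-lists matrix W."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(x, eps=1e-12):
    """Scale x to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(len(W))])
    v = l2_normalize(matvec_t(W, u))
    for _ in range(n_iters):
        u = l2_normalize(matvec(W, v))
        v = l2_normalize(matvec_t(W, u))
    # sigma is approximated by u^T W v
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return (W / sigma, sigma); assumes W is not the zero matrix."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W], sigma
```

For a diagonal matrix with entries 3 and 1, the estimate converges to 3 and the rescaled matrix has a largest singular value of about 1. Deep learning frameworks typically run only one power-iteration step per training update and reuse the vector `u` between updates; the larger iteration count here is just to make the standalone example converge.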

A binocular vision-based underwater object size measurement paradigm: Calibration-Detection-Measurement (C-D-M)

Zhou et al. 2023

Measurement

Image Object Extraction Based on Semantic Segmentation and Label Loss

Wang, Yu³ et al. 2020

IEEE Access

Object extraction refers to the operation of obtaining an object area from an image based on a small amount of mark information given by users, and it is a key step in image processing. To obtain a complete object profile, current methods usually require a large number of manual annotations, especially for objects with irregular contours. Because traditional algorithms rely on low-level pixel features without semantic information and are based on explicit mathematical assumptions (i.e., a strong inductive bias), they have difficulty identifying objects completely. At present, to improve the completeness of object extraction, semantic segmentation-based methods add more pre-processing and post-processing steps, which increases complexity and latency. In this paper, we propose a novel model named IOEBSS, which includes a fast binary plane pre-processing step, an improved DeepLab v3+ semantic segmentation model, and an auxiliary loss function named Label Loss. The fast binary plane pre-processing accelerates the transformation of interactive inputs. The improved semantic segmentation model makes the extracted results more semantically complete, and Label Loss is more conducive to gradient flow and accelerates training convergence. For these reasons, IOEBSS can accurately and quickly identify objects with complex contours and colors. On the Pascal VOC and COCO datasets, IOEBSS shows a significant improvement over current methods in accuracy, inference speed, and convergence speed.

Seed Identification of Gramineous Grass Using Local Similarity Pattern and Linear Discriminant Analysis

Chen¹, Pan², Ma³ et al. 2017

Open Cyber

Subheading: Grass Seed Identification Using LSP and LDA. Background: Forage plays an important role in grassland by providing food for livestock and maintaining the balance of the ecological system. Automated identification of forage is an important task for improving grassland management. The forage seed is a vital organ with relatively stable characteristics. Unlike weeds, which show relatively obvious variations, forage seeds are very similar in color, shape, size, and texture. In particular, the resemblance between some seeds from different families makes identification more difficult. Objective: In this paper, we propose a seed identification approach based on local similarity pattern (LSP) and linear discriminant analysis (LDA) for gramineous grass, one of the main forage categories of the grassland, to achieve better identification performance. Method: Textural features derived from the local similarity pattern and histogram statistics are fed into a linear discriminant analysis classifier; the former extracts more specific textures that are robust to noise and rotation variance, and the latter is more discriminative with classification information. Result: Experiments conducted on similar gramineous grass seeds of 12 species demonstrate the effectiveness of the algorithm, yielding an identification accuracy of 91.07%. Conclusion: Local similarity pattern features combined with a linear discriminant analysis classifier can effectively solve the identification problem for similar gramineous grass seeds.

DIM: Adaptively Combining User Interests Mined at Different Stages Based on Deformable Interest Model

Wang, Yu³ et al. 2020

Mathematical Problems in Engineering

User interest mining is widely used in personalized search and personalized recommendation. Traditional methods ignore the fact that user interest forms through a process that evolves over time, which prevents them from accurately describing the distribution of user interest. In this paper, we propose the interest tracking model (ITM). To incorporate timing, ITM uses the Dirichlet distribution and the multinomial distribution to describe the evolution of interest topics and frequent patterns, which adapts well to the evolution of user interest hidden in short texts across different time slices. In addition, it is well known that user interest is composed of long-term interest and situational interest, where the latter includes short-term interest and social hot topics. State-of-the-art methods simply regard a user's long-term interest as the user's final interest, which leaves them unable to describe the user interest distribution completely. To solve this problem, we propose the deformable interest model (DIM), which designs an objective function to combine users' long-term interest and situational interest and thereby mines user interest more comprehensively and accurately. Furthermore, we introduce the degree of deformation, which measures a subinterest's degree of influence on the final interest, and propose within DIM a real-time influence update mechanism. The mechanism adaptively updates the degree of deformation through linear iteration and reduces the interest model's dependence on training sets. We present results on three datasets, each covering three months: Flickr users and their uploaded information, Twitter users and their tweets, and Instagram users and their uploaded information. The results show that the perplexity is reduced to 0.378, the average accuracy is increased to 94%, and the average NMI is increased to 0.20, indicating better interest prediction.
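The abstract above reports normalized mutual information (NMI) as one of its evaluation metrics. The source does not state which NMI variant was used; the sketch below assumes the common arithmetic-mean normalization, 2·I(A;B) / (H(A) + H(B)), computed over two labelings of the same items. The function names are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        # Both labelings put everything in one group: treat as perfect agreement.
        return 1.0
    return 2.0 * mutual_information(a, b) / (h_a + h_b)
```

Under this definition, two labelings that agree up to a renaming of the labels score 1, and two statistically independent labelings score 0.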

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
