This paper addresses the problem of large-scale video retrieval by a query image. First, we define the top-k image-to-video query problem. Then, we combine the merits of convolutional neural networks (CNN for short) and the Bag of Visual Words (BoVW for short) model to design a model for extracting and representing video frame information. To meet the requirements of large-scale video retrieval, we propose a visual weighted inverted index (VWII for short) and related algorithms to improve the efficiency and accuracy of the retrieval process. Comprehensive experiments show that our proposed technique achieves substantial improvements (up to an order of magnitude speed-up) over the state-of-the-art techniques with similar accuracy.
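The core retrieval machinery named above can be illustrated with a minimal sketch. The abstract does not give the VWII weighting scheme, so this hypothetical example assumes a standard tf-idf-style inverted index over quantized visual-word IDs (the frame IDs and word IDs are invented for illustration):

```python
import math
from collections import defaultdict

class VisualInvertedIndex:
    """Minimal sketch of an inverted index over visual words.

    Each video frame is represented as a bag of visual-word IDs
    (e.g., quantized CNN descriptors). Postings map a visual word
    to the frames containing it; scoring uses tf-idf-style weights.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # word -> {frame_id: term frequency}
        self.frame_lengths = {}            # frame_id -> number of words

    def add_frame(self, frame_id, visual_words):
        self.frame_lengths[frame_id] = len(visual_words)
        for w in visual_words:
            self.postings[w][frame_id] = self.postings[w].get(frame_id, 0) + 1

    def query(self, visual_words, k=3):
        n = len(self.frame_lengths)
        scores = defaultdict(float)
        for w in set(visual_words):
            if w not in self.postings:
                continue
            # rarer visual words contribute more weight
            idf = math.log(1 + n / len(self.postings[w]))
            for frame_id, tf in self.postings[w].items():
                scores[frame_id] += (tf / self.frame_lengths[frame_id]) * idf
        return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

# Toy usage: three frames from two hypothetical videos.
index = VisualInvertedIndex()
index.add_frame("v1_f0", [1, 2, 2, 5])
index.add_frame("v1_f1", [2, 3])
index.add_frame("v2_f0", [5, 7, 7])
top = index.query([2, 5], k=2)  # query image quantized to words {2, 5}
```

Only frames sharing at least one visual word with the query are scored, which is what makes the inverted index scale to large collections.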
Massive amounts of multimedia data containing timestamps and geographical information are being generated at an unprecedented scale in many emerging applications such as photo-sharing websites and social networks. Due to their importance, a large body of work has focused on efficiently computing various spatial image queries. In this paper, we study the spatial-temporal image query, which considers three important constraints during the search: time recency, spatial proximity and visual relevance. We propose a novel index structure, namely the Hierarchical Information Quadtree (HI-Quadtree), to efficiently insert/delete spatial-temporal images arriving at high rates. Based on HI-Quadtree, an efficient algorithm is developed to support the spatial-temporal image query. Extensive experiments on real spatial databases clearly demonstrate the efficiency of our methods.
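The abstract does not spell out the HI-Quadtree layout, so the following is only a hedged sketch of the general idea: a point quadtree whose nodes additionally carry the newest timestamp in their subtree, allowing stale subtrees to be pruned when a recency constraint is applied. All coordinates, capacities and payloads below are invented for illustration:

```python
class HIQuadtreeNode:
    """Sketch of a quadtree node augmented with temporal metadata,
    in the spirit of a hierarchical information quadtree (the real
    HI-Quadtree in the paper is more elaborate)."""

    CAPACITY = 4  # max items per leaf before splitting

    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.items = []              # (x, y, timestamp, payload)
        self.children = None
        self.max_ts = float("-inf")  # newest timestamp in this subtree

    def insert(self, x, y, ts, payload):
        self.max_ts = max(self.max_ts, ts)
        if self.children is None:
            self.items.append((x, y, ts, payload))
            if len(self.items) > self.CAPACITY:
                self._split()
        else:
            self._child_for(x, y).insert(x, y, ts, payload)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [
            HIQuadtreeNode(x0, y0, mx, my), HIQuadtreeNode(mx, y0, x1, my),
            HIQuadtreeNode(x0, my, mx, y1), HIQuadtreeNode(mx, my, x1, y1),
        ]
        items, self.items = self.items, []
        for x, y, ts, payload in items:
            self._child_for(x, y).insert(x, y, ts, payload)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def query_recent(self, since):
        """Return payloads newer than `since`, pruning stale subtrees."""
        if self.max_ts <= since:
            return []  # whole subtree is too old: prune it
        out = [p for (_, _, ts, p) in self.items if ts > since]
        if self.children:
            for c in self.children:
                out.extend(c.query_recent(since))
        return out

# Toy usage: five geo-tagged images with increasing timestamps.
root = HIQuadtreeNode(0, 0, 100, 100)
for x, y, ts, tag in [(10, 10, 1, "a"), (20, 20, 2, "b"), (80, 80, 3, "c"),
                      (30, 70, 4, "d"), (60, 10, 5, "e")]:
    root.insert(x, y, ts, tag)
recent = root.query_recent(since=3)  # only images newer than timestamp 3
```

The `max_ts` field is what turns recency into an index-level pruning condition rather than a post-filter.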
The purpose of cross-modal retrieval is to find the relationship between samples of different modalities and to use a sample of one modality to retrieve samples of other modalities with similar semantics. As the data of different modalities present heterogeneous low-level features and semantically related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named the Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intra-modal pairs with the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two branches of the networks to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
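The common-subspace idea with weight sharing can be sketched in miniature. In HCMSL the mappings are deep fully connected networks trained end to end; the toy below uses single untrained linear layers with invented dimensions, purely to show how a shared top layer ties the two branches together and how similarity is then measured in the common space:

```python
import math
import random

random.seed(0)

# Toy dimensions for illustration only: 4-d image features and 3-d
# text features, mapped into a 2-d common subspace.
D_IMG, D_TXT, D_COMMON = 4, 3, 2

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(rows)]

W_img = rand_matrix(D_IMG, D_COMMON)        # image-branch layer
W_txt = rand_matrix(D_TXT, D_COMMON)        # text-branch layer
W_shared = rand_matrix(D_COMMON, D_COMMON)  # top layer shared by both branches

def matvec(W, x):
    """Multiply row-vector x by matrix W (x @ W)."""
    return [sum(W[i][j] * x[i] for i in range(len(x)))
            for j in range(len(W[0]))]

def embed(x, W_branch):
    """Map a modality-specific feature into the common subspace.
    The shared top layer is the weight-sharing device used to
    reduce cross-modal heterogeneity."""
    h = [max(v, 0.0) for v in matvec(W_branch, x)]  # ReLU
    return matvec(W_shared, h)

def cosine(a, b):
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    return num / den if den else 0.0

z_img = embed([0.9, -0.1, 0.4, 0.2], W_img)  # hypothetical image feature
z_txt = embed([0.8, 0.1, 0.3], W_txt)        # hypothetical text feature
sim = cosine(z_img, z_txt)  # cross-modal similarity in the common space
```

Once both modalities live in the same space, any vector similarity (cosine here) serves as the cross-modal retrieval score.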
This paper uses hourly observations from Urumqi International Airport from 2007 to 2016 and builds a regression model for airport visibility prediction using deep learning. The results show that the mean absolute error of hourly visibility prediction is 706 m; when visibility is ≤ 1000 m, the absolute error is 325 m, and the method can predict visibility trends. Therefore, this method can be used to provide objective airport visibility forecast guidance products for aviation meteorological services in the future. Taking the Urumqi area as the research object, this paper explores applications of deep learning in weather forecasting, providing forecasters with a new visibility regression forecast so as to improve the accuracy of visibility prediction and ensure the safe and stable operation of the airport.
With the rapid development of mobile Internet and cloud computing technology, large-scale multimedia data, e.g., texts, images, audio and videos, have been generated, collected, stored and shared. In this paper, we propose a novel query problem named the continuous top-k geo-image query on road networks, which aims to retrieve a set of geo-visual objects based on road network distance proximity and visual content similarity. Existing approaches for spatial textual queries and geo-image queries cannot address this problem effectively because they do not consider both visual content similarity and road network distance proximity simultaneously. To address this challenge effectively and efficiently, we first define geo-visual objects and the continuous top-k geo-visual object query on road networks, and then develop a score function for search. To improve query efficiency on a large-scale road network, we propose a search algorithm named geo-visual search on road networks, based on a novel hybrid indexing framework called VIG-Tree, which combines G-Tree with the visual inverted index technique. In addition, we introduce an important notion named the safe interval together with result-updating rules, and based on them we develop an efficient algorithm named the moving monitor algorithm to solve the continuous query. Experimental evaluation on a real multimedia dataset and road network dataset illustrates that our solution outperforms the state-of-the-art method.

Keywords multimedia retrieval · geo-visual objects · continuous top-k query · road network

1 Introduction

With the rapid development of mobile Internet and cloud computing technology, large-scale multimedia data [36,33,31], e.g., texts, images [32], audio and videos, have been generated, collected, stored and shared. For example, Facebook, the most famous online social network service, reported more than 300 million photos uploaded and shared daily in November 2013.
More than 3.5 million photos have been uploaded by the 87 million registered users of Flickr, the largest online photo-sharing service. More than 140 million Twitter users post 400 million tweets containing up to 140 characters of text as well as images with geographical information such as latitude and longitude. YouTube, the largest video-sharing website in the world, was receiving more than 100 hours of video every minute at the end of 2013. The total number of users of Himalaya, a popular audio-sharing platform, is already more than 470 million; as of December 2015, the total amount of audio on Himalaya had exceeded 15 million items. Beyond all doubt, unlike in the past, massive multimedia data now account for nearly 80% of the total amount of data in the big data environment. As is known to all, advanced mobile devices equipped with wireless network modules, high-definition cameras and microphones, such as smartphones and tablets, together with other popular mobile applications and location-based services (LBS for short) like WeChat, Uber, Amap, etc., bring a lot of convenience to peop...
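The score function mentioned in the abstract above is not given explicitly in this excerpt, so the sketch below assumes a common form: a linear combination of visual similarity and road-network distance proximity, with shortest-path distances computed by Dijkstra's algorithm. The graph, weight `alpha` and distance budget `d_max` are all hypothetical:

```python
import heapq

def dijkstra(graph, source):
    """Shortest road-network distances from `source` to every vertex.
    `graph` maps a vertex to a list of (neighbor, edge_length) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def top_k_geo_visual(graph, query_vertex, objects, k, alpha=0.5, d_max=10.0):
    """Rank geo-visual objects by
    alpha * visual_sim + (1 - alpha) * (1 - network_dist / d_max)."""
    dist = dijkstra(graph, query_vertex)
    scored = []
    for obj_id, vertex, visual_sim in objects:
        d = dist.get(vertex, float("inf"))
        if d > d_max:
            continue  # beyond the distance budget
        score = alpha * visual_sim + (1 - alpha) * (1 - d / d_max)
        scored.append((score, obj_id))
    return [obj for _, obj in sorted(scored, reverse=True)[:k]]

# Toy road network with three vertices and three geo-visual objects,
# each given as (object_id, road vertex, visual similarity to query).
graph = {"a": [("b", 2.0), ("c", 5.0)],
         "b": [("a", 2.0), ("c", 1.0)],
         "c": [("a", 5.0), ("b", 1.0)]}
objects = [("img1", "b", 0.9), ("img2", "c", 0.95), ("img3", "a", 0.2)]
top = top_k_geo_visual(graph, "a", objects, k=2)
```

A continuous query, as in the paper's moving monitor algorithm, would avoid recomputing this ranking at every position by exploiting safe intervals; the brute-force version here only shows what is being ranked.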
Person re-identification (person Re-Id) aims to retrieve pedestrian images of the same person captured by disjoint and non-overlapping cameras. Many researchers have recently focused on this hot issue and proposed deep-learning-based methods to enhance the recognition rate in a supervised or unsupervised manner. However, two limitations cannot be ignored: first, compared with other image retrieval benchmarks, the sizes of existing person Re-Id datasets are far from meeting requirements and cannot provide sufficient pedestrian samples for training deep models; second, the samples in existing datasets do not cover enough human motions or postures to provide prior knowledge for learning. In this paper, we introduce a novel unsupervised pose-augmentation cross-view person Re-Id scheme called PAC-GAN to overcome these limitations. We first present the formal definition of cross-view pose augmentation and then propose the PAC-GAN framework, a novel conditional generative adversarial network (CGAN) based approach to improve the performance of unsupervised cross-view person Re-Id. Specifically, the pose generation model in PAC-GAN, called CPG-Net, generates a sufficient quantity of pose-rich samples from original image and skeleton samples. The pose-augmented dataset is produced by combining the synthesized pose-rich samples with the original samples, and is fed into the cross-view person Re-Id model named Cross-GAN. Besides, we use a weight-sharing strategy in CPG-Net to improve the quality of the newly generated samples. To the best of our knowledge, this is the first attempt to enhance unsupervised cross-view person Re-Id by pose augmentation, and the results of extensive experiments show that the proposed scheme outperforms the state-of-the-art methods.
With advances in geo-positioning technologies and geo-location services, a rapidly growing, massive amount of spatio-temporal data is being collected in many applications such as location-aware devices and wireless communication, in which an object is described by its spatial location and its timestamp. Consequently, the study of spatio-temporal search, which explores both the geo-location and temporal information of the data, has attracted significant attention from research organizations and commercial communities. This work studies the problem of spatio-temporal k-nearest neighbors search (STkNNS), which is fundamental among spatio-temporal queries. Based on HBase, a novel index structure is proposed, called the Hybrid Spatio-Temporal HBase Index (HSTI for short), which is carefully designed and takes both spatial and temporal information into consideration to effectively reduce the search space. Based on HSTI, an efficient algorithm is developed to handle spatio-temporal k-nearest neighbors search. Comprehensive experiments on real and synthetic data clearly show that HSTI is three to five times faster than the state-of-the-art technique.
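What STkNNS minimizes can be shown with a brute-force baseline. HSTI prunes the search space with an index on HBase; the linear scan below only illustrates a plausible combined space-time distance, assuming a weighted sum with a hypothetical balance parameter `lam` (not taken from the paper):

```python
import math

def st_distance(obj, q, lam=0.5):
    """Combined spatio-temporal distance between an object (x, y, t)
    and a query point (qx, qy, qt): a weighted sum of Euclidean
    spatial distance and absolute temporal gap."""
    x, y, t = obj
    qx, qy, qt = q
    spatial = math.hypot(x - qx, y - qy)
    temporal = abs(t - qt)
    return lam * spatial + (1 - lam) * temporal

def stknn(objects, q, k):
    """Return the ids of the k objects closest to q in space-time
    (brute force; an index like HSTI avoids scanning everything)."""
    ranked = sorted(objects.items(), key=lambda kv: st_distance(kv[1], q))
    return [oid for oid, _ in ranked[:k]]

# Toy data: each object is (x, y, timestamp).
objects = {"o1": (0, 0, 10), "o2": (3, 4, 10), "o3": (0, 1, 100)}
nearest = stknn(objects, q=(0, 0, 10), k=2)
```

Here `o3` is spatially closest after `o1` but loses on the temporal axis, which is exactly the trade-off a hybrid spatio-temporal index must support.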