Xuan-Son Vu scite author profile

ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Downstream Task

With many advanced embedding models now available, selecting the pre-trained word embedding (a.k.a. word representation) model that best fits a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained Vietnamese word embedding models by selecting which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then use the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
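As a rough illustration of how a word analogy list can rank embedding models, the sketch below scores each model by how often the vector offset vec(b) - vec(a) + vec(c) lands nearest the expected word. It is a minimal sketch, not ETNLP's implementation: the function names, the two toy models, and the 2-D vectors are all invented for illustration, and the actual tool may use a different scoring rule.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def solve_analogy(emb, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, analogies):
    """Fraction of (a, b, c, expected) items answered correctly.

    Items containing out-of-vocabulary words are skipped.
    """
    covered = [q for q in analogies if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)

# Toy 2-D vectors, invented purely for illustration (hypothetical models).
toy_a = {"king": [1.0, 1.0], "queen": [1.0, -1.0], "man": [0.5, 1.0],
         "woman": [0.5, -1.0], "apple": [-1.0, 0.1]}
toy_b = {"king": [1.0, 1.0], "queen": [-1.0, 0.2], "man": [0.5, 1.0],
         "woman": [0.5, -1.0], "apple": [1.0, -0.9]}

analogies = [("man", "woman", "king", "queen")]
models = {"model_a": toy_a, "model_b": toy_b}

# Score every candidate model on the same analogy list and keep the best.
scores = {name: analogy_accuracy(emb, analogies) for name, emb in models.items()}
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Scoring all candidates on one shared analogy list is what makes the comparison meaningful; the selected model would then be passed on to the downstream task.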

On multi-resident activity recognition in ambient smart-homes

Tran, Nguyễn, Son, et al. 2019. Artif Intell Rev

Increasing attention to research on activity monitoring in smart homes has motivated the use of ambient intelligence to reduce deployment cost and address privacy issues. Several approaches have been proposed for multi-resident activity recognition; however, a comprehensive benchmark for future research and practical model selection is still lacking. In this paper, we study different methods for multi-resident activity recognition and evaluate them on the same data sets. The experimental results show that a recurrent neural network with gated recurrent units outperforms the other models while remaining considerably efficient, and that using combined activities as single labels is more effective than representing them as separate labels.
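The difference between combined and separate labels can be sketched as two target encodings of the same annotated time steps. This is an illustrative assumption about what "combined activities as single labels" means in practice: the activity names and helper functions below are hypothetical, a two-resident home is assumed, and no model is trained here.

```python
def build_joint_label_map(label_pairs):
    """Map each distinct (resident_1, resident_2) activity pair to one class id."""
    return {pair: idx for idx, pair in enumerate(sorted(set(label_pairs)))}

def encode_joint(label_pairs, joint_map):
    """Combined encoding: one class id per time step for the whole household."""
    return [joint_map[pair] for pair in label_pairs]

def encode_separate(label_pairs):
    """Separate encoding: one label sequence per resident."""
    activities = sorted({activity for pair in label_pairs for activity in pair})
    index = {activity: i for i, activity in enumerate(activities)}
    return [[index[pair[r]] for pair in label_pairs] for r in range(2)]

# Hypothetical per-time-step annotations for two residents.
timesteps = [("cook", "sleep"), ("cook", "watch_tv"),
             ("eat", "watch_tv"), ("cook", "sleep")]

joint_map = build_joint_label_map(timesteps)
joint_targets = encode_joint(timesteps, joint_map)   # a single output head
separate_targets = encode_separate(timesteps)        # one output head per resident
print(joint_targets)
print(separate_targets)
```

With the combined encoding a classifier predicts one class per time step and can capture dependencies between residents directly, at the cost of a label space that grows with the number of observed activity combinations.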

Privacy-Preserving Visual Content Tagging using Graph Transformer Networks

Edlund, et al. 2020

Modular Graph Transformer Networks for Multi-Label Image Classification

Nguyễn. 2021. AAAI

With the recent advances in graph neural networks, there is a rising number of studies on graph-based multi-label classification with the consideration of object dependencies within visual data. Nevertheless, graph representations can become indistinguishable due to the complex nature of label relationships. We propose a multi-label image classification framework based on graph transformer networks to fully exploit inter-label interactions. The paper presents a modular learning scheme to enhance the classification performance by segregating the computational graph into multiple sub-graphs based on modularity. The proposed approach, named Modular Graph Transformer Networks (MGTN), is capable of employing multiple backbones for better information propagation over different sub-graphs guided by graph transformers and convolutions. We validate our framework on MS-COCO and Fashion550K datasets to demonstrate improvements for multi-label image classification. The source code is available at https://github.com/ReML-AI/MGTN.
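To make "segregating the graph based on modularity" concrete, the sketch below computes the standard Newman modularity of a partition of a small undirected label graph. It assumes the textbook definition; the abstract does not say which modularity variant or partitioning algorithm MGTN uses, and the label names and edges are invented for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    intra = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical label co-occurrence graph: two triangles joined by one edge.
edges = [("person", "bicycle"), ("bicycle", "helmet"), ("person", "helmet"),
         ("cup", "fork"), ("fork", "table"), ("cup", "table"),
         ("helmet", "cup")]

two_groups = {"person": 0, "bicycle": 0, "helmet": 0,
              "cup": 1, "fork": 1, "table": 1}
one_group = {label: 0 for label in two_groups}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
print(q_split, q_merged)
```

A partition with higher Q groups labels that co-occur densely, so each resulting sub-graph could in principle be handled by its own backbone.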

Personality-based Knowledge Extraction for Privacy-preserving Data Analysis

Jiang, Brändström, et al. 2017

Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics

Ait-Mlouk, Elmroth, et al. 2019

Given the increasing amount of heterogeneous data stored in relational databases, file systems, or cloud environments, such data needs to be easily accessible and semantically connected for further analytics. The potential of data federation is largely untapped; this paper presents an interactive data federation system (https://vimeo.com/319473546) that applies large-scale techniques, including heterogeneous data federation, natural language processing, association rules, and the semantic web, to perform data retrieval and analytics on social network data. The system first creates a Virtual Database (VDB) to virtually integrate data from multiple data sources. Next, an RDF generator is built to unify the data and, together with SPARQL queries, to support semantic search over text data processed by natural language processing (NLP). Association rule analysis is used to discover patterns and identify the most important co-occurrences of variables across multiple data sources. The system demonstrates how it facilitates interactive data analytics in different application scenarios (e.g., sentiment analysis, privacy-concern analysis, community detection).
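The association rule step can be illustrated with the two standard measures, support and confidence, computed over records merged from several sources. This is a minimal sketch under stated assumptions: the item names and records are invented, and the system's actual mining algorithm and thresholds are not described in the abstract.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support(A and C) / support(A)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_antecedent

# Hypothetical per-user records after integrating several data sources.
transactions = [
    {"negative_sentiment", "privacy_concern"},
    {"negative_sentiment", "privacy_concern", "high_activity"},
    {"negative_sentiment"},
    {"high_activity"},
]

rule_support = support(transactions, {"negative_sentiment", "privacy_concern"})
rule_confidence = confidence(transactions, {"negative_sentiment"}, {"privacy_concern"})
print(rule_support, rule_confidence)
```

A rule such as negative_sentiment -> privacy_concern would typically be reported only if both values exceed user-chosen minimum thresholds.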

VLSP 2021 - VieCap4H Challenge: Automatic Image Caption Generation for Healthcare Domain in Vietnamese

Dang, Nguyen, et al. 2022. JCSCE

This paper presents VieCap4H, a grand data challenge on automatic image caption generation for the healthcare domain in Vietnamese. VieCap4H was held as part of the eighth annual workshop on Vietnamese Language and Speech Processing (VLSP 2021). The task is framed as image captioning: given a static image, mostly depicting healthcare-related scenarios, participants are asked to design machine learning methods that generate natural language captions in Vietnamese describing the visual content of the image. We introduce VieCap4H, a novel human-annotated image captioning dataset in Vietnamese that contains over 10,000 image-caption pairs collected from real-world scenarios in the healthcare domain. All models proposed by the challenge participants are evaluated using BLEU scores against the ground truths. The challenge was run on the AIHUB.VN platform. In less than two months, the challenge attracted over 90 individual participants and recorded more than 900 valid submissions.
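For readers unfamiliar with BLEU, the sketch below computes an unsmoothed sentence-level score against a single reference: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The challenge's exact BLEU configuration (n-gram order, smoothing, tokenization, corpus-level aggregation) is not given in the abstract, so everything here is an assumption, and the example captions are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        precisions.append(clipped / total)
    # Brevity penalty discourages candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Invented example caption, whitespace-tokenized.
reference = "bác sĩ đang khám cho bệnh nhân".split()
exact = bleu(reference, reference)
truncated = bleu("bác sĩ đang khám".split(), reference)
unrelated = bleu("một con mèo nằm ngủ".split(), reference)
print(exact, truncated, unrelated)
```

The truncated caption matches every n-gram it contains, so its score is reduced only by the brevity penalty; a production evaluation would normally rely on an established BLEU implementation rather than a hand-written one.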

WINFRA: A Web-Based Platform for Semantic Data Retrieval and Data Analytics

Ait-Mlouk, Jiang. 2020. Mathematics

Given the huge amount of heterogeneous data stored in different locations, such data needs to be federated and semantically interconnected for further use. This paper introduces WINFRA, a comprehensive open-access platform for semantic web data and advanced analytics based on natural language processing (NLP) and data mining techniques (e.g., association rules, clustering, classification based on associations). The system is designed to facilitate federated data analysis, knowledge discovery, and information retrieval, and to support new techniques for handling semantic web and knowledge graph representations. The processing step virtually integrates data from multiple sources by creating virtual databases. The RDF Generator then produces RDF files for the different data sources and, together with SPARQL queries, supports semantic data search and knowledge graph representation. Furthermore, several application cases demonstrate how the platform facilitates advanced data analytics over semantic data and showcase our proposed approach to semantic association rules.
