Ngoc-Dung T. Tieu scite author profile

et al. 2017

Automatically generated text is used in various applications (e.g., text summarization and machine translation), and such machine-generated text benefits everyday life. However, machine-generated text can sometimes convey confusing information because of errors or inappropriate wording introduced by language processing, which could be a critical issue in presidential elections or product advertisements. Previous methods for detecting machine-generated text typically estimate text fluency, but this may not remain useful in the near future because recently proposed neural-network-based natural language generation produces wording close to that of human-crafted text. We hypothesize, however, that human writing habits are still more consistent. For instance, Zipf's law states that, in human-written text, the most frequent word occurs approximately twice as often as the second most frequent word, nearly three times as often as the third most frequent word, and so forth. We found that this does not hold for machine-generated text. We therefore propose a method for identifying machine-generated text based on such statistics. First, word frequency distributions are compared with the Zipfian distribution to extract frequency features. Second, complex phrase features are extracted to show that human-generated text contains more sophisticated phrases than machine-generated text. Finally, the higher consistency of human-generated text is quantified at the sentence level using phrasal verbs and at the paragraph level using coreference resolution relationships, which are integrated into consistency features. The combination of the frequency, complex phrase, and consistency features is evaluated on one hundred original English books and one hundred books translated from Finnish. The results show that our method achieves better performance (accuracy = 98.0% and equal error rate = 2.9%) than a state-of-the-art method using parsing tree feature extraction. An advantage of this method is that it can be applied efficiently to large collections of text such as books. Evaluations in two other languages, French and Dutch, showed similar results, demonstrating that the proposed method works consistently across languages.
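As an illustration of the frequency idea described in the abstract above, the following is a minimal sketch of one way to measure how far a text's word frequencies deviate from an ideal Zipfian curve. It is not the authors' feature extraction: the function name `zipf_deviation`, the `top_k` cutoff, the tokenization, and the squared log-difference measure are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def zipf_deviation(text: str, top_k: int = 50) -> float:
    """Mean squared log-difference between observed rank frequencies
    and an ideal Zipfian curve f(r) = f(1) / r.

    Hypothetical helper for illustration only; the paper's actual
    frequency features are not specified in the abstract.
    """
    # Crude tokenization (assumption): lowercase alphabetic words.
    words = re.findall(r"[a-z']+", text.lower())
    # Frequencies of the top_k most frequent words, in rank order.
    counts = sorted(Counter(words).values(), reverse=True)[:top_k]
    if not counts:
        return 0.0
    top = counts[0]
    # Under Zipf's law the word at rank r occurs about top / r times.
    errors = [
        (math.log(count) - math.log(top / rank)) ** 2
        for rank, count in enumerate(counts, start=1)
    ]
    return sum(errors) / len(errors)
```

A value near zero means the rank-frequency curve is close to Zipfian; larger values indicate a larger departure. Whether such a score separates human-written from machine-generated text would have to be checked on real data.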

An approach for gait anonymization using deep learning

Nguyen-Son

et al. 2017

The human gait has become another biometric trait used in security systems because it is unique to each person and can be recognized at a distance. However, a bad actor could use a gait recognition system to identify a person on the basis of his or her gait. We have developed a gait anonymization method that prevents unauthorized gait recognition. It modifies the gait so that the person cannot be identified while maintaining the naturalness of the gait. The modification is done by adding another gait, called the "noise gait". A convolutional neural network makes this modification by taking two gaits as input, the original gait and the noise gait, and outputting an anonymized gait. The proposed method was evaluated using the success rate and the mean opinion score (MOS). The success rate is the rate of failed gait recognition, and the MOS is a measure of the naturalness of the anonymized gait. In our experiments, the success rate reached up to 98.86%, and the highest naturalness score was 3.73 on the MOS scale. These findings should open new research directions regarding privacy protection related to gait recognition.
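The two evaluation measures defined in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the identity labels, and the rating values are hypothetical, and the recognizer that produces the predictions is outside the scope of the sketch.

```python
from typing import Hashable, Sequence


def success_rate(true_ids: Sequence[Hashable],
                 predicted_ids: Sequence[Hashable]) -> float:
    """Fraction of anonymized gaits for which recognition failed,
    i.e. the predicted identity differs from the true identity."""
    if not true_ids or len(true_ids) != len(predicted_ids):
        raise ValueError("inputs must be non-empty and of equal length")
    failures = sum(t != p for t, p in zip(true_ids, predicted_ids))
    return failures / len(true_ids)


def mean_opinion_score(ratings: Sequence[float]) -> float:
    """Arithmetic mean of the naturalness ratings given by human raters."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)
```

For example, if a recognizer still identifies one of four anonymized gaits correctly, `success_rate` returns 0.75; ratings of 4, 3, 5, and 4 give a MOS of 4.0.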

Identifying Computer-Translated Paragraphs using Coherence Features

Nguyen-Son, Tieu, Nguyen

et al. 2018

Preprint

We have developed a method for extracting coherence features from a paragraph by matching similar words in its sentences. We conducted an experiment with a parallel German corpus containing 2000 human-created and 2000 machine-translated paragraphs. The results showed that our method achieved the best performance (accuracy = 72.3%, equal error rate = 29.8%) compared with previous methods for various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another resource-rich language, and on a low-resource language (Japanese) attained similar performance. These results demonstrate the efficiency of the coherence features at distinguishing computer-translated from human-created paragraphs across diverse languages.
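To make the idea of matching words across the sentences of a paragraph concrete, here is a minimal sketch of a coherence score. It is not the authors' method: it substitutes exact word matching (Jaccard overlap) for the similar-word matching mentioned in the abstract, and the function name, sentence splitting, and tokenization are assumptions made for this example.

```python
import re
from itertools import combinations


def sentence_overlap_coherence(paragraph: str) -> float:
    """Average Jaccard overlap of word sets over all sentence pairs.

    Hypothetical stand-in: exact word matches replace the
    similar-word matching described in the abstract.
    """
    # Naive sentence splitting on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    word_sets = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    # Score every pair of sentences that has at least one word.
    scores = [
        len(a & b) / len(a | b)
        for a, b in combinations(word_sets, 2)
        if a | b
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

For the paragraph "The cat sat. The cat ran." the two sentences share two of four distinct words, so the score is 0.5; a paragraph whose sentences share no words scores 0.0. A classifier would take such scores as input features rather than use them directly as a decision.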

Transformation on Computer-Generated Facial Image to Avoid Detection by Spoofing Detector

Nguyen-Son

et al. 2018

Making computer-generated (CG) images more difficult to detect is an interesting problem in computer graphics and security. While most approaches focus on the image rendering phase, this paper presents a method based on increasing the naturalness of CG facial images from the perspective of spoofing detectors. The proposed method is implemented using a convolutional neural network (CNN) comprising two autoencoders and a transformer and is trained using a black-box discriminator without gradient information. Over 50% of the transformed CG images were not detected by three state-of-the-art spoofing detectors. This capability raises an alarm regarding the reliability of facial authentication systems, which are becoming widely used in daily life.

An RGB Gait Anonymization Model for Low-Quality Silhouettes

Fang

et al. 2019