2020 International Conference on Asian Language Processing (IALP)
DOI: 10.1109/ialp51396.2020.9310459
Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation

Cited by 7 publications (4 citation statements) | References 11 publications
“…al. [32], which utilises GPT-2 to normalise Indonesian text. Both of these studies showed that in low-resource settings, an SMT model still gives on-par performance, if not better than an NMT model, because of the insufficient amount of training data.…”
Section: Discussion and Future Work (mentioning)
Confidence: 99%
“…The MT approach to lexical normalisation works by translating text in an informal language into a formal language. This approach has been used to normalise text in various languages, such as English [31], Dutch [28], and Indonesian [32]. The MT model used in this research is SMT with a phrase translation unit, also known as phrase-based statistical MT (PBMT).…”
Section: A. Code-Mixed Normalisation (mentioning)
Confidence: 99%
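The excerpt above frames normalisation as phrase-level translation. As a minimal sketch of the phrase-table idea behind PBMT-style normalisation, the snippet below performs greedy longest-match lookup of informal Indonesian phrases; the phrase pairs and the `normalise` helper are invented for illustration and are not the cited paper's actual PBMT system, which learns its phrase table from parallel data.

```python
# Toy phrase-table normalisation: map informal Indonesian phrases to
# formal ones by greedy longest-match lookup. All entries are invented.

PHRASE_TABLE = {
    ("gak",): ("tidak",),
    ("gue",): ("saya",),
    ("makasih", "banget"): ("terima", "kasih", "banyak"),
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)

def normalise(sentence: str) -> str:
    tokens = sentence.split()
    out, i = [], 0
    while i < len(tokens):
        # Greedily try the longest informal phrase starting at position i.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[key])
                i += n
                break
        else:
            out.append(tokens[i])  # no phrase matched; copy the token
            i += 1
    return " ".join(out)

print(normalise("gue gak tahu makasih banget"))
# -> "saya tidak tahu terima kasih banyak"
```

A real PBMT system (e.g. Moses) would additionally score competing phrase segmentations with translation and language-model probabilities rather than matching greedily.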
“…There is a limited number of TST research works in Bahasa Indonesia. One explored formality style transfer using iterative forward translation [11]. Several approaches were implemented, including dictionary-based translation, phrase-based statistical machine translation (PBSMT), neural machine translation, and pretrained language modelling.…”
Section: Introduction (mentioning)
Confidence: 99%
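Since iterative forward-translation is the cited paper's central technique, a toy self-training loop may help make the idea concrete: train on the small parallel set, forward-translate unlabeled informal sentences to create synthetic (informal, formal) pairs, fold them back into the training data, and retrain. The word-substitution "model" and all sentences below are invented stand-ins for the paper's PBSMT/NMT components.

```python
# Toy iterative forward-translation (self-training) for informal -> formal
# style transfer. The base model is a word-substitution table learned from
# position-aligned parallel pairs; all data is invented for illustration.

def train(parallel):
    """Learn a word-level substitution table from (informal, formal)
    pairs whose token sequences align one-to-one."""
    table = {}
    for informal, formal in parallel:
        src, tgt = informal.split(), formal.split()
        if len(src) == len(tgt):
            for s, t in zip(src, tgt):
                if s != t:
                    table[s] = t
    return table

def translate(table, sentence):
    """Apply the substitution table token by token."""
    return " ".join(table.get(tok, tok) for tok in sentence.split())

# Small seed of labeled (informal, formal) pairs.
parallel = [
    ("gue mau makan", "saya mau makan"),
    ("dia gak datang", "dia tidak datang"),
]
# Unlabeled informal sentences: the cheap monolingual side.
unlabeled = ["gue gak tahu", "dia mau makan"]

for iteration in range(2):
    model = train(parallel)
    # Forward-translate unlabeled text into synthetic parallel pairs,
    # then fold them back in (duplicates are harmless in this toy).
    synthetic = [(s, translate(model, s)) for s in unlabeled]
    parallel.extend(synthetic)

print(translate(model, "gue gak datang"))  # -> "saya tidak datang"
```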
“…Even though it is by nature an encoder, BERT can be used as a decoder because it includes a sentence-prediction training concept that can generate text [7], [12]. GPT-2 is an autoregressive model for sentence construction that is commonly used as a decoder [13]. Meanwhile, an extractive approach can also be carried out using BERT, called BERT Extractive, but it is still weak at understanding context because the model is trained more on news data than on review data [14].…”
Section: Introduction (mentioning)
Confidence: 99%
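The excerpt above mentions GPT-2 as an autoregressive decoder for text generation. A minimal sketch using the Hugging Face transformers API follows; the English `gpt2` checkpoint and the prompt are placeholders (the cited works would use GPT-2 variants adapted to Indonesian, not this exact setup).

```python
# Minimal autoregressive generation with GPT-2 via Hugging Face transformers.
# The "gpt2" checkpoint and the prompt are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Informal: gue gak tahu. Formal:"
inputs = tokenizer(prompt, return_tensors="pt")

# Decode left to right, one token at a time, with nucleus sampling;
# pad_token_id is set explicitly because GPT-2 has no pad token.
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```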