2021
DOI: 10.48550/arxiv.2103.14453
Preprint
Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

Markus Bayer,
Marc-André Kaufhold,
Björn Buchhold
et al.

Abstract: In many cases of machine learning, research suggests that the development of training data may be more important than the choice and modelling of the classifiers themselves. Data augmentation methods have therefore been developed to improve classifiers with artificially created training data. In NLP, the challenge lies in establishing universal rules for text transformations that provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the perform…

Cited by 1 publication (1 citation statement)
References 25 publications (62 reference statements)
“…However, the shortage of maintenance text data may hinder the exploitation of this approach. Therefore, a NLP augmentation strategy could be helpful (Bayer, M., Kaufhold, M.-A., Buchhold, B., Keller, M., Dallmeyer, J., and Reuter, C., 2021), although the larger the data analyzed, the greater the chance that spurious correlations dominate the results and lead to erroneous conclusions (Dima, A., Lukens, S., Hodkiewicz, M., Sexton, T., and Brundage, M. P., 2021). Alternatively, fine-tuning a bigger pre-trained language model, which has become the de facto standard for doing transfer learning in NLP, could also be advantageous (Li, J., Tang, T., Zhao, W. X., and Wen, J.-R., 2021).…”
Section: Discussion
Confidence: 99%