TFIDF-FL: Localizing Faults Using Term Frequency-Inverse Document Frequency and Deep Learning

Zhang, Zhuo; Lei, Yan; Xu, Jianjun; Mao, Xiaoguang; Chang, Xiuli

doi:10.1587/transinf.2018edl8237

Cited by 5 publications

(3 citation statements)

References 14 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

Section: Resultsmentioning

confidence: 99%

“…The algorithm is easily affected by the skew of the data set, such as a large number of documents in a certain category, which leads to the underestimation of IDF. IDF improvement algorithms such as TFIDF-FL (Zhang et al, 2019) have been proposed, and some scholars have also suggested combining TF-IDF with Word2Vec to solve the shortcomings of TF-IDF (Naeem et al, 2022); in short, While the features of the SimHash algorithm are as mentioned above, its text similarity calculation is suitable for low-precision and high-speed scenarios. This calculation has lower requirements for speed but higher requirements for accuracy, which proves that SimHash is unsuitable for studying long texts or for high-precision similarity calculations.…”

Section: Analysis Of Calculationmentioning

confidence: 99%

See 1 more Smart Citation

A semantic similarity analysis of multiple English translations of The Analects: Based on a natural language processing algorithm

Yang

Gui-jun²

2022

Front. Psychol.

View full text Add to dashboard Cite

Working from the readers’ perspective, this study first investigates the online acceptance of the complete English translations of The Analects by investigating the number of online comments, downloads, academic citations, and other factors, and it ranks the different English versions according to how well they are received. The complete English translations of The Analects by D. C. Lau, James Legge, and 15 other translators are found to be well received by readers on mainstream online platforms. Then, based on five natural language processing (NLP) algorithms (TF-IDF, Word2Vec, GloVe, BERT, and SimHash), the 15 well-received English versions of The Analects are taken as samples to calculate semantic similarity. By comparing the semantic differences among the texts, this study analyzes the factors that affect the diversification of translated texts. (1) The influence of Chinese annotation on the translation semantics is great, even the greatest among many influential factors; and (2) different translators’ identities, the translation era, the translation purpose, and the translation background do not significantly affect the semantic influence of the translation. On the one hand, the readers can understand the differences between the different translations and choose an appropriate translation for their reading and learning more effectively. On the other hand, using the algorithms of NLP, we focus on the semantic similarity of different English translations of The Analects and analyze them to show the semantic differences quantitatively, which makes the comparison more intuitive and efficiently. Such a quantitative presentation of the results draws scholars’ attention to the differences in the translations.

show abstract

Section: Resultsmentioning

confidence: 99%

Section: Analysis Of Calculationmentioning

confidence: 99%

A semantic similarity analysis of multiple English translations of The Analects: Based on a natural language processing algorithm

Yang

Gui-jun²

2022

Front. Psychol.

View full text Add to dashboard Cite

show abstract

“…TF-IDF computes the weight or value of each word (token) in a corpus document. This method is frequently utilized in information retrieval and text creation to evaluate the relationship of each word in a relationship document [21]. This normalization process determines the weight of terms that appear frequently in a document.…”

Section: Feature Extractionmentioning

confidence: 99%

Topic Detection on Twitter using GloVe with Convolutional Neural Network and Gated Recurrent Unit

Ikfini M,

Setiawan

2023

bits

View full text Add to dashboard Cite

Twitter is a social media platform that allows users to share thoughts or information with others for all to see. However, twitters often use abbreviations, slang, and incorrect grammar because tweets are limited to 280 characters. Topic detection often has problems with low accuracy, one method that can be used to overcome this problem is feature expansion. Feature expansion on Twitter is a semantic addition to the process of expanding the original text syllables to make it look like a large Document. That way, feature expansion is used to reduce word mismatches. This study uses the expansion of the GloVe feature with the Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU) classification methods. The results show that the topic detection system with the GloVe feature extension and CNN-GRU hybrid classification has an accuracy of 94.41%

show abstract

A personalised operation and maintenance approach for complex products based on equipment portrait of product-service system

Ren¹,

Shi²,

Liu³

et al. 2023

Robotics and Computer-Integrated Manufacturing

View full text Add to dashboard Cite

TFIDF-FL: Localizing Faults Using Term Frequency-Inverse Document Frequency and Deep Learning

Cited by 5 publications

References 14 publications

A semantic similarity analysis of multiple English translations of The Analects: Based on a natural language processing algorithm

A semantic similarity analysis of multiple English translations of The Analects: Based on a natural language processing algorithm

Topic Detection on Twitter using GloVe with Convolutional Neural Network and Gated Recurrent Unit

A personalised operation and maintenance approach for complex products based on equipment portrait of product-service system

Contact Info

Product

Resources

About