Term frequency – function of document frequency: a new term weighting scheme for enterprise information retrieval

Zhang, Hui; Wang, Deqing; Wu, Wenjun; Hu, Hongping

doi:10.1080/17517575.2012.665945

Cited by 13 publications (5 citation statements)

References 33 publications

“…For calculating word frequency features, researchers mainly use the classical TF-IDF (term frequency–inverse document frequency) algorithm [19,24,27]. Because TF-IDF gives higher weights to some common words, which lowers the weights of some burst words, researchers have improved the TF-IDF algorithm [36,37], and others have proposed the DF-IDF (document frequency–inverse document frequency) algorithm [28] to make up for the shortcomings of TF-IDF. These word frequency methods apply only to a single data source; for multiple data sources, Bun et al. proposed the TF*PDF (Term Frequency * Proportional Document Frequency) algorithm [39,40]. The algorithm rests on the observation that whenever a popular topic is being discussed, it is discussed frequently in numerous news documents from most news sources.…”
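The excerpt contrasts single-source TF-IDF with the multi-source TF*PDF weighting. The Python sketch below computes both on tokenised documents, under stated assumptions: TF-IDF uses raw counts with idf = log(N/df), which is only one of several common variants; TF*PDF follows the form usually attributed to Bun and Ishizuka, W_j = Σ_c |F_jc| exp(n_jc/N_c) with F_jc normalised by the channel's Euclidean norm, and should be checked against the original papers before use. Neither DF-IDF nor the TF-FDF scheme of the paper on this page is reproduced here, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each tokenised document with raw-count TF times idf = log(N / df).

    This is one common TF-IDF variant; others (log TF, smoothed idf) exist.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def tf_pdf(channels):
    """TF*PDF weights over several sources ("channels") of tokenised documents.

    Assumed form: W_j = sum over channels c of |F_jc| * exp(n_jc / N_c), where
    F_jc is the frequency of term j in channel c, |F_jc| is F_jc divided by the
    Euclidean norm of all term frequencies in c, n_jc is the number of documents
    in c containing j, and N_c is the number of documents in c.
    """
    weights = Counter()
    for docs in channels:
        n_c = len(docs)
        freq = Counter()      # F_jc: total occurrences of each term in channel c
        doc_freq = Counter()  # n_jc: documents in channel c containing each term
        for doc in docs:
            freq.update(doc)
            doc_freq.update(set(doc))
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_c)
    return dict(weights)


if __name__ == "__main__":
    # Two hypothetical news sources, each a list of tokenised documents.
    source_a = [["quake", "city", "rescue"], ["quake", "aid"]]
    source_b = [["quake", "market"], ["election", "market"]]
    print(tf_idf(source_a + source_b)[0])
    ranked = sorted(tf_pdf([source_a, source_b]).items(), key=lambda kv: -kv[1])
    print(ranked[:3])
```

In TF*PDF the exponential factor grows with the share of a source's documents that mention a term, and the sum runs over sources, so a term reported widely across many sources outranks one that is frequent in a single source only. In TF-IDF, by contrast, a term present in every document gets idf = log(1) = 0.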

Section: Word Frequency Features (mentioning; confidence 99%)

A Novel Burst Event Detection Model Based on Cross Social Media Influence

Ban, Zhang, Dong-jun, et al. (2022). Preprint.


With public emergencies occurring frequently around the world, how to use big data and artificial intelligence technologies to detect and identify burst events on the Internet accurately and efficiently has become a hot research topic. Existing burst event detection methods do not comprehensively consider multiple social media data sources and their influence, which leads to lower accuracy. This paper proposes a novel burst event detection model based on cross social media influence and unsupervised clustering. In this article, we explain the basic framework of burst event detection, along with the characteristics of social media influence, word frequency features, and growth rate features. In the proposed approach, social media network data are sliced according to the time information in the data stream, and the burst word features in each time window are calculated. The three burst features are then fused to compute the burst degree of each word, and words whose burst degree exceeds a threshold are selected to form the burst word set. Finally, agglomerative hierarchical clustering is applied to the burst word set to extract burst events. Experiments on a real-world social media dataset show that the method significantly improves Precision and F1-score compared with four recent burst event detection methods, demonstrating its effectiveness.
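The abstract outlines a pipeline: score the words of each time window with three burst features (frequency, growth rate, and cross-source influence), fuse them into a burst degree, keep words above a threshold, and cluster the survivors with agglomerative hierarchical clustering. The abstract does not give the formulas, so the Python sketch below is a hypothetical illustration only. The weighted-sum fusion, the `WEIGHTS` values, the per-source `source_weight` scores, the add-one smoothed growth rate, and the Jaccard co-occurrence similarity with average linkage are all assumptions, not the authors' definitions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical fusion weights; the paper's actual fusion rule is not stated here.
WEIGHTS = {"freq": 0.4, "growth": 0.3, "influence": 0.3}


def burst_words(prev_posts, posts, source_weight, threshold):
    """Score words in the current window and keep those whose burst degree exceeds threshold.

    prev_posts / posts: lists of (source, set_of_words) for the previous and current window.
    source_weight: hypothetical influence score per source.
    """
    prev_counts = Counter(w for _, words in prev_posts for w in words)
    counts = Counter(w for _, words in posts for w in words)
    sources = {}
    for src, words in posts:
        for w in words:
            sources.setdefault(w, set()).add(src)
    total_influence = sum(source_weight.values()) or 1.0
    scores = {}
    for w, c in counts.items():
        freq = c / len(posts)                                  # share of posts mentioning w
        growth = (c - prev_counts[w]) / (prev_counts[w] + 1)   # add-one smoothed growth rate
        influence = sum(source_weight.get(s, 0.0) for s in sources[w]) / total_influence
        scores[w] = (WEIGHTS["freq"] * freq + WEIGHTS["growth"] * growth
                     + WEIGHTS["influence"] * influence)
    return {w: s for w, s in scores.items() if s > threshold}


def cluster_burst_words(words, posts, min_sim):
    """Average-linkage agglomerative clustering of burst words by co-occurrence.

    Similarity of two words is the Jaccard overlap of the posts containing them.
    Merging stops when no pair of clusters has average similarity >= min_sim.
    """
    occ = {w: {i for i, (_, ws) in enumerate(posts) if w in ws} for w in words}

    def sim(a, b):
        union = occ[a] | occ[b]
        return len(occ[a] & occ[b]) / len(union) if union else 0.0

    clusters = [[w] for w in sorted(words)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            avg = sum(sim(a, b) for a in clusters[i] for b in clusters[j])
            avg /= len(clusters[i]) * len(clusters[j])
            if avg > best:
                best, pair = avg, (i, j)
        if best < min_sim:
            break
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical sources and posts for two consecutive time windows.
    prev = [("weibo", {"weather", "sunny"}), ("twitter", {"football"})]
    cur = [("weibo", {"quake", "rescue"}), ("twitter", {"quake", "rescue", "city"}),
           ("news", {"quake", "city"}), ("weibo", {"football"})]
    bursts = burst_words(prev, cur, {"weibo": 0.3, "twitter": 0.3, "news": 0.4}, threshold=0.5)
    print(sorted(bursts))
    print(cluster_burst_words(bursts, cur, min_sim=0.6))
```

Average linkage is used here because it is less prone than single linkage to chaining loosely related words into one event; any linkage criterion fits the agglomerative scheme.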

“…A framework for data mining of key safety factors from safety-related cases of consumer products is proposed and depicted in Figure 1. Data mining theory, methods and algorithms have been discussed by many authors in the literature (Wilamowski et al., 1999; Wilamowski and Kaynak, 2000; Li and Xu, 2001; Li et al., 2003, 2009, 2013a, 2013b; Duan et al., 2007, 2009; Shi et al., 2007; Hewlett et al., 2008; Xu et al., 2008; Wilamowski, 2010; Duan and Xu, 2012; Fritzsche et al., 2012; Hunter et al., 2012; Ingvaldsen and Gulla, 2012; Xu et al., 2012; Yang et al., 2012; Yu et al., 2012; Zhang et al., 2012; Bulysheva and Bulyshev, 2013; Katayev et al., 2013; Wang et al., 2013b; Xia et al., 2013; Xing et al., 2013; Zeng et al., 2013a, 2013b). The functional modules in the framework are classified into three phases, namely case collection, extraction of impact factors, and retrieval of key factors by knowledge reasoning based on the Bayesian network.…”

Section: A Data Mining Framework (mentioning; confidence 99%)

A Knowledge Engineering Framework for Identifying Key Impact Factors from Safety‐Related Accident Cases

Pan, Wang, et al. (2014). Syst. Res.


Consumer product safety is closely related to consumer health. In this paper, a knowledge engineering framework is proposed for data mining to identify key safety factors from a large number of consumer product safety cases. Data mining in the framework is performed in three steps. The first step is to collect consumer product safety cases; a case can be semi-structured or unstructured, and cases can be collected either manually or automatically by a web spider crawling certain websites. The second step is to extract all safety factors from a number of consumer product safety cases; a new method based on a linear-chain conditional random field is developed for this purpose, and its effectiveness has been validated on product cases. The third step is to identify a set of key factors from all safety factors by knowledge reasoning. To illustrate the knowledge reasoning process, a set of 3192 safety cases of electric products involving electric shock accidents is chosen as a case study, and a Bayesian network-based model is developed to retrieve key safety factors related to electric shock accidents. The performance of the reasoning model has been verified through a combination of expert evaluation and experiments, which show that the proposed model can successfully help identify key safety factors of electric shock accidents. Overall, the proposed framework is capable of identifying key safety factors from a large number of consumer product safety cases. Copyright © 2014 John Wiley & Sons, Ltd.


“…In an effort to solve this issue, we have used the latest eye‐tracking technology in our study to research how different forms of ads affect attention to streaming media advertisements. Eye‐tracking technology has been frequently used in information studies (Burns and Lutz, ; Moore et al., ; Beynon‐Davies, ; Yang et al., ; Zhang et al., ).…”

Section: Introduction (mentioning; confidence 97%)

Streaming Media Advertising: An Empirical Study

et al. (2013)


This study used eye-tracking technology to investigate consumers' behavioral responses to three different forms of streaming media advertising. Thirty-two undergraduate and postgraduate students participated, and their eye-movement data were collected as they viewed four different types of streaming media advertisements on web pages coded in Chinese. To reflect audiences' online status, both a browsing scenario and an information search scenario were designed. A two-way analysis of variance of the impact of advertisement forms on the audience shows that (i) audiences are more sensitive to streaming media advertising in the information search scenario; (ii) ordinary floating layer advertising and Tear Page Advertising capture more attention than iTouch and hurdles advertising; and (iii) play time does affect audiences' responses to streaming media advertising. Detailed discussions of the results and suggestions for future studies are provided in this paper.
