“…The main idea behind TF-IDF [36] is to find words with unique traits, and it can be used to make microtext lines easier to read. MMR considers the similarity between the extracted text and the entire document and between the extracted sentences and the summaries [37,38]. After calculating the similarity of each sentence to the entire text and between two sentences, the algorithm formula is iterated to rank the sentence scores of the microblog texts.…”
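The two ideas in the excerpt can be illustrated with a minimal Python sketch. TF-IDF weights each word by how characteristic it is of one text. MMR then picks sentences greedily: it rewards similarity to the whole text and penalizes similarity to sentences already chosen, re-scoring the candidates after every pick. The sketch rests on several assumptions. It uses whitespace tokenization on English placeholder posts, whereas real Sina microblog text would first need Chinese word segmentation. It uses a smoothed IDF variant. It forms the whole-text vector by summing the sentence vectors. It sets the trade-off weight `lam = 0.7`. The function names (`tfidf_vectors`, `top_terms`, `mmr_select`) and the example posts are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; Chinese microblog text would
    # first need a word segmenter, which this sketch does not include.
    return text.lower().split()


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count a term once per text
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumed variant): rarer terms receive larger weights.
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def top_terms(vector, n):
    """Highest-weighted terms, i.e. the words most characteristic of one text."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Greedy MMR: balance relevance to the whole text against redundancy."""
    vectors = tfidf_vectors(sentences)
    # Whole-text vector: sum of the sentence vectors (an assumed choice).
    doc = {}
    for vec in vectors:
        for term, w in vec.items():
            doc[term] = doc.get(term, 0.0) + w
    relevance = [cosine(vec, doc) for vec in vectors]

    selected = []

    def mmr_score(i):
        # Redundancy is the highest similarity to any already-selected sentence.
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                         default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)  # re-scored after every pick
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical example posts; post 1 duplicates post 0.
posts = [
    "heavy rain floods zhengzhou metro line",
    "heavy rain floods zhengzhou metro line",
    "rescue teams reach xinxiang",
    "tips for staying safe in heavy rain",
]
print(mmr_select(posts, k=2))  # → [0, 2]
```

Raising `lam` favors relevance to the whole text. Lowering it penalizes sentences that resemble ones already chosen, which is why the duplicated post is skipped in favor of the rescue post.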
Volunteered geographic information (VGI) plays an increasingly important role in the study of flash floods. However, the varied expressions and lengths of social media text complicate topic classification and spatiotemporal analysis. This paper analyzed the applicability of bidirectional encoder representations from transformers (BERT) and four traditional methods: TextRank, term frequency–inverse document frequency (TF-IDF), maximal marginal relevance (MMR), and linear discriminant analysis (LDA). The results show that, by user type, BERT performs best on Government Affairs Microblogs, whereas LDA-BERT performs best on We Media Microblogs. By text length, TF-IDF-BERT works better for texts shorter than 70 words or longer than 140 words, and LDA-BERT performs best for texts of 70–140 words. As for the spatiotemporal evolution pattern, the study suggests that the textual topics of the Henan rainstorm follow the general pattern of “situation-tips-rescue”. Moreover, this paper detected a hotspot related to “Metro Line 5” during the Henan rainstorm. It also found that the topical focus of the rainstorm shifted spatially from Zhengzhou, first to Xinxiang and then to Hebi. This marked south-to-north trend is consistent with the reports issued by the authorities. By integrating multiple methods, we improved the overall topic classification accuracy of Sina microblogs, facilitating the spatiotemporal analysis of flooding.