Much existing research on blogs focused on posts only, ignoring their comments. Our user study conducted on summarizing blog posts, however, showed that reading comments does change one's understanding about blog posts. In this research, we aim to extract representative sentences from a blog post that best represent the topics discussed among its comments. The proposed solution first derives representative words from comments and then selects sentences containing representative words. The representativeness of words is measured using ReQuT (i.e., Reader, Quotation, and Topic). Evaluated on human labeled sentences, ReQuT together with summation-based sentence selection showed promising results.
In this paper, we aim at detecting events of common user interests from huge volume of user-generated content. The degree of interest from common users in an event is evidenced by a significant surge of event-related queries issued to search for documents (e.g., news articles, blog posts) relevant to the event. Taking the stream of queries from users and the stream of documents as input, our proposed framework seamlessly integrates the two streams into a single stream of query profiles. A query profile is a set of documents matching a query at a given time. With the single stream of query profiles, the well-studied techniques in event detection (e.g., incremental clustering) could be easily applied. In our experiments using real data collected from Blog and News search engines respectively, the proposed technique achieved very high event detection accuracy.
Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks including document search, visualization, and summarization. In this paper, we study the problem of comments-oriented document summarization and aim to summarize a Web document (e.g., a blog post) by considering not only its content, but also the comments left by its readers. We identify three relations (namely, topic, quotation, and mention) by which comments can be linked to one another, and model the relations in three graphs. The importance of each comment is then scored by: (i) graph-based method, where the three graphs are merged into a multirelation graph; (ii) tensor-based method, where the three graphs are used to construct a 3rd-order tensor. To generate a comments-oriented summary, we extract sentences from the given Web document using either feature-biased approach or uniform-document approach. The former scores sentences to bias keywords derived from comments; while the latter scores sentences uniformly with comments. In our experiments using a set of blog posts with manually labeled sentences, our proposed summarization methods utilizing comments showed significant improvement over those not using comments. The methods using feature-biased sentence extraction approach were observed to outperform that using uniform-document approach.
Blog/news search engines are very important channels to reach information about the real-time happenings. In this paper, we study the popular queries collected over one year period and compare their search results returned by a blog search engine (i.e., Technorati) and a news search engine (i.e., Google News). We observed that the numbers of hits returned by the two search engines for the same set of queries were highly correlated, suggesting that blogs often provide commentary to current events reported in news. As many popular queries are related to some events, we further observed a high cohesiveness among the returned search results for these queries.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.