Create Special Domain News Collections through Summarization and Classification

Teng, Zhi; Liu, Ye; Ren, Fuji

doi:10.1002/tee.20493

Cited by 5 publications

(3 citation statements)

References 14 publications

“…After using a purist approach to track stories in RSS feeds focusing on public fears about science, they concluded that, despite useful information in RSS, extensive and repetitive content requires data cleansing. This pragmatic approach has been more widely adopted in recent work clustering and classifying text from RSS feeds, of which [7], [8], [9] and [10] are examples. Roesler [11] has also identified caveats here concerning the number of documents or RSS feeds/items to be clustered, semantic and linguistic issues, and the time taken to cluster content especially in a real-time application.…”

Section: Related Work (mentioning)

confidence: 99%

visualRSS: A Platform to Mine and Visualise Social Data from RSS Feeds

O'Shea

Levene

2012

Current Trends in Web Engineering

Abstract. RSS, a popular method of syndicating frequently updated on-line content, allows data to be stored in a semi-structured, XML-based format. Much work has been carried out applying data mining techniques to RSS, but in this paper we propose the visualRSS (vRSS) application as a platform to mine and visualise data trends in RSS feeds, by tracking changes in keyword frequencies as a source of social data. Core components of vRSS's architecture to manipulate RSS feeds are described. We also present the results of vRSS's initial experimental usage involving 36 students in late 2011, concerning our research into preferences of mining types and visualisations.

“…On the other hand, the pragmatic approach has been more widely adopted especially in classifying and clustering text from feed contents. Teng et al (2010) used automated techniques to summarise and classify RSS feeds, and applied these to items concerning disasters. Getahun et al (2009) compared the relatedness of stories to merge news items, whilst Liu et al (2009) similarly retrieved news stories from RSS feeds and classified them; therefore news items could be reorganised to allow customisable feeds by end-users.…”

Section: Related Work (mentioning)

confidence: 99%

Mining and visualising information from RSS feeds: a case study

O'Shea

Levene

2011

International Journal of Web Information Systems

Purpose - Recent years have seen "really simple syndication" or "rich site summary" (RSS) syndication of frequently updated content become ubiquitous across the internet. RSS's XML-based format allows these data to be stored in a semi-structured format but, despite the presence of online aggregators and readers, and the related work in clustering feeds and mining subjects by keywords, much potentially useful information present in RSS may remain undiscovered. This paper aims to address this issue in an experimental setting.

Design/methodology/approach - This paper presents two distinct technologies which employ the semi-structured nature of RSS content to allow users to mine information directly from raw RSS feeds: occurrence mining counts occurrences of text strings in feeds, whilst value mining mines structured ticker tape numeric data. It describes both technologies and their implementation in an experiment, where 35 students mined small numbers of RSS feeds and visualised the data mined.

Findings - This paper analyses the results of the experiment and cites examples of data mined and visualisations produced. The subject matter of data mined is also explored and potential applications of the technologies are considered.

Research limitations/implications - The mining technologies proposed in this paper have been developed to mine textual and numeric data directly from feeds, but can be extended to mine other data types present in RSS and to include other variants like Atom.

Originality/value - These technologies are seen to be applicable to data mining, the role of data and visualisations in social data analysis, issue tracking in news mining and time series analysis.

“…In their seminal article, "Improving retrieval performance by relevance feedback", Salton and Buckley (1990) (Kaptein & Kamps, 2011; Xu, Luo, Yu, & Xu, 2011; Hamdi, 2011; Li, Otsuka, & Kitsuregawa, 2010; Fu, 2010; Gabrilovich et al, 2009; Nauer & Toussaint, 2009; Yumoto, Mori, & Sumiya, 2009; Kuppusamy & Aghila, 2009), Web commerce (Verma, Tiwari, & Mishra, 2011), Web 2.0 RSS feed content (Teng, Liu, & Ren, 2010), and multilingual IR (He & Wu, 2011; He, Tu, Luo, & Li, 2009). …”

Section: Commentary (mentioning)

confidence: 99%