The role of statistical and semantic features in single-document extractive summarization

Vodolazova, Tatiana; Lloret, Elena; Muñoz, Rafael; Palomar, Manuel

doi:10.5430/air.v2n3p35

Cited by 9 publications

(4 citation statements)

References 14 publications


“…The last two approaches, [37], used anaphora resolution and a Word Sense Disambiguation (WSD) method for enriching the document with semantic knowledge, and extracting the most important sentences based on the number of concepts they contained, instead of terms. The difference between them was the WSD method employed: MFS for the most frequent sense, and UKB for a PageRank-based WSD method [38].…”

Section: Quantitative Results (mentioning)

confidence: 99%

“…This is the case of COMPENDIUM summarizer [36], which employs textual entailment together with statistical and linguistic-based features for scoring sentences, and determines which ones are more relevant to take part in the summary. Also, the approach proposed in [37] analyzed different combinations of statistical and linguistic settings, such as anaphora resolution together with Word Sense Disambiguation (WSD) methods for extracting the most important sentences based on the number of concepts they contained, instead of terms.…”

Section: Related Work (mentioning)

confidence: 99%

“…
SemPCA-Summarizer (H1)        0.46688
Best DUC 2002 approach        0.42776
Lead baseline DUC 2002        0.41132
wMVC summarizer [34]          0.38800
MUSE [35]                     0.45490
COMPENDIUM [36]               0.46008
TS + MFS approach [37]        0.42339
TS + UKB approach [37]        0.42556
Table 3. Comparison with other approaches that used the DUC 2002 corpus…”

Section: ROUGE-1 (Recall Value) (mentioning)

confidence: 99%
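
The scores quoted above are ROUGE-1 recall values. As a rough illustration of what that metric measures, the following is a minimal sketch, assuming plain whitespace tokenization and lowercasing; the function name `rouge1_recall` is hypothetical, and the official ROUGE toolkit applies its own tokenization and optional stemming, so this sketch will not reproduce the published figures.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: clipped overlapping unigrams / unigrams in the reference.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Each reference unigram is matched at most as often as it occurs
    # in the candidate summary (clipped counts).
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())
```

A candidate that contains every reference word scores 1.0 regardless of its length, which is why recall is usually reported for summaries of a fixed size.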


SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation

Alcón

Lloret

2018

cai


Text summarization is the task of condensing a document while keeping its relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, improving efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using Principal Component Analysis (PCA) enriched with semantic information. A concept-sentence matrix is built from the input document, and PCA is then used to identify and rank the relevant concepts, which are used to select the most important sentences through different heuristics, leading to various types of summaries. The results show that the generated summaries are very competitive, both quantitatively and qualitatively, indicating that the proposed approach is appropriate for briefly providing key information and thus helps users cope more quickly and efficiently with the huge amount of information available.
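
The pipeline described in this abstract (concept-sentence matrix, PCA to rank concepts, a heuristic to pick sentences) can be sketched roughly as follows. This is not the authors' implementation: lowercase word tokens stand in for semantic concepts, only the first principal component is used, and the selection heuristic (count how many top-ranked concepts a sentence contains) is a hypothetical choice. The function name `summarize` and its parameters are likewise assumptions.

```python
import numpy as np


def summarize(sentences, n_sentences=2, n_concepts=5):
    # Assumption: lowercase word tokens stand in for semantic concepts.
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}

    # Concept-sentence matrix: rows are concepts, columns are sentences.
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok], j] += 1

    # PCA via SVD: sentences are observations, concepts are variables.
    data = matrix.T
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = np.abs(vt[0])  # weight of each concept on the first component

    # Rank concepts by loading and keep the strongest ones.
    top = np.argsort(-loadings)[:n_concepts]

    # Hypothetical heuristic: score a sentence by how many top concepts it has.
    scores = (matrix[top, :] > 0).sum(axis=0)
    chosen = sorted(np.argsort(-scores, kind="stable")[:n_sentences])

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in chosen]
```

Calling `summarize(list_of_sentences, n_sentences=3)` returns three input sentences in document order. Swapping the heuristic, or using more than one component, would yield the "various types of summaries" the abstract mentions.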


“…The current difficulty associated to building natural language generation systems [15] and our considerable experience in text summarisation for extracting key ideas [16], [17], [18] has led us to address this study from a summarisation perspective rather than from natural language generation, even though generating natural language and applying it to Social Media (e.g., Twitter) would be our ultimate long-term goal.…”

Section: Related Work (mentioning)

confidence: 99%

Analysing and evaluating the task of automatic tweet generation: Knowledge to business

Lloret

Palomar

2016

Computers in Industry


In this paper, a study concerning the evaluation and analysis of natural language tweets is presented. Based on our experience in text summarisation, we carry out a deep analysis of users' perception through the evaluation of tweets manually and automatically generated from news. Specifically, we consider two key issues of a tweet: its informativeness and its interestingness. Therefore, we analyse: 1) do users perceive manual and automatic tweets equally?; 2) what linguistic features should a good tweet have to be interesting as well as informative? The main challenge of this proposal is the analysis of tweets to help companies in their positioning and reputation on the Web. Our results show that: 1) informative and interesting natural language tweets can be generated automatically by summarisation approaches; and 2) we can characterise good and bad tweets based on specific linguistic features not present in other types of tweets.

Keywords: Natural Language Processing, Text Summarisation, Natural Language Tweet Generation, User Study, Linguistic Analysis, Descriptive Statistics

Preprint submitted to Computers in Industry, October 11, 2015. This is a previous version of the article published in Computers in Industry, 2016, 78: 3-15. doi:10.1016/j.compind.2015

Introduction, Context and Motivation

In the current digital knowledge society, information overload has become a problem for companies, which cannot cope with all the available information. As a consequence, companies may not be exploiting the Web and taking advantage of it accordingly, which affects key aspects such as their visibility, reputation, marketing campaigns, customer feedback, etc. With the birth of the Web 2.0, there has been a shift in the way information is produced and consumed by users and companies. The Web 2.0 has established a wide range of on-line mechanisms and platforms through which companies can obtain direct feedback from users. These mechanisms (e.g., reviews, social net-

With more than 241 million active users per month, 184 million of whom use Twitter through their mobile device, and more than 500 million tweets daily, Twitter has become an excellent social medium for on-line real-time news attention. The length restriction imposed on tweets (140 characters) forces messages to be concise, though it is also possible to link out to external information to enrich the tweet. Moreover, hashtags (e.g. #UA Universidad) make it possible to categorise information, to identify trending topics and, more importantly, to enable a rapid on-line information flow. According to [3]


A Comprehensive Method for Text Summarization Based on Latent Semantic Analysis

Wang

2013

Communications in Computer and Information Science

